The following describes how to configure a MindTouch VM to index Excel 2007 (.xlsx) documents.
First, install the Spreadsheet::XLSX Perl module from CPAN:
cpan Spreadsheet::XLSX
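Before going further, it can be useful to confirm that Perl is able to load the module. The check below is a minimal sketch: the helper name `perl_module_available` is hypothetical (it is not part of MindTouch or CPAN), and it relies only on `perl -M<Module> -e 1` exiting with a non-zero status when a module cannot be loaded.

```shell
#!/bin/sh
# Hypothetical helper (not part of MindTouch): returns 0 if Perl can load
# the named module, non-zero otherwise.
perl_module_available() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

if perl_module_available Spreadsheet::XLSX; then
    echo "Spreadsheet::XLSX is installed"
else
    echo "Spreadsheet::XLSX is missing; run: cpan Spreadsheet::XLSX"
fi
```

The same helper can be pointed at Text::Iconv, which the conversion script also loads.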
We will create two scripts to convert an .xlsx file to plain text. The first is a Perl script (xlsx2txt.pl) that takes a filename on the command line and prints the value of each cell as plain text. The second is a shell wrapper script (xlsx2txt) that reads the document from STDIN, saves it to a temporary file, and then invokes xlsx2txt.pl on that file.
The Perl script (xlsx2txt.pl):
#!/usr/bin/perl
use strict;
use warnings;
use Text::Iconv;
use Spreadsheet::XLSX;

# Text::Iconv is not really required.
# This can be any object with the convert method. Or nothing.
my $converter = Text::Iconv->new("utf-8", "windows-1251");

# The file to convert is the first command-line argument.
my $file = $ARGV[0];
if (!$file) {
    print("Usage: xlsx2txt.pl filename.xlsx\n");
    exit(1);
}
if (! -e $file) {
    printf("File: %s not found\n", $file);
    exit(1);
}

my $excel = Spreadsheet::XLSX->new($file, $converter);

# Walk every worksheet and print the value of each existing cell, one per line.
foreach my $sheet (@{ $excel->{Worksheet} }) {
    $sheet->{MaxRow} ||= $sheet->{MinRow};
    foreach my $row ($sheet->{MinRow} .. $sheet->{MaxRow}) {
        $sheet->{MaxCol} ||= $sheet->{MinCol};
        foreach my $col ($sheet->{MinCol} .. $sheet->{MaxCol}) {
            my $cell = $sheet->{Cells}[$row][$col];
            if ($cell) {
                printf("%s\n", $cell->{Val});
            }
        }
    }
}
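The wrapper script (xlsx2txt):
#!/bin/sh
# Setting PATH
PATH=$PATH:/bin:/usr/bin:/sbin:/usr/sbin:/usr/local/bin:/usr/local/sbin
export PATH
# Save the document arriving on STDIN to a temporary file
TEMP=`mktemp`
dd of="$TEMP" > /dev/null 2>&1
# Convert the temporary file to plain text on STDOUT, then clean up
xlsx2txt.pl "$TEMP" 2> /dev/null
rm "$TEMP"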
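The wrapper's pattern (spool STDIN to a temporary file, run a converter that needs a filename, then clean up) can also be written more defensively. The version below is an illustration, not the script MindTouch ships: the function name `filter_via_tempfile` is hypothetical, and it swaps in `cat` and a `trap` for `dd` and the final `rm` so that the temporary file is removed even when the converter fails.

```shell
#!/bin/sh
# Hypothetical helper: spool STDIN to a temp file, then run the given
# converter command with that file's path as its last argument.
filter_via_tempfile() {
    (
        tmp=$(mktemp) || exit 1
        # Remove the temp file when this subshell exits, even on failure.
        trap 'rm -f "$tmp"' EXIT
        cat > "$tmp"
        "$@" "$tmp" 2>/dev/null
    )
}

# In the real filter the converter would be xlsx2txt.pl:
#   filter_via_tempfile xlsx2txt.pl
# Demonstration with a command that simply echoes the file back:
printf 'hello\n' | filter_via_tempfile cat
```

Running the body in a subshell keeps the `trap` and the `tmp` variable from leaking into the calling script, and the function returns the converter's exit status.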
Copy the xlsx2txt.pl script to a directory in your $PATH (e.g. /usr/bin/xlsx2txt.pl).
Copy the xlsx2txt wrapper script to /var/www/dekiwiki/bin/filters.
Make both scripts executable.
Edit your /etc/dekiwiki/mindtouch.deki.startup.xml and add the following filter:
<filter-path extension="xlsx">/var/www/dekiwiki/bin/filters/xlsx2txt</filter-path>
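The copy and permission steps above can be carried out from a shell roughly as follows. This is a sketch under assumptions: the helper `install_xlsx_filter` and its two directory arguments are hypothetical, both scripts are assumed to sit in the current directory, and `sample.xlsx` is a placeholder for any workbook you have at hand.

```shell
#!/bin/sh
# Hypothetical helper: copy both scripts into place and make them executable.
#   $1 = directory on $PATH that will hold xlsx2txt.pl
#   $2 = MindTouch filter directory that will hold the xlsx2txt wrapper
install_xlsx_filter() {
    cp xlsx2txt.pl "$1/xlsx2txt.pl" &&
    cp xlsx2txt "$2/xlsx2txt" &&
    chmod +x "$1/xlsx2txt.pl" "$2/xlsx2txt"
}

# Typical call with the locations used in this article (normally run as root):
#   install_xlsx_filter /usr/bin /var/www/dekiwiki/bin/filters
#
# Manual smoke test afterwards; the cell values should appear one per line:
#   /var/www/dekiwiki/bin/filters/xlsx2txt < sample.xlsx
```

Feeding a workbook to the wrapper by hand exercises the same STDIN-to-STDOUT path the filter entry describes, which makes it easier to tell a script problem apart from a configuration problem.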
Copyright © 2011 MindTouch, Inc.