Indexing XLSX Excel 2007 Documents

    The following describes how to configure a MindTouch VM to index Excel 2007 (.xlsx) documents.

    Install Prerequisite Perl packages

    cpan Spreadsheet::XLSX
    

    Create Filter script

    We will create two scripts to convert an xlsx file to plain text.  The first is a Perl script (xlsx2txt.pl) that takes a filename on the command line and prints the file's contents as plain text.  The second is a shell wrapper script (xlsx2txt) that reads the document from STDIN, writes it to a temporary file, and then invokes xlsx2txt.pl on that file.

    xlsx2txt.pl

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Text::Iconv is not strictly required: the converter can be any object
    # with a convert method, or it can be omitted entirely.
    use Text::Iconv;
    use Spreadsheet::XLSX;

    # Converts cell text from UTF-8 to windows-1251 (a Cyrillic code page).
    my $converter = Text::Iconv->new("utf-8", "windows-1251");

    my $file = $ARGV[0];
    if (!$file) {
            print STDERR "Usage: xlsx2txt.pl filename.xlsx\n";
            exit(1);
    }
    if (! -e $file) {
            printf STDERR "File: %s not found\n", $file;
            exit(1);
    }

    # Print the value of every populated cell, one value per line.
    my $excel = Spreadsheet::XLSX->new($file, $converter);
    foreach my $sheet (@{ $excel->{Worksheet} }) {
            $sheet->{MaxRow} ||= $sheet->{MinRow};
            $sheet->{MaxCol} ||= $sheet->{MinCol};
            foreach my $row ($sheet->{MinRow} .. $sheet->{MaxRow}) {
                    foreach my $col ($sheet->{MinCol} .. $sheet->{MaxCol}) {
                            my $cell = $sheet->{Cells}[$row][$col];
                            if ($cell) {
                                    printf("%s\n", $cell->{Val});
                            }
                    }
            }
    }
    

    xlsx2txt

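    #!/bin/sh

    # Make sure the standard binary directories are on the PATH
    PATH=$PATH:/bin:/usr/bin:/sbin:/usr/sbin:/usr/local/bin:/usr/local/sbin
    export PATH

    # Save the document arriving on STDIN to a temporary file
    TEMP=`mktemp`
    dd of="$TEMP" > /dev/null 2>&1
    # Convert it to plain text on STDOUT, then remove the temporary file
    xlsx2txt.pl "$TEMP" 2> /dev/null
    rm "$TEMP"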
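
    If the wrapper is interrupted while the conversion is running, it never reaches its final cleanup step and the temporary file is left behind. A slightly more defensive variant of the same STDIN-to-temporary-file pattern is sketched below as one possible alternative, not as the script MindTouch ships. The function name filter_via_tempfile is hypothetical, and the converter command is passed in as arguments so the pattern can be tried with any tool that expects a filename.

```shell
#!/bin/sh
# Hypothetical helper: copy STDIN to a temp file, run the given command on
# that file, and remove the file whether the command succeeds or fails.
filter_via_tempfile() {
    tmp=$(mktemp) || return 1
    # The subshell scopes the traps to this one call.
    (
        # Remove the temp file whenever the subshell exits.
        trap 'rm -f "$tmp"' EXIT
        # Turn common termination signals into a normal exit so that
        # the EXIT trap still runs.
        trap 'exit 1' HUP INT TERM
        cat > "$tmp" && "$@" "$tmp"
    )
}

# Example usage (assumes xlsx2txt.pl is on the PATH and report.xlsx is
# a placeholder for a real spreadsheet):
#   filter_via_tempfile xlsx2txt.pl < report.xlsx
```

    The function returns the exit status of the converter, so a caller can tell a failed conversion from an empty document.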
    

     

    Configure MindTouch startup.xml

    Copy the xlsx2txt.pl script to a directory in your $PATH (e.g. /usr/bin/xlsx2txt.pl).

    Copy the xlsx2txt shell script to /var/www/dekiwiki/bin/filters.

    Make both scripts executable.

    Edit your /etc/dekiwiki/mindtouch.deki.startup.xml and add the following filter:

     <filter-path extension="xlsx">/var/www/dekiwiki/bin/filters/xlsx2txt</filter-path>
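
    The copy and permission steps above can also be done from a shell. The sketch below assumes both scripts sit in the current directory and that you have write access to the target directories. The function name install_xlsx_filter and its two arguments are invented for this example, with the defaults taken from the paths mentioned above.

```shell
#!/bin/sh
# Hypothetical helper: install the two scripts with execute permission.
#   $1 = directory for xlsx2txt.pl (should be on the PATH)
#   $2 = MindTouch filter directory for the xlsx2txt wrapper
install_xlsx_filter() {
    bin_dir=${1:-/usr/bin}
    filter_dir=${2:-/var/www/dekiwiki/bin/filters}
    cp xlsx2txt.pl "$bin_dir/xlsx2txt.pl" &&
    cp xlsx2txt "$filter_dir/xlsx2txt" &&
    chmod 755 "$bin_dir/xlsx2txt.pl" "$filter_dir/xlsx2txt"
}

# Example usage, followed by a quick manual check of the filter
# (sample.xlsx is a placeholder for any spreadsheet you have at hand):
#   install_xlsx_filter
#   /var/www/dekiwiki/bin/filters/xlsx2txt < sample.xlsx
```

    Running the wrapper by hand in this way should print one cell value per line. If it prints nothing, try calling xlsx2txt.pl directly on the file, since the wrapper discards error output.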
    
    I had to run "apt-get install make" on the current VM, as make is not installed by default.
    Posted 06:24, 21 Jun 2011

    Copyright © 2011 MindTouch, Inc.