Lucene Orphaned Attachments

    A file attachment can become orphaned, meaning it is no longer associated with a parent page.  I'm not sure exactly how this occurs, but it does happen from time to time.  If an attachment is in this state when you rebuild your Lucene index, bad data makes its way into the index.  When a search query then matches the orphaned attachment, an exception is thrown and a confusing error message is displayed:

    Your search query <span class="deki-errorquery">jump</span> contains characters which need to be escaped. See <a href="http://wiki.developer.mindtouch.com/index.php?title=MindTouch_Deki/FAQ/Page_Management/How_do_I...Do_advanced_content_searches%3F">this FAQ</a> for more information.

    The steps to resolve this are as follows:

    • identify the orphaned attachment(s) (the id.page Lucene field is the empty string)
    • mark the orphaned attachment(s) as deleted
    • rebuild the index, or fix the existing index with the deleteTerm.exe tool

    Identify Orphaned Attachment

    To find the orphaned attachment, open the index with the Luke tool and click the Search tab.  Select the KeywordAnalyzer and enter id.page:"" in the search box.

    luke1.png

    Take note of the id.file value of each result the search returns.  These files are orphaned and need to be manually marked as deleted in the database.

    Mark Orphaned Attachment as Deleted

    For each file, we need to run some SQL queries to mark it as deleted.

    1) Verify that the parent page doesn't exist.  In this example the file_id is 1 (the id.file value from the previous step).  If the file is orphaned, the query returns no rows.

    select page_id, page_title from pages where page_id =
     (select resrev_parent_page_id from resources where res_id = 
       (select resource_id from resourcefilemap where file_id = 1)
     );

    2) mark the resource as deleted

    update resourcerevs set resrev_deleted=1 where resrev_res_id =
       (select resource_id from resourcefilemap where file_id = 1); 
    
    update resources set res_deleted=1 where res_id = (select resource_id from resourcefilemap where file_id = 1);
    

    Once this is done, you should be able to go to Control Panel >> Deleted Files and the orphaned file will show up in the deleted files list.
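
    If Luke turns up more than a handful of orphaned files, the two steps above can be scripted.  The following is only a minimal sketch under stated assumptions, not a tested tool: find_parent_page and mark_orphans_deleted are hypothetical helper names, the table and column names are copied from the queries above, and Python's built-in sqlite3 module stands in for the real connection so the example runs on its own.  Against an actual Deki database you would use a MySQL driver instead, where the cursor handling and the placeholder syntax (typically %s rather than ?) differ.

```python
import sqlite3

# Subquery shared by all statements: maps a file_id to its resource id.
# Table and column names are taken from the SQL shown earlier on this page.
RESOURCE_FOR_FILE = "(SELECT resource_id FROM resourcefilemap WHERE file_id = ?)"


def find_parent_page(conn, file_id):
    """Return (page_id, page_title) of the file's parent page, or None if orphaned."""
    cur = conn.execute(
        "SELECT page_id, page_title FROM pages WHERE page_id = "
        "(SELECT resrev_parent_page_id FROM resources WHERE res_id = "
        + RESOURCE_FOR_FILE + ")",
        (file_id,),
    )
    return cur.fetchone()


def mark_orphans_deleted(conn, file_ids):
    """Mark each file whose parent page is missing as deleted.

    Files that still have a parent page are skipped.
    Returns the list of file ids that were actually marked.
    """
    marked = []
    for file_id in file_ids:
        if find_parent_page(conn, file_id) is not None:
            continue  # parent page exists, so this is not an orphan
        conn.execute(
            "UPDATE resourcerevs SET resrev_deleted = 1 "
            "WHERE resrev_res_id = " + RESOURCE_FOR_FILE,
            (file_id,),
        )
        conn.execute(
            "UPDATE resources SET res_deleted = 1 "
            "WHERE res_id = " + RESOURCE_FOR_FILE,
            (file_id,),
        )
        marked.append(file_id)
    conn.commit()
    return marked
```

    Calling mark_orphans_deleted(conn, ids) with the id.file values noted from Luke returns the ids that were marked, so any id missing from the result still has a parent page and is worth a second look before touching it.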

    Fix Lucene Index

    To remove the bad data from the index you have two options:

    1) rebuild the index via Control Panel >> Cache Management

    or

    2) use the deleteTerm.exe tool

    TODO: document this

    Copyright © 2011 MindTouch, Inc.