Search Experience

    Introduction

    Search bars are now everywhere, from browsers and websites to applications and OS-level features, so search is increasingly the interface people are most comfortable using to find information, surpassing even the best hierarchical taxonomies created by content curators. Because search is one of the first experiences many users will have with MindTouch, its quality reflects strongly on their opinion of the product.

    The current search experience reports query results from the Lucene engine with minor filtering for permissions. While Lucene is a very good text indexing engine, its results are based primarily on the frequency and prominence of keywords. A number of additional data points, drawn from analytical knowledge about user behavior on a MindTouch installation, should be considered to improve the selection and ranking of search results. This revision of the search experience aims to improve both the quality and the speed of results.

    Intended Audience

    All users trying to find content in MindTouch.

    Additional information

     

    References

    Status

    Scoping by ArneC & RoyK

    Functional Specification

    Use Cases

    This is a behind-the-scenes change to improve search results; it does not introduce any new use cases.

    Non-goals

    Technical Specification

    UI requirements

    Chrome elements

    • Search input (top and bottom)
    • Namespace filter (all pages, main namespace, my user namespace)
    • Enhanced search add-on module entrypoint link
    • Sorting:
      • Pure Rating
      • Date Created
      • Date Modified
    • Language filter (if polyglot-enabled)
    • RSS feed link
    • Pagination
    • Search query execution string
    • Promoted results*

    Result types

    • Page
    • Comment
    • File/Image

    Search Result Elements

    Pages
    • Display title (linked)
    • Parent page path
    • Page URL
    • Date modified
    • Date created
    • Word count
    • Content rating
    • Content preview
    • Comment count (?)
    Files/Images
    • File (linked)
    • Parent page
      • Display title
      • Full path
    • File URL
    • File preview text (if available)
    • Date modified
    • Date created
    Comments
    • Comment
    • Comment #
    • Parent page
      • Display title
      • Full path
    • Comment author
    • Comment modified
    • Comment created (necessary?)

    search_experience.png

    API requirements

    Both the Core Search API implementation and the Lucene query features will need to be changed.

    Lucene API

    Changes to indexing

    • Stop including content in index
    • Index whether pages are private

    Condensed search result feature

    The condensed set will contain only the information essential for ordering and retrieving the original content. Likely candidates are:

    • page_id
    • content_type
    • score
    • title
    • creation date
    • modification date
    • private flag

    This search feature will be only for documents that relate to pages (i.e. pages, comments, files), while the existing search feature will continue to exist for querying other types of documents, such as users.
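    As a rough illustration, the condensed record and the split between the two search features could be sketched as follows. The class name, field names, and content-type strings are assumptions made for this example; the spec lists only the candidate fields.

```python
from dataclasses import dataclass
from datetime import datetime

# Document types that relate to pages, per the spec. The string values
# themselves are assumed for this sketch.
PAGE_RELATED_TYPES = frozenset({"page", "comment", "file"})


@dataclass(frozen=True)
class CondensedResult:
    """Hypothetical condensed record: only what ordering and retrieval need."""
    page_id: int
    content_type: str
    score: float          # score as reported by Lucene
    title: str
    created: datetime
    modified: datetime
    is_private: bool


def uses_condensed_search(content_type: str) -> bool:
    """Page-related documents use the condensed feature; others (such as
    users) stay on the existing search feature."""
    return content_type in PAGE_RELATED_TYPES
```

    Keeping the record immutable and small is one way to keep per-entry cache cost predictable when the result limit is raised.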

    The condensed result set will have a much higher maximum result limit, since ordering can change significantly after Lucene returns its results. Assume a maximum of 500 for now, but caching memory consumption needs review. Also need to review the likelihood of Lucene returning a set that size, and whether a cut-off Lucene score should be used as the limit rather than a hard number of entries.
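    The two limiting options under review (a hard cap and a score cut-off) can be compared with a small sketch. The function name, the `(page_id, score)` hit shape, and the example threshold are hypothetical; only the 500 default comes from the text above.

```python
from typing import List, Optional, Tuple

Hit = Tuple[int, float]  # (page_id, lucene_score); shape assumed for this sketch


def limit_hits(hits: List[Hit],
               max_results: int = 500,
               min_score: Optional[float] = None) -> List[Hit]:
    """Order hits by descending score, optionally drop those below a
    cut-off score, then apply the hard cap."""
    ranked = sorted(hits, key=lambda hit: hit[1], reverse=True)
    if min_score is not None:
        ranked = [hit for hit in ranked if hit[1] >= min_score]
    return ranked[:max_results]


hits = [(1, 0.9), (2, 0.1), (3, 0.5)]
by_count = limit_hits(hits, max_results=2)   # hard cap only
by_score = limit_hits(hits, min_score=0.4)   # score cut-off only
```

    In this example both calls keep pages 1 and 3, but on real data the two strategies diverge: a cut-off adapts to how many strong matches exist, while a hard cap bounds cache memory regardless of score distribution.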

    Core API

    The core API will use the condensed results to filter private pages according to the requesting user's permissions, rank the results with a new Search Ranking formula, and store the resulting metadata set in a per-user cache. As the user pages through results, the cache is used to retrieve the page metadata for the current result page, which is then augmented with page content.
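    The flow described above (filter, rank, cache per user, then page and augment) might look roughly like this. Everything here is a sketch under assumptions: `SearchSession`, `can_read`, `rank`, and `load_content` are hypothetical names, the ranking formula is passed in because the spec does not define it, and a plain dictionary stands in for the real cache.

```python
from typing import Callable, Dict, List, Tuple

Result = Dict[str, object]  # condensed entry with "page_id", "score", "private"


class SearchSession:
    def __init__(self,
                 can_read: Callable[[str, int], bool],
                 rank: Callable[[Result], float],
                 load_content: Callable[[int], str],
                 page_size: int = 10) -> None:
        self._can_read = can_read
        self._rank = rank
        self._load_content = load_content
        self._page_size = page_size
        # Per-user cache keyed by (user, query); in-process for this sketch.
        self._cache: Dict[Tuple[str, str], List[Result]] = {}

    def search(self, user: str, query: str,
               condensed: List[Result], page: int = 1) -> List[Result]:
        key = (user, query)
        if key not in self._cache:
            # Drop private entries the requesting user may not read.
            visible = [r for r in condensed
                       if not r["private"] or self._can_read(user, r["page_id"])]
            # Re-rank with the supplied formula and keep metadata only.
            self._cache[key] = sorted(visible, key=self._rank, reverse=True)
        start = (page - 1) * self._page_size
        window = self._cache[key][start:start + self._page_size]
        # Augment only the current result page with content.
        return [dict(r, content=self._load_content(r["page_id"])) for r in window]
```

    Later pages are served from the cached metadata, so the Lucene query and the permission checks run once per user and query, and content is loaded only for the entries actually displayed.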

    Assumption:

    • There either exists or will exist a caching feature in Dream or Deki that can be transparently swapped between an in-process and memcache store.
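    One way to picture the swappable store in this assumption is a small interface with two interchangeable backends. This is not the Dream or Deki API; the class names, the key format, and the remote client's `get`/`set` shape are all hypothetical.

```python
import json
from typing import Any, Dict, Iterable, Optional, Protocol


class CacheStore(Protocol):
    def get(self, key: str) -> Optional[Any]: ...
    def set(self, key: str, value: Any) -> None: ...


class InProcessStore:
    """Keeps values in a dictionary inside the current process."""

    def __init__(self) -> None:
        self._items: Dict[str, Any] = {}

    def get(self, key: str) -> Optional[Any]:
        return self._items.get(key)

    def set(self, key: str, value: Any) -> None:
        self._items[key] = value


class RemoteStore:
    """Adapter over a hypothetical memcache-style client that exposes
    get(key) -> bytes or None, and set(key, bytes)."""

    def __init__(self, client: Any) -> None:
        self._client = client

    def get(self, key: str) -> Optional[Any]:
        raw = self._client.get(key)
        return None if raw is None else json.loads(raw)

    def set(self, key: str, value: Any) -> None:
        self._client.set(key, json.dumps(value).encode("utf-8"))


def cache_ranked_results(store: CacheStore, user: str, query: str,
                         page_ids: Iterable[int]) -> None:
    """Callers depend only on CacheStore, so the backend can be swapped."""
    store.set("search:%s:%s" % (user, query), list(page_ids))
```

    Because the remote backend has to serialize values, anything placed in the cache would need to be serializable from the start, even while only the in-process store is in use.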

    Comments

    What about user-promoted results?
    - an explicit link to promote a result item in the result list
    - track clicked result items to adjust the score of the item?
    - after a period of time, old user scoring should be dropped to avoid old documents always being scored higher
    Posted 01:05, 8 Apr 2010
    Better file handling:
    - extract the document title and use it, instead of the filename, as the title of the result item
    - the file's full URL isn't useful to the user; show the filename and the URL of the page the file is attached to (or the pages where the file is accessible / present / linked?)
    Posted 01:10, 8 Apr 2010
    I agree with @openseo on the last point: a URL for a file result adds nothing to the experience. Having the URL for the page that the file is attached to would help recognition that the file is relevant.
    Posted 11:48, 10 Jun 2010
    @openseo: search results are now ranked by their popularity in searches.
    Posted 14:17, 10 Jun 2010

    Copyright © 2011 MindTouch, Inc.