Search bars are now everywhere: in browsers, websites, applications, and OS-level features. As a result, search is increasingly the interface people are most comfortable with when looking for information, surpassing even the best hierarchical taxonomies created by content curators. Because search is one of the first experiences many users will have with MindTouch, its quality reflects strongly on their opinion of the product.
The current search experience reports query results from the Lucene engine with minor filtering for permissions. While Lucene is a very good text indexing engine, its results are primarily based on the frequency and prominence of keywords. A number of additional data points, drawn from analytical knowledge about user behavior on a MindTouch installation, should be considered to improve the selection and ranking of search results. This revision of the search experience aims to improve both the quality and the speed of results.
All users trying to find content in MindTouch.
Scoping by ArneC & RoyK
This is a behind-the-scenes change to improve search results; it does not introduce any new use cases.
| Pages | Files/Images | Comments |
|---|---|---|

Both the Core Search API implementation and the Lucene query features will need to be changed.
Changes to indexing
Condensed search result feature
The condensed set will contain only the information essential for ordering and retrieving the original content. Likely candidates are:
This search feature will be only for documents that relate to pages (i.e. pages, comments, files), while the existing search feature will continue to exist for querying other types of documents, such as users.
The condensed result will have a much higher max result limit, since the order can change significantly after Lucene returns its results. Assuming a max of 500 for now, but the memory consumption of caching needs review. Also need to review the likelihood of Lucene returning a set that size, and whether a Lucene score cut-off should be used as the limit rather than a hard number of entries.
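The two limiting strategies mentioned above (a hard cap versus a Lucene score cut-off) can be compared with a minimal sketch. The `CondensedHit` type, its `page_id` and `lucene_score` fields, and the `truncate_hits` function are hypothetical names chosen for illustration; they are not part of the MindTouch or Lucene APIs, and the 500 default simply mirrors the working assumption above.

```python
from typing import List, NamedTuple, Optional


class CondensedHit(NamedTuple):
    # Hypothetical condensed entry; the real field list is still undecided.
    page_id: int
    lucene_score: float


def truncate_hits(
    hits: List[CondensedHit],
    max_results: int = 500,
    min_score: Optional[float] = None,
) -> List[CondensedHit]:
    """Order hits by descending Lucene score, then apply both limits.

    The score cut-off (if given) drops weak matches first; the hard cap
    then bounds the size of whatever remains.
    """
    ordered = sorted(hits, key=lambda h: h.lucene_score, reverse=True)
    if min_score is not None:
        ordered = [h for h in ordered if h.lucene_score >= min_score]
    return ordered[:max_results]
```

Keeping both parameters in one function makes it easy to measure, on real queries, how often the score cut-off alone would already keep the set below the hard cap.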
Core API
The core API will use the condensed results to filter private pages according to the requesting user's permissions, rank the remaining results with a new Search Ranking formula, and store the resulting metadata set in a per-user cache. As the user pages through results, the cache is used to retrieve the page metadata for the current result page and augment it with page content.
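The flow described above could be sketched roughly as follows. Everything here is an assumption made for illustration: the `PerUserSearchCache` class, the `Hit` fields, and the `fetch_hits`, `can_read`, `rank`, and `load_content` callables stand in for the Lucene query, the permission check, the Search Ranking formula, and the page content lookup, none of which are specified yet.

```python
from typing import Callable, Dict, List, NamedTuple, Tuple


class Hit(NamedTuple):
    # Hypothetical condensed result entry.
    page_id: int
    lucene_score: float


class PerUserSearchCache:
    """Caches the filtered, ranked metadata set per (user, query)."""

    def __init__(self) -> None:
        self._ranked: Dict[Tuple[int, str], List[Hit]] = {}

    def results_page(
        self,
        user_id: int,
        query: str,
        fetch_hits: Callable[[str], List[Hit]],
        can_read: Callable[[int, int], bool],
        rank: Callable[[Hit], float],
        load_content: Callable[[int], str],
        offset: int = 0,
        limit: int = 10,
    ) -> List[Tuple[Hit, str]]:
        key = (user_id, query)
        if key not in self._ranked:
            # Cache miss: query once, drop pages the user cannot read,
            # then order by the ranking function (highest first).
            visible = [h for h in fetch_hits(query) if can_read(user_id, h.page_id)]
            self._ranked[key] = sorted(visible, key=rank, reverse=True)
        # Only the current window is augmented with page content.
        window = self._ranked[key][offset:offset + limit]
        return [(h, load_content(h.page_id)) for h in window]
```

In this sketch, paging to the next result page reuses the cached ordering and only loads content for the entries actually displayed; cache expiry and memory limits are deliberately left out.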
Assumption:
Attached image: search_experience.png (added comment & file result sets)
Copyright © 2011 MindTouch, Inc.
- explicit link to promote a result item in the result list
- track clicked result items to adjust the score of the item?
- after a period of time, old user scoring should be dropped so that old documents are not always scored better
- extract the document title, instead of the filename, as the title of the result item
- the file's full URL isn't useful for the user: show the filename and the URL of the page where the file is attached (or the pages where the file is accessible / present / linked?)
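The click-tracking and expiry suggestions above could be combined in one small scoring helper. This is only a sketch under stated assumptions: the `click_boost` name, the exponential half-life weighting, and the 30-day and 180-day defaults are illustrative choices, not part of the proposal. Clicks older than `max_age_days` are dropped outright, as suggested in the list; younger clicks fade gradually.

```python
from typing import Iterable


def click_boost(
    click_ages_days: Iterable[float],
    half_life_days: float = 30.0,
    max_age_days: float = 180.0,
) -> float:
    """Sum a weight per recorded click on a result item.

    A click's weight halves every `half_life_days`; clicks older than
    `max_age_days` contribute nothing at all.
    """
    return sum(
        0.5 ** (age / half_life_days)
        for age in click_ages_days
        if age <= max_age_days
    )
```

The returned value would be one input to the ranking formula, so an item clicked often last week outranks one that was popular a year ago.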