Search Syntax and Parsers

    Introduction

    MindTouch Core Olympic will introduce the concept of search parsers, each of which generates the best query for a given context. The initial set of parsers is:

    • BestGuess
    • Filename
    • Term
    • Lucene

    Intended Audience

    Most users will use the BestGuess parser, which treats the input as a series of search terms unless it encounters syntax that could be Lucene-specific query syntax, in which case the entire query is treated as a Lucene query.

    Users who want to prevent their input from being interpreted as a Lucene query, or to prevent their Lucene query from being altered, should pick the appropriate parser explicitly.

    Additional information

     

    References

    Status

    Initial scoping by arnec

    Functional Specification

    The average user will use these parsers indirectly through the search feature built into the UI, which uses the BestGuess (or default) parser. In addition, the image dialog in the editor will call search with the Filename parser. The parser can also be specified manually through the opensearch, search, or logged search API endpoints.

    Use Cases

    Non-goals

    Technical Specification

    Parsers

    BestGuess

    The BestGuess parser will try to treat the query as if the Term parser were specified, unless it encounters a syntax construct it does not understand, at which point it will switch to the Lucene parser and pass the query through without modification.

    Terms are split on whitespace (unless quoted). The constructs allowed for BestGuess Term parsing are:

    • alphanumeric sequence
    • + and - inside a sequence (will be escaped)
    • * and ? (wildcards) anywhere in a sequence (will not be escaped)
    • any sequence surrounded by quotes (quotes in the term must be escaped with backslash)
    • leading + (will not be escaped)
    • leading - (will not be escaped)
    • field:term pairs (two sequences separated by a : ) for known fields. If a second : is found it will be escaped. The known fields are:
      • content
      • title
      • path.title
      • description
      • tag
      • comments
      • path
      • id.page
      • id.parent
      • id.file
      • path.parent
      • title.parent
      • title.page
      • author
      • #{propertyname}
      • {ns}#{propertyname}
      • filename
    • namespace is not a recognized field; it should be specified per query in constraints or globally in the search/namespace-exclude config string

    Boosts, range queries, AND, NOT, OR, and parentheses are some examples of constructs that will cause the parser to short-circuit and switch to the Lucene parser.
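
    The fallback decision described above can be sketched roughly as follows. This is only an illustration, not the MindTouch implementation: the function name choose_parser and the exact set of trigger constructs are assumptions, since the specification lists boosts, range queries, AND, NOT, OR and parentheses only as examples.

```python
import re

# A quoted sequence, allowing backslash-escaped characters inside it.
QUOTED = re.compile(r'"(?:\\.|[^"\\])*"')

# Assumed trigger set: boolean keywords, grouping and range brackets, boost.
LUCENE_ONLY = re.compile(r'\b(?:AND|OR|NOT)\b|[()\[\]{}^]')


def choose_parser(query: str) -> str:
    """Return "term" or "lucene" for a BestGuess-style query (hypothetical helper)."""
    # Quoted sequences are always valid terms, so their contents are ignored.
    unquoted = QUOTED.sub(" ", query)
    return "lucene" if LUCENE_ONLY.search(unquoted) else "term"
```

    Under these assumptions, a query such as title:foo +bar wild* stays on the Term path, while foo AND bar or title:foo^4 is handed to the Lucene parser unchanged.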

    Filename

    The Filename parser will treat the input as a single term, escaping all special characters (including whitespace), and will search both the filename field, with a trailing wildcard on the term, and the extension field, with the term value itself. On the indexing side, filenames are tokenized as a single term with all whitespace and dashes normalized to underscores. Some sample query translations are:

    jpg => filename:jpg* extension:jpg

    IMG-0123 => filename:IMG_0123* extension:IMG\-0123

    foo bar => filename:foo_bar* extension:foo\ bar
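
    A minimal sketch that reproduces the sample translations above. The helper names escape and filename_query are hypothetical, the list of escaped characters is an assumption based on Lucene's classic query syntax, and each whitespace or dash character is assumed to map to exactly one underscore.

```python
import re

# Characters assumed to need escaping: Lucene's classic special
# characters plus whitespace.
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/&|\s])')


def escape(term: str) -> str:
    """Prefix every special character with a backslash."""
    return _SPECIAL.sub(r"\\\1", term)


def filename_query(user_input: str) -> str:
    """Build the filename/extension query for a single-term input."""
    term = user_input.strip()
    # Mirror the index-side normalization of whitespace and dashes.
    normalized = re.sub(r"[\s\-]", "_", term)
    return f"filename:{escape(normalized)}* extension:{escape(term)}"


for sample in ("jpg", "IMG-0123", "foo bar"):
    print(sample, "=>", filename_query(sample))
```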

    Term

    The Term parser will escape anything that Lucene would consider special syntax and treat the query as a list of terms, building a Lucene query per term based on the search/termquery formatting string.
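
    As a rough illustration of the per-term expansion, the sketch below applies the default search/termquery string to each term. The function name term_query, the whitespace splitting, and the way the per-term clauses are combined (parenthesized and joined with spaces) are assumptions; the specification does not state them.

```python
import re

# Default value of the search/termquery configuration key.
DEFAULT_TERMQUERY = ("content:{0} title:{0}^4 path.title:{0}^4 "
                     "description:{0}^3 tag:{0}^2 comments:{0}")

# Characters assumed to be special in Lucene's classic query syntax.
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/&|])')


def term_query(user_input: str, termquery: str = DEFAULT_TERMQUERY) -> str:
    """Expand each whitespace-separated term through the format string."""
    clauses = []
    for term in user_input.split():
        escaped = _SPECIAL.sub(r"\\\1", term)
        # Plain substitution of the {0} placeholder with the escaped term.
        clauses.append("(" + termquery.replace("{0}", escaped) + ")")
    return " ".join(clauses)


print(term_query("foo"))
```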

    Lucene

    The Lucene parser will pass the query unmodified through to Lucene. This means that all field specifications need to be provided manually, since terms without a field prefix will only search the content field. Constraints are still ANDed to the query, providing at the very least type filtering.
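
    One way to picture the ANDing of constraints is to wrap both parts as required clauses. This is a sketch only: the helper apply_constraints and the sample constraint type:wiki are hypothetical, and the actual combination logic is not described in this specification.

```python
def apply_constraints(query: str, constraints: str) -> str:
    """Require both the raw Lucene query and the constraint clause."""
    if not constraints.strip():
        return query
    # In Lucene syntax a leading + marks a clause as required, so two
    # required groups behave like an AND of the two parts.
    return f"+({query}) +({constraints})"


print(apply_constraints("title:foo OR bar", "type:wiki"))
```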

    Configuration Keys

    search/namespace-exclude

    Default: -namespace:"template" -namespace:"template_talk" -namespace:"help" -namespace:"help_talk"

    search/termquery

    Default: content:{0} title:{0}^4 path.title:{0}^4 description:{0}^3 tag:{0}^2 comments:{0}

    UI requirements

    API requirements


    Copyright © 2011 MindTouch, Inc.