Page properties have been available in the core product since the Lyons (9.02) release; however, there is no mechanism to search their values. Extending the MindTouch API to index and search property content allows current features and capabilities to be expanded, and opens the door to new applications.
Developers, developers, developers! ... and some UI components for the end user.
Initial implementation included with 9.12.0.
This spec explains the current implementation and short-term improvements.
Users are able to save page metadata into page properties and then use search to find pages with metadata matching a variety of supported criteria.
Only properties of type text/* are supported.
Use page properties like tags
Tags are the easiest way to aggregate a set of pages, but they can easily be edited by any user viewing the pages. Aggregating pages in a tag-like manner via page properties would prevent users from altering the properties, and would also hide the aggregation mechanism from plain sight.
Find pages that have a property named 'sales' with this query:
property:#sales
To find all pages that have a given property with a known value, use the following query:
#reviewedby:"maxm"
A property can have 0, 1, or more text values separated by a delimiter. To find pages where this property contains at least one known value, use:
#reviewedby:maxm
The absence of quotes around maxm indicates that it is a token that must exist within the property value; the value may contain other tokens as well.
Boolean operators can be combined to perform complex searches of page properties.
+#reviewedby:maxm -#reviewedby:royk +#status:"approved"
This can also be written as:
#reviewedby:(+maxm -royk) AND #status:"approved"
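A minimal Python sketch of assembling a query like the one above programmatically, for example from a client application. The helper names (`escape_term`, `property_clause`, `exact_clause`) are hypothetical and not part of the MindTouch API; the escaped character set follows the special characters listed in Lucene's query syntax documentation, and property names are assumed to be simple identifiers that need no escaping.

```python
# Characters treated specially by Lucene's classic query parser.
LUCENE_SPECIAL = set('+-&|!(){}[]^"~*?:\\/')


def escape_term(term: str) -> str:
    """Backslash-escape Lucene special characters in a single unquoted term."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in term)


def property_clause(name: str, required=(), prohibited=()) -> str:
    """Build '#name:(+a -b)'; 'name' is the custom property name without its namespace."""
    parts = [f"+{escape_term(t)}" for t in required]
    parts += [f"-{escape_term(t)}" for t in prohibited]
    return f"#{name}:({' '.join(parts)})"


def exact_clause(name: str, value: str) -> str:
    """Build the quoted form, e.g. #status:"approved", escaping quotes and backslashes."""
    inner = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'#{name}:"{inner}"'


if __name__ == "__main__":
    query = property_clause("reviewedby", ["maxm"], ["royk"]) + " AND " + exact_clause("status", "approved")
    print(query)  # prints: #reviewedby:(+maxm -royk) AND #status:"approved"
```

Escaping matters for values containing characters such as `:` or `-`, which would otherwise be read as query syntax rather than as part of the token.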
Range searches find pages with a property whose value falls between a minimum and a maximum bound.
#rating:[3 TO 5]
This matches all pages that have a 'rating' property with a value of 3, 4, or 5. Range searches are lexicographic (dictionary order), so without padding a value such as 45 would also match. For properties intended for numeric range searches, left-pad the numbers with zeros to a fixed number of digits, most significant digit first.
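The padding rule can be checked with a short Python sketch. It models Lucene's inclusive `[lo TO hi]` range as plain string comparison, which agrees with term order for ASCII digits; the `pad` helper and its default width of 5 are hypothetical choices, not part of MindTouch.

```python
WIDTH = 5  # hypothetical: enough digits for the largest expected value


def pad(n: int, width: int = WIDTH) -> str:
    """Zero-pad a non-negative integer so lexicographic order equals numeric order."""
    if n < 0:
        raise ValueError("negative numbers need a different encoding")
    s = str(n).zfill(width)
    if len(s) > width:
        raise ValueError(f"{n} does not fit in {width} digits")
    return s


def in_lexicographic_range(value: str, lo: str, hi: str) -> bool:
    """Model of an inclusive range query [lo TO hi] that compares strings."""
    return lo <= value <= hi


if __name__ == "__main__":
    # Unpadded: "45" sorts between "3" and "5", a false positive.
    print(in_lexicographic_range("45", "3", "5"))              # True
    # Padded: "00045" sorts after "00005", so it is correctly excluded.
    print(in_lexicographic_range(pad(45), pad(3), pad(5)))     # False
    print(in_lexicographic_range(pad(4), pad(3), pad(5)))      # True
```

With values stored this way, the rating query would be written as `#rating:[00003 TO 00005]`.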
Another application of range searches is chronotagging. Pages describing events that happen on a given date can be found with a range search like this:
#event_date:[20091225 TO 20100125]
This finds pages whose 'event_date' property, stored as YYYYMMDD, falls between December 25, 2009 and January 25, 2010 (inclusive).
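The same lexicographic rule applies to dates: a zero-padded YYYYMMDD string sorts in chronological order. A minimal Python sketch, with hypothetical helper names, that encodes dates this way and produces the query above:

```python
from datetime import date


def chronotag(d: date) -> str:
    """Encode a date as a zero-padded YYYYMMDD string (string order == date order)."""
    # Explicit formatting avoids platform differences in strftime("%Y") for years < 1000.
    return f"{d.year:04d}{d.month:02d}{d.day:02d}"


def date_range_query(field: str, start: date, end: date) -> str:
    """Build an inclusive range query over a chronotagged property."""
    return f"#{field}:[{chronotag(start)} TO {chronotag(end)}]"


if __name__ == "__main__":
    print(date_range_query("event_date", date(2009, 12, 25), date(2010, 1, 25)))
    # prints: #event_date:[20091225 TO 20100125]
```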
Users can be found based on their user properties in the same ways as pages.
Only text/plain properties are searchable.
No UI changes are needed in order to support page property search. User property search requires search results to render a user result differently than a page result.
As with pages, attachments, comments, and tags, properties are indexed by Lucene. This allows property criteria to be combined with existing query criteria in a single search. Since properties are essentially key/value pairs associated with a resource, they fit naturally into Lucene, which already treats every resource as a set of key/value pairs.
Refer to Lucene's Query Syntax guide for examples.
Only the contents of properties with a Content-Type of text/plain are indexed. Indexing structured data (XML, JSON, etc.) or binary data (image, octet-stream, etc.) is not useful, since it cannot be meaningfully searched with Lucene. In the future, additional MIME types such as application/xml, CSV, and JSON may be supported, with indexing handled by custom logic for each type of document.
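A sketch of the indexing filter described above, in Python. It assumes the Content-Type header may carry parameters such as `charset`, which are ignored, and compares the media type case-insensitively (media types are case-insensitive); the function name is hypothetical.

```python
def is_indexable(content_type: str) -> bool:
    """Return True only for text/plain properties, ignoring parameters like charset."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "text/plain"


if __name__ == "__main__":
    for ct in ["text/plain", "text/plain; charset=utf-8", "application/json", "image/png"]:
        print(ct, is_indexable(ct))
```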
Although properties are not meant to replace tags, it is possible to find resources based on their association with a given property name, independent of the value. All property names of a resource are added to a Lucene field named "property" to allow this. For example, property:(+#foo -#bar) finds all pages that have a property named foo but no property named bar.
Lucene indexes documents by a set of named fields. Pages and files already have a set of fields defined, and although Lucene allows a document to contain several fields with the same name, properties and their values must not be allowed to add values to existing field names. Otherwise, queries on a field such as 'author' could return false positives.
Only 'custom' properties (property names starting with urn:custom.mindtouch.com#) are added to the Lucene index. These properties are visible and modifiable via the UI (although the namespace isn't displayed). The namespace is removed from the property name, so the indexed field name is '#' followed by the name: a property with the full name urn:custom.mindtouch.com#foo is indexed in Lucene as #foo. This avoids overloading existing fields with values from properties.
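A minimal Python sketch of how a resource's properties could be turned into Lucene fields under the rules above: only custom properties are kept, the `urn:custom.mindtouch.com` namespace is stripped so only `#` and the name remain, and each kept name is also recorded in the shared `property` field used by queries such as `property:(+#foo -#bar)`. The input shape (a dict of full property name to text value) and the function name are assumptions for illustration, not the Lucene service's actual code.

```python
CUSTOM_PREFIX = "urn:custom.mindtouch.com#"


def property_fields(properties: dict[str, str]) -> list[tuple[str, str]]:
    """Map {full property name: text value} to (field, value) pairs for indexing."""
    fields: list[tuple[str, str]] = []
    for full_name, value in properties.items():
        if not full_name.startswith(CUSTOM_PREFIX):
            continue  # non-custom properties are never indexed
        # Keep the '#' so 'urn:custom.mindtouch.com#foo' becomes '#foo', which cannot
        # collide with existing fields such as 'author'.
        field = "#" + full_name[len(CUSTOM_PREFIX):]
        fields.append((field, value))
        # Record the name itself so pages can be found by property name alone.
        fields.append(("property", field))
    return fields
```

For example, `{"urn:custom.mindtouch.com#sales": "q3"}` yields `[("#sales", "q3"), ("property", "#sales")]`.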
The Lucene service listens for page change notifications, such as those triggered by page property changes. Specifically, the Lucene service subscribes to the channel
event://*/deki/pages/dependentschanged/properties/*
to know when a page needs to be re-indexed because of a property change.
TODO: define/lookup notification channel for user property updates
Currently, the delimiters are any whitespace character and the comma. All other punctuation is treated as part of the token. This list of delimiter characters is likely to be expanded.
Since Lucene tokenizes strings as described above, resources can be found by one or more of the tokens in a property's value. For example, the selections from a multi-select box can be stored in a property, and resources that contain (or don't contain) one or more of those values can then be found, as long as each value is a single token. Currently, this means the values must be separated by whitespace or commas.
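The tokenization rule can be modeled in Python as below. This is a sketch, not the Lucene analyzer MindTouch uses: it splits only on whitespace and commas, as stated above, and ignores any lowercasing the real analyzer may apply. `encode_multiselect` and `has_token` are hypothetical helpers for storing multi-select values and for modeling an unquoted term match such as `#reviewedby:maxm`.

```python
import re

# Delimiters per this spec: any run of whitespace and/or commas.
_DELIMITERS = re.compile(r"[\s,]+")


def tokenize(value: str) -> list[str]:
    """Split a property value into tokens, dropping empty strings at the edges."""
    return [t for t in _DELIMITERS.split(value) if t]


def encode_multiselect(selected: list[str]) -> str:
    """Join selections with ', '; reject any selection that would split into several tokens."""
    for item in selected:
        if tokenize(item) != [item]:
            raise ValueError(f"{item!r} is not a single token")
    return ", ".join(selected)


def has_token(value: str, token: str) -> bool:
    """Model of an unquoted term match: the token must equal one of the value's tokens."""
    return token in tokenize(value)


if __name__ == "__main__":
    stored = encode_multiselect(["maxm", "royk"])
    print(stored)                              # maxm, royk
    print(has_token(stored, "royk"))           # True
    print(has_token("timothy.high", "tim"))    # False: '.' is not a delimiter
```

Under this model `timothy.high` remains one token, so a term such as `tim` does not match it; only the whole token (or a prefix wildcard, which Lucene handles separately) can.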
Just as pages and files are treated as resources in Lucene, users must become resources as well. This allows user properties to be associated with them, and allows users to be located based on their properties, which contain personal information. As with pages, only custom user properties will be included.
Since custom user properties may contain personal information that is indexed and therefore publicly searchable, it makes sense for that information to be publicly visible as well. Users with global READ access will have permission to see the custom user properties of other users. This would allow private information to be stored in properties that only you can edit, while other properties can be seen and referenced by others (and by applications).
May be added to the index at a later date
Copyright © 2011 MindTouch, Inc.
A DekiScript function could be used to store generic DS variables in the format used for compound properties.
If this should be a separate spec, I can do that, but it's pretty tightly coupled to this one.
owner:"timothy.high"
The following searches work:
#owner:"timothy.high"
#owner:timothy.high
But not the following:
#owner:tim
#owner:high
#owner:(tim)
After some serious playing around, I found out that the following works:
#owner:tim*
#owner:(tim*)
But none of the following:
#owner:*tim*
#owner:(*tim*)
#owner:*high
#owner:(*high)
What's the deal with the wildcards, and how can I do a partial match on anything but the beginning of the value??
edited 06:17, 20 Jul 2010