Monday, July 12, 2010

Exploring Solr

Over the weekend, I read some of "Solr 1.4 Enterprise Search Server" and learned about a few more features. When multiple keywords are provided, Solr does the right thing: documents that match more of the words tend to get a higher score, and the rarer a word is, the more a hit on it is worth. Solr also provides built-in support for both stemming and paging.
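
To make the scoring idea concrete, here is a toy sketch in Python. It is not Solr's actual scoring formula, only an illustration of the two properties above: more matching words means a higher score, and rarer words count for more. The corpus size, the document frequencies, and the terms "map" and "bathymetry" are all made up for the example.

```python
import math

def idf(num_docs, doc_freq):
    # Inverse document frequency: the fewer documents contain a term,
    # the larger its weight. This is a simplified stand-in, not Solr's code.
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def score(query_terms, doc_terms, doc_freqs, num_docs):
    # Sum the weight of every query term that appears in the document.
    return sum(idf(num_docs, doc_freqs[t]) for t in query_terms if t in doc_terms)

# Hypothetical statistics: 1000 documents, "map" is common, "bathymetry" is rare.
NUM_DOCS = 1000
doc_freqs = {"map": 500, "bathymetry": 4}
query = ["map", "bathymetry"]

both = score(query, {"map", "bathymetry"}, doc_freqs, NUM_DOCS)
common_only = score(query, {"map"}, doc_freqs, NUM_DOCS)
rare_only = score(query, {"bathymetry"}, doc_freqs, NUM_DOCS)
```

With these numbers, a document containing both words scores highest, and a document containing only the rare word outscores one containing only the common word.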

I have defined a Solr schema and ingested some XML-formatted data using curl (curl http://localhost:8983/solr/update -F stream.file=/tmp/sampleData.xml). Note that the absolute path of the file must be provided. The added data will not be searchable until a commit is performed.
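
The same request can be sketched in Python. This is a rough equivalent of the curl command, not a tested client: it assumes the update handler accepts stream.file as a query parameter and honors commit=true so the data becomes searchable in the same request, and that the Solr server uses Unix-style paths. The helper names update_url and ingest are my own.

```python
import posixpath
from urllib.parse import urlencode
from urllib.request import urlopen

SOLR_UPDATE = "http://localhost:8983/solr/update"

def update_url(path, commit=True):
    # The file is read by the Solr server, so the path must be absolute.
    if not posixpath.isabs(path):
        raise ValueError("stream.file needs an absolute path: %r" % path)
    params = {"stream.file": path}
    if commit:
        # Assumption: the handler commits when commit=true is passed.
        params["commit"] = "true"
    return SOLR_UPDATE + "?" + urlencode(params)

def ingest(path):
    # Sends the request to a running Solr instance and returns its response body.
    with urlopen(update_url(path)) as response:
        return response.read().decode("utf-8")
```

Calling ingest("/tmp/sampleData.xml") needs a Solr instance listening on localhost:8983; update_url can be checked on its own without one.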

The data can be searched using the Solr admin tool or by providing a URL. The URL to search for all the data is http://localhost:8983/solr/select/?q=*%3A*, where %3A is the URL-encoded form of the colon (:), so the query itself is *:*.
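
Rather than encoding the query by hand, the URL can be built with Python's standard library. A minimal sketch, assuming the same host and port as above; the function name select_url is my own, and start and rows are Solr's paging parameters.

```python
from urllib.parse import urlencode

SOLR_SELECT = "http://localhost:8983/solr/select/"

def select_url(query, start=None, rows=None):
    # Build a select URL; the colon in the query is percent-encoded as %3A.
    params = {"q": query}
    if start is not None:
        params["start"] = start  # offset of the first result to return
    if rows is not None:
        params["rows"] = rows    # number of results per page
    # safe="*" leaves the asterisks unescaped, matching the URL in the text.
    return SOLR_SELECT + "?" + urlencode(params, safe="*")

all_docs = select_url("*:*")
second_page = select_url("*:*", start=10, rows=10)
```

Here all_docs is exactly the URL given above, and second_page asks for results 11 through 20.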

The current schema is far from complete. Layer coordinates are simply being stored as a bounding box where each coordinate is of type tdouble. We should probably consider ingesting the XML metadata directly. However, one item lacking from the XML is document boost.
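
As a sketch of what an ingest document with a boost could look like, the Python below generates Solr's XML update format, which allows a boost attribute on the doc element. The field names (id, min_x, min_y, max_x, max_y) are hypothetical and would have to match whatever the schema actually defines; the coordinates and boost value are made up.

```python
import xml.etree.ElementTree as ET

def layer_doc(layer_id, min_x, min_y, max_x, max_y, boost=None):
    # Build an <add><doc>...</doc></add> message for one layer.
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    if boost is not None:
        # Document-level boost, the item missing from the source XML.
        doc.set("boost", str(boost))
    # Hypothetical field names for the bounding box corners.
    fields = {
        "id": layer_id,
        "min_x": min_x,
        "min_y": min_y,
        "max_x": max_x,
        "max_y": max_y,
    }
    for name, value in fields.items():
        field = ET.SubElement(doc, "field", name=name)
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")

boosted = layer_doc("layer-1", -71.2, 42.2, -70.9, 42.4, boost=2.0)
plain = layer_doc("layer-2", -71.2, 42.2, -70.9, 42.4)
```

The resulting string could be written to a file and ingested the same way as the sample data.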
