Search is a critical element of OpenGeoPortal. Results must be properly ranked, complete, and returned quickly. There are two approaches we can take. The traditional solution uses SQL: the layer metadata is put into a relational database, SQL queries run against the table, and the results are displayed. Often the SQL query, using "ORDER BY", ranks the layers and so determines the order in which they are displayed. Another potential approach is to use more modern search technology. There are open source solutions (Lucene and Solr) we might integrate.
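To make the traditional approach concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `layers` table, its columns, and the sample rows are all hypothetical and invented for illustration; they are not OpenGeoPortal's actual schema. The "ranking" is nothing more than the ORDER BY clause.

```python
import sqlite3

# Hypothetical schema and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE layers (name TEXT, title TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO layers VALUES (?, ?, ?)",
    [
        ("roads_2005", "Boston Roads", 2005),
        ("parcels_2010", "Boston Parcels", 2010),
        ("rivers_1999", "Maine Rivers", 1999),
    ],
)


def search_layers(connection, keyword):
    """Return names of layers whose title contains keyword, newest first."""
    rows = connection.execute(
        # ORDER BY is the only ranking mechanism here.
        "SELECT name FROM layers WHERE title LIKE ? ORDER BY year DESC",
        (f"%{keyword}%",),
    )
    return [name for (name,) in rows]


results = search_layers(conn, "Boston")
print(results)
```

Note that every change to the ranking rule means editing the query itself, which is part of what makes this approach harder to tune.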
What are the advantages of using Solr/Lucene? It has built-in support for GIS data, including geodetic coordinates, geohashes, bounding boxes and spatial hierarchies. Distances can be calculated using several measures, including Euclidean, great circle and Manhattan. It has built-in support for advanced search features. Synonyms and misspellings are added by putting them into a configuration file. Likewise, words to ignore can be added by editing another configuration file. Ranking supports different weights on both specific layers and individual metadata fields. Weights are modified via configuration files, not by changing code. Results can be both ranked and grouped. Results are available in multiple formats, including XML and JSON.
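For readers unfamiliar with the three distance measures mentioned above, here is a small sketch of what each one computes. This is plain Python written for illustration, not Solr's implementation; the function names and the mean Earth radius constant are my own assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius


def euclidean(p, q):
    """Straight-line distance between two points in a plane."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def manhattan(p, q):
    """Sum of the absolute differences along each axis."""
    return abs(q[0] - p[0]) + abs(q[1] - p[1])


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1.0 to guard against rounding slightly above the valid range.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


print(euclidean((0, 0), (3, 4)))
print(manhattan((0, 0), (3, 4)))
print(great_circle_km(0, 0, 0, 180))
```

Euclidean and Manhattan distances treat coordinates as points on a flat plane, while the great circle distance accounts for the curvature of the Earth, which matters for layers spanning large areas.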
The biggest disadvantage is that somebody has to learn a lot about Solr and write some ingest code. I think I'm up for that.
Grant Ingersoll wrote a nice paper on using Solr with GIS data.
A few parting comments. First, people building high-end search solutions today don't look to SQL like they used to. Search solutions often include data repositories optimized for search, rather than relying on legacy data stores designed for transaction-based read/write operations. That makes me wonder whether we should build our search solution on a SQL database. Second, I think the days are numbered for people creating their own search solutions. Maybe we're not quite there yet, but over time, programmers will increasingly rely on integrating existing search solutions. It happened for hashtables and data repositories; it is now happening for search.
What do you think? Should we consider it? Is the technology mature enough? Does its Java/Tomcat infrastructure make it easy enough for us to deal with and integrate? Do we envision a search relying on ESRI SDE that could not be replicated in Solr?