Thursday, March 10, 2011

Updating a Solr Index

Clearing out an existing Solr index when you want to update your schema is easy with curl.

curl http://localhost:8080/solr/update --data-binary '<delete><query>*:*</query></delete>' -H 'Content-type:text/xml; charset=utf-8'
curl http://localhost:8080/solr/update --data-binary '<commit/>' -H 'Content-type:text/xml; charset=utf-8'


*Replace 8080 with the port Solr is running under. The delete-by-query must be wrapped in delete/query tags, and nothing is actually removed until the commit is sent.

1. Stop the servlet container.
2. Change schema.xml.
3. Start the servlet container.

In the admin tool, run a *:* query to check the results.

Tuesday, December 7, 2010

Plate Carree: Geoserver and ArcIMS Compatibility

Anyone trying to connect Geoserver to an ArcSDE dataset stored as ESRI EPSG:54001 will quickly find that not all Plate Carrees are created equal. Geoserver does not like ESRI's choice of ellipsoid, which means tweaking the parameters slightly. Follow these simple steps to make EPSG:54001 operational in Geoserver.

1. Edit ../webapps/geoserver/data/user_projections/epsg.properties in your Tomcat context.

2. Add a new line at the end of the file and append the following text as one line. Syntax is critical.

54001= PROJCS["WGS 84 / Plate Carree", GEOGCS["WGS 84", DATUM["World Geodetic System 1984", SPHEROID["WGS 84", 6378137.0, 298.257223563, AUTHORITY["EPSG","7030"]], AUTHORITY["EPSG","6326"]], PRIMEM["Greenwich", 0.0, AUTHORITY["EPSG","8901"]], UNIT["degree", 0.017453292519943295], AXIS["Geodetic longitude", EAST], AXIS["Geodetic latitude", NORTH], AUTHORITY["EPSG","4326"]], PROJECTION["Equidistant Cylindrical (Spherical)", AUTHORITY["EPSG","9823"]], PARAMETER["central_meridian", 0.0], PARAMETER["latitude_of_origin", 0.0], PARAMETER["standard_parallel_1", 0.0],PARAMETER["false_easting", 0.0],PARAMETER["false_northing", 0.0],UNIT["m", 1.0],AXIS["Easting",EAST],AXIS["Northing", NORTH],AUTHORITY["EPSG","54001"]]

3. Restart Geoserver.

4. In your Geoserver Admin page select "Demos" and click on the "SRS List" link.

5. Search for either "54001" or "Plate Carree" and view the results.

Your projection should be in this list. Remember to keep an eye on the Geoserver log for errors.
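As a quick sanity check on the WKT above, the UNIT["degree", ...] factor is just the radians-per-degree conversion (pi/180), which a couple of lines of Java will confirm:

```java
public class DegreeUnitCheck {
    public static void main(String[] args) {
        // The unit factor from the 54001 WKT definition above.
        double wktFactor = 0.017453292519943295;
        // Math.toRadians(1.0) computes pi / 180.
        System.out.println(Math.abs(wktFactor - Math.toRadians(1.0)) < 1e-15);
        // prints true
    }
}
```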

Thursday, October 14, 2010

7,000+ Active Geoserver Layers?

In considering our objective of adding map data from remote sites via WMS, it occurs to me that at some point such a model could fail. The example that sticks out is data set volume. With 7,000+ data sets to publish at Harvard, there are bound to be circumstances where a remote WMS request returns empty data because a Geoserver instance has lost its connection to ArcSDE (or a coverage store has become corrupt). In cases like this we should think about intercepting the request and, if a particular data layer needs "correcting" in Geoserver, using the REST capabilities to repair the connection.

Thursday, July 29, 2010

Use Java to Add ArcSDE Data Layers to Geoserver Using REST

I was scouring the web for examples of how to implement Geoserver's REST API in Java to add data layers dynamically. I was able to use curl to successfully add data layers but I wanted to make this functionality accessible via Java without having to use .exec() to do the work. What I found were some examples using the open source Jersey Reference Implementation for building RESTful Web services. A description of the project is here and the jars needed to run the following code are here.

This is a simple implementation, without the components to add metadata fully describing the layer (those will come later). The original code came from Jon Britton and was posted here.
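As a minimal sketch of the same idea using only the JDK's HttpURLConnection rather than Jersey (the host, workspace, credentials, and ArcSDE connection parameters below are placeholders, not values from a real setup; the entry keys are written from memory, so check them against your Geoserver version):

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class AddDataStore {

    // Build the dataStore payload Geoserver's REST API expects.
    static String buildDataStoreXml(String storeName, String server,
                                    String instance, String user, String password) {
        return "<dataStore><name>" + storeName + "</name>"
             + "<connectionParameters>"
             + "<entry key=\"dbtype\">arcsde</entry>"
             + "<entry key=\"server\">" + server + "</entry>"
             + "<entry key=\"instance\">" + instance + "</entry>"
             + "<entry key=\"user\">" + user + "</entry>"
             + "<entry key=\"password\">" + password + "</entry>"
             + "</connectionParameters></dataStore>";
    }

    // POST the payload to /rest/workspaces/{workspace}/datastores using Basic auth.
    static int postDataStore(String baseUrl, String workspace,
                             String userPass, String xml) throws Exception {
        URL url = new URL(baseUrl + "/rest/workspaces/" + workspace + "/datastores");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "text/xml");
        conn.setRequestProperty("Authorization", "Basic "
                + Base64.getEncoder().encodeToString(userPass.getBytes(StandardCharsets.UTF_8)));
        try (OutputStream out = conn.getOutputStream()) {
            out.write(xml.getBytes(StandardCharsets.UTF_8));
        }
        return conn.getResponseCode(); // 201 Created on success
    }

    public static void main(String[] args) throws Exception {
        String xml = buildDataStoreXml("sde_store", "sde.example.edu", "5151",
                                       "sdeuser", "sdepass");
        System.out.println(xml);
        // Uncomment to run against a live Geoserver:
        // postDataStore("http://localhost:8080/geoserver", "topp", "admin:geoserver", xml);
    }
}
```

The same pattern works for the other REST resources (featuretypes, styles, and so on) by changing the path and the payload.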



An excellent alternative to this approach, with no dependencies, is GSRCJ.

Monday, July 12, 2010

Exploring Solr

Over the weekend, I read some of "Solr 1.4 Enterprise Search Server" and learned about a few more features. When multiple keywords are provided, Solr does the right thing: documents that match more of the words tend to get a higher score, and the rarer a word is, the more a hit on it is worth. Solr provides built-in support for both stemming and paging.

I have defined a Solr schema and ingested some XML-formatted data using curl (curl http://localhost:8983/solr/update -F stream.file=/tmp/sampleData.xml). Note that the absolute path of the file must be provided. The added data will not be searchable until a commit is performed.

The data can be searched using the Solr admin tool or by providing a URL. The URL to search for all the data is http://localhost:8983/solr/select/?q=*%3A*, where %3A is the code for :.
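That encoding is easy to reproduce from Java with the JDK's URLEncoder:

```java
import java.net.URLEncoder;

public class QueryEncode {
    public static void main(String[] args) throws Exception {
        // ':' is percent-encoded as %3A; '*' is in URLEncoder's safe set and is left alone.
        String q = URLEncoder.encode("*:*", "UTF-8");
        System.out.println("http://localhost:8983/solr/select/?q=" + q);
        // prints http://localhost:8983/solr/select/?q=*%3A*
    }
}
```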

The current schema is not at all complete. Layer coordinates are simply being stored as a bounding box where each coordinate is of type tdouble. We should probably consider ingesting the XML metadata directly. However, one item missing from the XML is document boost.
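A bounding box stored that way might look something like this in schema.xml (a sketch only; the field names are illustrative, not necessarily what the current schema uses):

```xml
<!-- Illustrative: four tdouble fields holding the layer's bounding box -->
<fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8"
           omitNorms="true" positionIncrementGap="0"/>

<field name="MinX" type="tdouble" indexed="true" stored="true"/>
<field name="MinY" type="tdouble" indexed="true" stored="true"/>
<field name="MaxX" type="tdouble" indexed="true" stored="true"/>
<field name="MaxY" type="tdouble" indexed="true" stored="true"/>
```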

Friday, July 9, 2010

OpenLayers, Google Maps and versions

The current 2.9 OpenLayers release is not compatible with Google Maps v3. This issue is being worked on, and there are patches for v3 compatibility; the details are at http://trac.openlayers.org/ticket/2493. v3 compatibility is slated for OpenLayers 2.10, the next release. OpenLayers appears to release new versions roughly once a year, and since 2.9 came out in April 2010, we can't expect the 2.10 release this summer. (You can follow the 2.10 release at http://trac.openlayers.org/wiki/Release/2.10.) Google Maps v2 has been deprecated and may be available for only three more years. For OpenGeoPortal we can use Google Maps v3 with the already nearly complete v3 patches to 2.9, or we can use Google Maps v2 with the standard 2.9 OpenLayers release. Then in 2011 or 2012 we can upgrade to 2.10 and v3.

Search Options

Search is a critical element of OpenGeoPortal. Results must be properly ranked, complete, and returned quickly. There are two approaches we can take. The traditional solution uses SQL: the layer metadata is put into a relational database, SQL queries run against the table, and the results are displayed. Often the SQL query, using "ORDER BY", ranks the layers and determines the order in which they are displayed. Another approach is to use more modern search technology; there are open source solutions (Lucene and Solr) we might integrate.

What are the advantages of using Solr/Lucene? It has built-in support for GIS data, including geodetic coordinates, geohashes, bounding boxes, and spatial hierarchies. Distances can be calculated in several coordinate systems, including Euclidean, great circle, and Manhattan. It has built-in support for advanced search features: synonyms and misspellings are added by putting them into a configuration file, and likewise, words to ignore can be added by editing another configuration file. Ranking supports different weights on both specific layers and individual metadata fields; weights are modified via configuration files, not by changing code. Results can be both ranked and grouped, and are available in multiple formats including XML and JSON.

The biggest disadvantage is somebody has to learn a lot about Solr and write some ingest code. I think I'm up for that.

Grant Ingersoll wrote a nice paper discussing using Solr with GIS data.

A few parting comments. First, people building high-end search solutions today don't look to SQL the way they used to. Search solutions often include data repositories optimized for search rather than relying on legacy data stores designed for transaction-based read/write operations. That makes me wonder whether we should build our search solution on a SQL database at all. Second, I think the days are numbered for people creating their own search solutions. Maybe we're not quite there yet, but over time programmers will increasingly rely on integrating existing search solutions. It happened for hashtables and data repositories; it is now happening for search.

What do you think? Should we consider it? Is the technology mature enough? Does its Java/Tomcat infrastructure make it easy enough for us to deal with and integrate? Do we envision a search that relies on ESRI SDE that could not be replicated in Solr?