Created by the Apache Lucene project, Solr is a speedy, open source enterprise search platform written in Java.
It runs as a standalone full-text search server within any servlet container. Solr already powers the search and navigation features of many of the world’s largest Internet sites.
Major features include:
- Full-text search
- Hit highlighting
- Faceted search
- Dynamic clustering
- Database integration
- Rich document (e.g., Word, PDF) handling
- Geospatial search
Solr uses the Lucene Java search library at its core for full-text indexing and search, and it is easy to use from virtually any programming language. It’s also fully customisable through an extensive plug-in architecture.
Faceted Search
Facets allow users to drill down through large numbers of items. For example, an estate agent’s site may need a search facility that takes in price, region and property type. Facets would allow the user to select a value for each of these, returning only the results that match all the chosen criteria. It is also possible to select multiple values within a facet for wider-ranging and more comprehensive results.
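As a rough illustration of the drill-down described above, the sketch below builds a Solr select URL in Python. The `q`, `fq`, `facet` and `facet.field` parameters are standard Solr request parameters; the host, the `properties` core and the field names (`region`, `property_type`, `price_band`) are assumptions invented for the estate agent example.

```python
from urllib.parse import urlencode


def build_facet_query(base_url, text, filters, facet_fields):
    """Build a Solr select URL that applies chosen facet values and requests counts."""
    params = [("q", text), ("facet", "true")]
    # Each selected facet value becomes its own filter query (fq);
    # a document must satisfy every fq to be returned.
    for field, value in filters.items():
        params.append(("fq", f'{field}:"{value}"'))
    # Ask Solr to return value counts for each field we want to facet on.
    for field in facet_fields:
        params.append(("facet.field", field))
    return f"{base_url}/select?{urlencode(params)}"


url = build_facet_query(
    "http://localhost:8983/solr/properties",  # hypothetical host and core
    "garden",
    {"region": "Kent", "property_type": "detached"},  # hypothetical fields
    ["region", "property_type", "price_band"],
)
print(url)
```

The request is only constructed here, not sent, so the sketch runs without a Solr server.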
Full Text Searching
Full-text searching scans large volumes of text to locate the keywords a user has specified.
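Search engines generally avoid scanning every document at query time by building an inverted index that maps each word to the documents containing it. The following is a minimal Python sketch of that idea, not Solr's or Lucene's actual implementation; the listings and function names are made up for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents containing every keyword in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical property listings.
listings = {
    1: "Detached house with a large garden",
    2: "Modern flat close to the station",
    3: "Terraced house near the station",
}
index = build_index(listings)
print(search(index, "house station"))  # {3}
```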
Stemming
To continue with the property site example, there may be many variations on the word ‘house’, e.g. houses, housing or housed. Stemming reduces them all to the same root word, so every variation is picked up by the search as a match.
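The idea can be sketched with a deliberately naive suffix stripper in Python. This is a toy, not the stemmer Solr ships with, and its suffix list is an assumption chosen to suit the example. Note that it reduces every variant to `hous` rather than `house`: algorithmic stemmers often produce stems that are not dictionary words, which is fine as long as the indexed text and the query are stemmed the same way.

```python
# Suffixes are tried in order; the list is illustrative, not exhaustive.
SUFFIXES = ("ing", "ed", "es", "e", "s")


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


variants = ["house", "houses", "housing", "housed"]
stems = {toy_stem(w) for w in variants}
print(stems)  # {'hous'}
```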
Field Boosts
Boosts allow different fields to be weighted. For example, a short description field would be given a higher weighting than a more detailed long description field. If a search term appears in the short description, the result scores higher, since a match in a short field is likely to reflect a higher keyword density.
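To make the effect of weighting concrete, here is a simplified Python sketch that multiplies the number of term occurrences in each field by that field's weight. It is an assumption-laden illustration, not Solr's real relevance formula; the weights and listings are invented.

```python
def score(doc, term, weights):
    """Sum weighted occurrence counts of the term across the weighted fields."""
    term = term.lower()
    total = 0.0
    for field, weight in weights.items():
        words = doc.get(field, "").lower().split()
        total += weight * words.count(term)
    return total


# Hypothetical weights: a hit in the short description counts three times as much.
weights = {"short_description": 3.0, "long_description": 1.0}

listing_a = {
    "short_description": "Riverside cottage",
    "long_description": "Two bedrooms and a small garden",
}
listing_b = {
    "short_description": "Town centre flat",
    "long_description": "Former cottage converted into a flat",
}

print(score(listing_a, "cottage", weights))  # 3.0
print(score(listing_b, "cottage", weights))  # 1.0
```

Both listings mention the term once, but the one that mentions it in the short description ranks first.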
Rich Document Handling
This allows a document’s contents to be associated with a particular result. To go back to the property site example, a PDF document containing content not already in the database could be read in as a new field and searched like any other field.
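Conceptually, this amounts to adding the document's extracted text to the record as one more searchable field. The Python sketch below assumes the text has already been extracted from the PDF by a separate tool (in Solr this is typically handled through its Apache Tika integration); the record, field name and helper functions are hypothetical.

```python
def attach_document_text(record, field_name, extracted_text):
    """Return a copy of the record with the document text added as a new field."""
    enriched = dict(record)
    enriched[field_name] = extracted_text
    return enriched


def matches(record, keyword):
    """Check whether the keyword appears in any field of the record."""
    keyword = keyword.lower()
    return any(keyword in str(value).lower() for value in record.values())


# Hypothetical database record and text assumed to come from a PDF brochure.
listing = {"id": 42, "title": "Detached house", "region": "Kent"}
brochure_text = "Includes a south-facing conservatory and double garage"

enriched = attach_document_text(listing, "brochure_text", brochure_text)
print(matches(listing, "conservatory"))   # False
print(matches(enriched, "conservatory"))  # True
```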