For the last week or so I've been wrapping my brain around Solr and Solango. The whole time I've had the feeling that they can do awesome, powerful things, but their documentation is so poor that I couldn't figure out anything beyond the basic examples. Ultimately I had to dig through a bunch of code and do some experimentation. Now that I've finally figured out how to do what I was trying to do, and have wrapped my brain around some of the trickier bits, I'm going to share some of the gotchas and solutions I've found.
Getting Started
This is one thing which is reasonably well documented for Solango. Getting Solr installed isn't the easiest process for anybody who isn't a Java dev, but there's also only so much one can do to help the inexperienced deal with the finicky creature which is Tomcat. To that end, here is the first bunch of tips related to install/Tomcat:
- Don't try to use Tomcat installed by a package manager - Unless you're a Java person, this will almost certainly install Tomcat with a bunch of non-stock configs that you will spend a huge amount of time adjusting to get Solr to work.
- Be mindful of your working directory while starting Tomcat - If you don't install Tomcat via a package manager, you will probably start and stop it with the `startup.sh` and `shutdown.sh` scripts which come with Tomcat. This generally works fine, but be aware that they are very sensitive to your working directory. If you start Tomcat from your home directory, then all of your Solr data files will end up in your home directory. Somewhat obviously, if you're not consistent about where you start Tomcat from, things are going to get very confused and very broken, very quickly.
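One way to take the working directory out of the equation is to always launch Tomcat through a small wrapper that pins it. The following is a minimal sketch, not anything Tomcat or Solango ships: the `TOMCAT_HOME` and `TOMCAT_WORKING_DIR` paths and the `run_from` and `start_tomcat` helpers are hypothetical names made up for illustration.

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical locations; adjust both to match your own install.
TOMCAT_HOME = Path("/opt/tomcat")
TOMCAT_WORKING_DIR = Path("/opt/solr-home")


def run_from(working_dir, command):
    """Run `command` with its working directory pinned to `working_dir`."""
    return subprocess.run(
        command,
        cwd=str(working_dir),  # the child process starts in this directory
        check=True,            # raise if the command exits with an error
        capture_output=True,
        text=True,
    )


def start_tomcat():
    """Start Tomcat from the same directory every time."""
    # startup.sh lives in Tomcat's bin directory; the Solr data files end up
    # relative to whatever directory the script was started from.
    return run_from(TOMCAT_WORKING_DIR, [str(TOMCAT_HOME / "bin" / "startup.sh")])
```

Calling `start_tomcat()` from anywhere then behaves the same as doing a `cd` into `TOMCAT_WORKING_DIR` first, so the Solr files always land in one place.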
Once you're up and running you will need to set up your search documents and get Solango talking to Solr. Again, in this area the project documentation is pretty decent, but here are my relevant tips:
- Make sure that your SOLR_ROOT and related settings are set to the solr child directory of whatever your working path was when you started Tomcat. If you don't, things will not be happy.
- Know that all the SearchDocument classes you create get aggregated into a common document. This is mentioned in the Solango documentation and makes sense, since in the end you're doing a search across your entire site, but it can lead to some confusing errors/problems. Just keep this in mind while designing your documents and debugging. In particular, be careful about using the same field name in different SearchDocument classes with different definitions.
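Because every document ends up in one shared schema, it can help to sanity check your field names before indexing. Here is a rough sketch of such a check. It does not use Solango at all: the documents are described as plain dicts, and `find_field_conflicts` plus the document and field names are hypothetical, purely to illustrate the idea.

```python
from collections import defaultdict


def find_field_conflicts(documents):
    """Return fields that are defined differently by different documents.

    `documents` maps a document name to a {field_name: definition} dict.
    The result maps each clashing field name to {definition: [document names]}.
    """
    seen = defaultdict(lambda: defaultdict(list))
    for doc_name, fields in documents.items():
        for field_name, definition in fields.items():
            seen[field_name][definition].append(doc_name)
    # A field is a problem only if it has more than one distinct definition.
    return {
        name: dict(definitions)
        for name, definitions in seen.items()
        if len(definitions) > 1
    }


# Hypothetical documents: both define "date", but with different types.
documents = {
    "ArticleDocument": {"title": "text", "date": "date"},
    "EventDocument": {"title": "text", "date": "string"},
}
conflicts = find_field_conflicts(documents)
```

Here `conflicts` only contains `date`, since `title` is defined the same way in both documents.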
Now that you have some simple indexing and searching going, you probably want to do something more interesting. This is where nearly all of the documentation out there either doesn't exist or boils down to the eternally useful Javadoc style. From here on I'll break things down by the more advanced topics I dealt with.
Facets
Facets are more or less a way to filter the results of a search by certain document attributes. The best resource I've found for reading about facets was the article Faceted Search with Solr. To be honest, I had to read it a few times and play with things before I really understood what was going on. Once you understand what they are, you can do some pretty neat things with facets. Here are my realizations about facets:
- Read Faceted Search with Solr again, seriously.
- You will probably want to have facets operate on multiple properties. Sometimes you may be able to do so by simply faceting on each of the properties. Other times you will need to normalize the values of the various properties into a single new property and facet on that (especially if you want to do nested facets).
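To make the "normalize into a single new property" idea concrete, here is a small sketch in plain Python. Solr computes facet counts itself; the `Counter` below only mimics what those counts look like. The record keys (`is_video`, `page_count`) and the `normalize_kind` helper are made-up examples, not anything from Solr or Solango.

```python
from collections import Counter


def normalize_kind(record):
    """Collapse several source properties into one value to facet on."""
    if record.get("is_video"):
        return "video"
    if record.get("page_count", 0) > 0:
        return "document"
    return "other"


# Hypothetical records with different properties filled in.
records = [
    {"title": "Intro screencast", "is_video": True},
    {"title": "User manual", "page_count": 3},
    {"title": "Reference guide", "page_count": 10},
    {"title": "Misc"},
]

# Count how many records fall under each normalized value, which is
# roughly what a facet on the new property would report.
facet_counts = Counter(normalize_kind(record) for record in records)
```

The point is that the search document gets one derived field (the return value of `normalize_kind`) rather than trying to facet across `is_video` and `page_count` separately.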
Nested Facets
Even if you think you want nested facets, you probably don't really need nested facets. Read this again and think about it for a while. If you decide that you really do want nested facets here's what you will need to know:
- The concept of nested facets is purely a Solango construct; Solr does not currently have any formal concept of nested facets.
- Nested facets are referenced in the Solango documentation but are not actually described or explained. This means at some point you will probably need to dig into the Solango source to get them working.
- Since nested facets are purely a Solango concept, they work by populating a document field with entries which look like "parent value__child value__grand child value".
- You must generate these entries using a transform method on your SearchDocument classes.
- The separator ("__" in the example above) is defined by your settings file as FACET_SEPERATOR.
- The default setting from Solango for this separator is not URL safe and will break things.
- If you want to have nice display versions of your nested facets you will need to patch Solango. I have done so in my fork at GitHub. Eventually I'll try to contribute this back.
- If your documents need multiple facet property values (they probably do), you will need to use a multivalued field. Keep reading to learn about those.
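The entries described above are easy to build and take apart in plain Python. This is only a sketch of the string format, assuming "__" as the separator as in the example above; `is_url_safe`, `nested_facet_value` and `split_facet_value` are hypothetical helpers, not Solango functions. In a real project the body of `nested_facet_value` would live in a transform method.

```python
from urllib.parse import quote

SEPARATOR = "__"  # assumed here; use whatever your settings file defines


def is_url_safe(separator):
    """True if the separator survives URL quoting unchanged."""
    return quote(separator, safe="") == separator


def nested_facet_value(parts, separator=SEPARATOR):
    """Join the levels of a facet path into a single field entry."""
    if not is_url_safe(separator):
        raise ValueError("separator %r is not URL safe" % separator)
    for part in parts:
        if separator in part:
            raise ValueError("%r contains the separator" % part)
    return separator.join(parts)


def split_facet_value(value, separator=SEPARATOR):
    """Recover the individual levels, e.g. for display purposes."""
    return value.split(separator)


value = nested_facet_value(["Vehicles", "Cars", "Sedans"])
levels = split_facet_value(value)
```

Checking the separator up front is a cheap way to catch the URL problem mentioned above before it shows up as broken facet links.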
Many Relationships
Frequently you will need to index documents which have ManyToMany or ForeignKey relationships that you would like to search or facet by. This is problematic at first glance since Solr doesn't support anything like a join. The solution, however, comes in the form of a bit of de-normalization called multi-value fields. All Solr field types take an optional "multivalued" property. If this property is true, you are able to provide Solr with multiple field entries in your data XML, and it will use all of the provided values when searching or faceting. Solr does not do anything to munge the values together into a CSV or anything like that, so you will see the discrete values in your search results or while you are viewing your Solr indexes. Here are the gotchas I've found with multivalued fields:
- Solango doesn't currently support them - You can set the "multivalued" property on your search documents and Solango properly uses that information when generating a schema.xml file, but it doesn't have any way to actually populate a multivalued field with multiple values. If you need multivalued fields you can check out my fork of Solango at GitHub. With my version of Solango you can return any iterable from your transform methods and it will be handled properly. Eventually I'll work to get this merged back into mainline Solango.
- Generally you need to store the literal string, numeric, or datetime value of your related fields in the Solr index. Since you can't do joins, anything but the literal value is generally pretty useless.
- Since Solango doesn't have any way to know where the data for related fields comes from beyond the transform method, and since you are storing literal values, your Solr index won't automatically be kept up to date when you change related objects. To keep your index up to date you will need to either live with stale data for a while and periodically do a full reindex, or set up code using signals to hook onto changes of related objects and update the appropriate Solr documents.
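To show what "multiple field entries in your data XML" looks like, here is a minimal sketch that builds a Solr add document with the standard library. It is not how Solango generates its XML; `build_add_xml`, the `tags` field and the use of `id` as the unique key are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def build_add_xml(doc_id, fields):
    """Build a Solr <add> document, repeating <field> for list values."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    id_field = ET.SubElement(doc, "field", name="id")
    id_field.text = str(doc_id)
    for name, value in fields.items():
        # A list or tuple becomes one <field> element per value; these are
        # the discrete entries a multivalued field expects.
        values = value if isinstance(value, (list, tuple)) else [value]
        for item in values:
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(item)
    return ET.tostring(add, encoding="unicode")


# Hypothetical document: "tags" stands in for a ManyToMany relationship
# flattened to its literal values.
xml = build_add_xml(1, {"title": "Hello", "tags": ["django", "solr"]})
```

The resulting XML contains two separate `tags` entries rather than one comma-separated value, which is why the values stay discrete in search results and facets.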
Hopefully this helps you avoid some pain and suffering. Given a bit more time/energy I might even work towards turning these into somewhat more formal documentation.
If you find any better solutions or have any questions, bring them up in the comments!