So we are here in Austria, sprinting on Zope and Plone (thanks to
Lovely Systems). I have proposed a task on building an alternative
indexer system for Plone. So, we worked with Dokai and Tom on this.
Those guys rock, really !
Our goal was to create a plone 3 buildout that provides an out of the
box solution.
Background
Let me give you some background about indexing in Zope before
presenting our work. The default indexing system is quite effective, as
long as your instance is not getting too big. Some years ago, we had to
create an alternative indexer for CPS at Nuxeo, that would externalize
the catalog because we figured out that :
- 50% of the size of the ZODB was the catalog (I am talking about
gigas here)
- 50% of the time on object creation was taken by indexing tasks, and
was getting quite slow as the instance was growing.
Those values are approximate, but quite near the reality back then (I
know some people worked on making indexing better on Zope lately).
Julien then wrote a XML-RPC server that would take care of the indexing
tasks and reply to queries. The software behind it was Lucene, together
with PyLucene. The overall solution was quite good, beside the pain
we had to install it on some specific Linux back then.
Anyway. What did Julien some years ago exists now and is called
Solr. I also had some experiences a while ago with Xapian (as
Sidnei did too), which is quite efficient too, and easier to use from
Python (see here)
Solr, Xapian
So the first task to do was to decide what to use. I called Alan from
Enfold Systems because the guys over there have been working on the
topic for years.
As a matter of fact, they have created a package for Python that bind a
Solr server.
They also have a Plone integration that provides an utility to index
content on Solr.
Since the guys are releasing all of this very soon as open source, we
decided
to go with this solution for the sprint.
It is not a technological choice (Lucene) because Alan and some guys
from
Lemur are actually considering a drop-in replacement for Solr based on
Xapian.
In other words, the work done will be compatible with both Lucene and
Xapian technologies. Xapian is pretty interesting since it avoids
deploying Java ;)
The sprint task
The task was quite "simple" since the Enfold guys did all the hard work
:)
So we worked on:
1. a buildout that builds a Solr server and launches it
2. a Plone integration to use Solr seamlessly
The buildout
The buildout done and usable (We tried it under Windows, MacOSX and
Debian)
It uses new recipe we wrote:
- collective.recipe.ant : build Java softwares using ant
- collective.recipe.solrinstance : builds a Solr instance and
provide a script to launch it
If you want to try it, here's (roughly) how (comment the blog entry in
case of a problem)
$ svn co https://svn.enfoldsystems.com/public/enfold.solr/branches/snowsprint08-buildout buildout $ cd buildout/plone-3.0.5/ $ python2.4 bootstrap.py $ bin/buildout -v $ bin/solr-instance & <-- launches solr (python bin\solr-instance under Windows $ bin/instance fg <-- launches Zope
Then, on Zope, install SolrIntegration in the quick_installer. The
next document you will publish will be indexed on Solr side, and
searchable with the search box.
The portal_catalog remains though, so it is indexed twice ;) you can
empty it to check
Solr is acting right.
Plone integration
The last part we need to work on is to make the SearchableText index
100% Solr based. Whit advices us to create a storage for TextIndexNG so
that's where we are heading on (should be done tomorrow hopefully)
We would also like to do some benchmarks to compare the speed and ZODB
size. We will
probably use Jmeter for this.
I would like to thank Alan, Leonardo, Sidnei for their work on this area, and for releasing it as open source: I really believe that it will become a great indexing solution for Plone in the next months. I was really waiting for this momentum in indexing in the Plone community.
Comments !