Adding Search with Lucene


What if you are building a new web application that needs to search through files instead of just web pages? You could write your own solution, or you could use an existing search engine like Lucene.

Below is an excerpt from the Step Three: Profit! post about using Lucene.

There are a number of considerations to make when adding search to your site. For instance, you can usually get by pretty well with just integrating Google search into your website. This is fast, easy, and doesn't require messing with your backend code at all.

However, this is not really what I want. I want to let users search for files, not web pages, and I want the results integrated nicely with everything else. For instance, it would be cool to use a search query as a radio playlist like you can do on Hype Machine. So I'll need to build my own search engine.

This is not really that hard to do. I would recommend you read some articles and then download Managing Gigabytes for Java. Those articles are by Tom from AudioGalaxy. You may remember AudioGalaxy as the best thing to happen and unhappen to music in my lifetime. I know I do. More importantly, it was deliciously scalable, and for the most part it was just a search engine. So don't go writing one without learning some tips from the best.
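To give a rough idea of what "building your own" involves, here is a minimal sketch of the core data structure behind most search engines, an inverted index that maps each term to the files containing it. The class name `TinyFileIndex`, its methods, and the sample file names are all made up for illustration; this is not Lucene's or MG4J's API, and it leaves out everything that makes a real engine scale (ranking, compression, on-disk storage).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index: maps each lowercase term to the set of file names
// whose description contains it. Hypothetical illustration only.
class TinyFileIndex {
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Split text into lowercase alphanumeric terms.
    private static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Index a file under every term found in its descriptive text.
    void add(String fileName, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(fileName);
        }
    }

    // Return the files that contain every query term (AND semantics),
    // sorted by file name. An empty query matches nothing.
    List<String> search(String query) {
        Set<String> result = null;
        for (String term : tokenize(query)) {
            Set<String> files = postings.getOrDefault(term, Set.of());
            if (result == null) {
                result = new TreeSet<>(files);
            } else {
                result.retainAll(files);
            }
        }
        return result == null ? List.of() : new ArrayList<>(result);
    }

    public static void main(String[] args) {
        TinyFileIndex index = new TinyFileIndex();
        index.add("blue_moon_live.mp3", "Blue Moon (live)");
        index.add("blue_monday.mp3", "Blue Monday");
        index.add("harvest_moon.mp3", "Harvest Moon");
        System.out.println(index.search("blue moon"));
    }
}
```

A query like "blue moon" returns only the files that match both terms, which is roughly the shape of result you would want to feed into something like a search-driven playlist.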

I'm sure that a little engineering and MG4J could produce a highly scalable search engine. However, I didn't really want to spend that much time on it, so I went with a higher-level solution in the form of Lucene for Java. There is also a popular version for Python. I would recommend waiting a while if you're considering Lucy (Lucene in C with Python and Ruby bindings), because I don't consider it mature. I'd also stay away from layers on top of Lucene like Solr: if you're looking for tools to make Lucene easier to use, you're missing the point that it's already easy to use.

You can read more about using Lucene as a web server here.

So, if you are looking for a good search engine that can search through files, check out Lucene.

PHP users should also check out the Zend port of Lucene.
