Pylucene- Part I: Creating index

How to write a simple index generator with pylucene

  1 import lucene
  3 if __name__ == '__main__':
  4     INDEX_DIR = "/home/kanaujia/lucene_index"
  6     # Initialize lucene and JVM
  7     lucene.initVM()
  9     print "lucene version is:", lucene.VERSION
 11     # Get the analyzer
 12     analyzer = lucene.StandardAnalyzer(lucene.Version.LUCENE_CURRENT)
 14     # Get index storage
 15     store = lucene.SimpleFSDirectory(lucene.File(INDEX_DIR))
 17     # Get index writer
 18     writer = lucene.IndexWriter(store, analyzer, True, lucene.IndexWriter.MaxFieldLength.LIMITED)
 20     try:
 21         # create a document that would we added to the index
 22         doc = lucene.Document()
 24         # Add a field to this document
 25         field = lucene.Field("title", "India", lucene.Field.Store.YES, lucene.Field.Index.ANALYZED)
 27         # Add this field to the document
 28         doc.add(field)
 30         # Add the document to the index
 31         writer.addDocument(doc)
 32     except Exception, e:
 33         print "Failed in indexDocs:", e


  • An index is created with an IndexWriter
  • An index is a collection of documents
  • A document represents a file, or data in terms of fields
  • A field is a tuple of field name, data

Let’s understand the above program:

  1. We provide a location of index as INDEX_DIR = “/home/kanaujia/lucene_index”
  2. Start and initialize the Java VM
  3. Get the lucene’s standard analyzer for fields
  4. This example keeps the index on disk, so the SimpleFSDirectory class is used to get a handle to this index.
  5. IndexWriter creates and maintains an index. The constructor is as follows:

IndexWriter(Directory d, Analyzer a, boolean create, IndexDeletionPolicy deletionPolicy, IndexWriter.MaxFieldLength mfl)

  • Directory is handle to index location
  • ‘create’ tells if a new index object is created for every user request
# Get index writer
    writer = lucene.IndexWriter(store, analyzer, True, lucene.IndexWriter.MaxFieldLength.LIMITED)
  • Create a document that would become part in the index
  • Create a field, add it to a document.
  • Add the document to the index.
  • Run the program
kanaujia@ubuntu:~/work/Py$ python
lucene version is: 3.6.1
kanaujia@ubuntu:~/work/Py$ ls /home/kanaujia/lucene_index/
_0.fdt  _0.fdx  write.lock

A voyager on the journey to technology and art of software development. Pursuing arts, music, photography, and ways to live life on the edge

Leave a Reply

Please log in using one of these methods to post your comment: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.