Lucene internals
This article introduces Lucene internals.
Apache Lucene is a full-text search engine library written in Java. Lucene provides a core library and API that can easily be used to add search capabilities to applications.
Concept
- Document: a sequence of fields.
- Field: a named sequence of terms.
- Term: a string.
- Inverted index: the index stores statistics about terms in order to make term-based search more efficient. For a given term, it can list the documents that contain it. This is the inverse of the natural relationship, in which documents list terms.
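The inverted relationship described above can be illustrated with a toy index built from plain Java collections. This is only a sketch of the idea, not Lucene's actual data structure: the class name ToyInvertedIndex, its methods, and the naive lowercase-and-split tokenization are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical toy class: maps each term to the ids of the documents containing it.
public class ToyInvertedIndex {
    // term -> sorted set of document ids (the "postings" for that term)
    private final Map<String, SortedSet<Integer>> postings = new TreeMap<>();
    private int nextDocId = 0;

    // Adds a document and returns its id. Terms are produced by lowercasing
    // and splitting on anything that is not a letter or digit, which is a
    // crude stand-in for a real analysis chain.
    public int add(String text) {
        int docId = nextDocId++;
        for (String term : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
        return docId;
    }

    // Returns the ids of the documents that contain the given term.
    public List<Integer> docsFor(String term) {
        return new ArrayList<>(
                postings.getOrDefault(term.toLowerCase(Locale.ROOT), new TreeSet<>()));
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.add("Lucene is a search library");            // document 0
        index.add("Search engines use an inverted index");  // document 1
        System.out.println(index.docsFor("search"));  // prints [0, 1]
        System.out.println(index.docsFor("lucene"));  // prints [0]
    }
}
```

Looking up a term is a single map access, whereas answering the same question from the documents alone would require scanning every document.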
API
Lucene is divided into several packages:
- analysis: defines an abstract Analyzer API for converting text from a Reader into a TokenStream. A TokenStream is composed by applying TokenFilters to the output of a Tokenizer. Tokenizers and TokenFilters are strung together and applied with an Analyzer. analysis-common provides a number of Analyzer implementations.
- codecs: provides an abstraction over the encoding and decoding of the inverted index structure, as well as different implementations that can be chosen depending upon application needs.
- document: provides a simple Document class. A Document is a set of named Fields, whose values may be strings or instances of Reader.
- index: provides two primary classes: IndexWriter, which creates indices and adds documents to them, and IndexReader, which accesses the data in the index.
- search: provides data structures to represent queries (TermQuery for individual words, PhraseQuery for phrases, BooleanQuery for boolean combinations of queries) and the IndexSearcher, which turns queries into TopDocs. A number of QueryParsers are provided for producing query structures from strings or XML.
- store: defines an abstract class for storing persistent data, the Directory, which is a collection of named files written by an IndexOutput and read by an IndexInput. Multiple implementations are provided, but FSDirectory is generally recommended as it tries to use operating system disk buffer caches efficiently.
Typical usage
- Create Documents by adding Fields.
- Create an IndexWriter and add documents to it with addDocument().
- Call QueryParser.parse() to build a query from a string.
- Create an IndexSearcher and pass the query to its search() method.
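The steps above can be sketched as a small program. This is a minimal example under stated assumptions, not a reference implementation: it assumes a recent Lucene release (8.x or later) with the lucene-core and lucene-queryparser modules on the classpath, uses an in-memory ByteBuffersDirectory instead of FSDirectory to stay self-contained, and the class name TypicalUsage, the helper countHits, and the field name "body" are made up for illustration.

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

// Hypothetical example class; only the Lucene types are real API.
public class TypicalUsage {

    // Indexes each text as one document with a single "body" field, then
    // returns how many of the top 10 results match the query string.
    static int countHits(String[] texts, String queryText) throws Exception {
        Analyzer analyzer = new StandardAnalyzer();
        try (Directory dir = new ByteBuffersDirectory()) {
            // Steps 1 and 2: create Documents from Fields and add them with addDocument().
            try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
                for (String text : texts) {
                    Document doc = new Document();
                    doc.add(new TextField("body", text, Field.Store.NO));
                    writer.addDocument(doc);
                }
            } // closing the writer commits the documents

            // Step 3: build a Query from a string; "body" is the default field.
            Query query = new QueryParser("body", analyzer).parse(queryText);

            // Step 4: pass the query to IndexSearcher.search().
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                TopDocs top = searcher.search(query, 10);
                return top.scoreDocs.length;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        String[] texts = {
            "Lucene is a search library",
            "An inverted index maps terms to documents"
        };
        System.out.println(countHits(texts, "index"));
    }
}
```

The same Analyzer is used for indexing and for query parsing so that query terms are normalized the same way as the indexed terms. For an index that must persist on disk, the in-memory directory would be replaced with an FSDirectory.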