On Tue, Dec 16, 2008 at 2:09 AM, ayyanar
<[hidden email]> wrote:
> Hi, Kindly share your thoughts on Why lucene and why not SQL?
You have 200,000 text documents to search. You need to find all
documents that contain the words "baseball" and "pitchers". In SQL
you would write WHERE text LIKE '%baseball%' AND text LIKE
'%pitchers%', and the query could take a very long time, because a
pattern with a leading wildcard cannot use an ordinary SQL index and
the database has to scan every row. Lucene can find the documents that
mention those words very quickly, because its index is built on the
individual words found in the text. In
Lucene, you would also be able to say "baseball pitchers"~5 to find
just those documents where the words are close together (at most 5
words apart). Plain SQL has no proximity search; some databases add
one through vendor-specific full-text extensions (SQL Server's NEAR,
for example), but it is not part of standard SQL.
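To make the idea of a word-based index concrete, here is a minimal Python sketch. It is a toy and not Lucene's actual implementation or API: the names tokenize, build_index, search_all and search_near are made up for illustration, the tokenizer is a naive lowercase split, and search_near only approximates what "baseball pitchers"~5 does in Lucene.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Naive tokenizer (an assumption): lowercase runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each word to {doc_id: [positions]}, i.e. a positional inverted index."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            index[word].setdefault(doc_id, []).append(pos)
    return index


def search_all(index, words):
    """Return the ids of documents that contain every word (an AND query)."""
    postings = [set(index.get(w, {})) for w in words]
    return set.intersection(*postings) if postings else set()


def search_near(index, a, b, max_gap):
    """Return the ids of documents where a and b occur within max_gap positions."""
    hits = set()
    for doc_id in search_all(index, [a, b]):
        pos_a, pos_b = index[a][doc_id], index[b][doc_id]
        if any(abs(i - j) <= max_gap for i in pos_a for j in pos_b):
            hits.add(doc_id)
    return hits


docs = {
    1: "Baseball pitchers throw fastballs.",
    2: "Baseball is popular. " + "Filler " * 20 + "Some pitchers are famous.",
    3: "Football has no pitchers.",
}
index = build_index(docs)

print(sorted(search_all(index, ["baseball", "pitchers"])))    # [1, 2]
print(sorted(search_near(index, "baseball", "pitchers", 5)))  # [1]
```

Neither query reads the document text again. Both work only from the word entries in the index, which is why this kind of lookup stays cheap.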
This becomes even more apparent the larger the document set gets. A
LIKE scan has to read every document, so it is fine for a small
collection but slows down in proportion to the amount of text. Lucene
stays fast, because it only looks up the query words in its index.
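A rough way to see the scaling difference, as a Python sketch on synthetic data: count how many documents each approach has to examine. The helpers make_docs, build_word_index, scan_cost and index_cost are hypothetical. They only count work and do not measure real SQL or Lucene performance.

```python
def make_docs(n):
    """Synthetic corpus (an assumption): every 1000th document matches."""
    return {
        i: "baseball pitchers" if i % 1000 == 0 else f"filler text {i}"
        for i in range(n)
    }


def build_word_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for word in set(text.split()):
            index.setdefault(word, set()).add(doc_id)
    return index


def scan_cost(docs, words):
    """Substring scan, like WHERE text LIKE '%word%': examines every document."""
    examined = 0
    hits = []
    for doc_id, text in docs.items():
        examined += 1
        if all(w in text for w in words):
            hits.append(doc_id)
    return hits, examined


def index_cost(index, words):
    """Index lookup: examines only the document ids listed under each word."""
    postings = [index.get(w, set()) for w in words]
    examined = sum(len(p) for p in postings)
    hits = sorted(set.intersection(*postings)) if postings else []
    return hits, examined


for n in (1_000, 200_000):
    docs = make_docs(n)
    index = build_word_index(docs)  # built once, reused for every query
    scan_hits, scanned = scan_cost(docs, ["baseball", "pitchers"])
    index_hits, looked_up = index_cost(index, ["baseball", "pitchers"])
    assert scan_hits == index_hits  # both approaches find the same documents
    print(n, scanned, looked_up)
# prints:
# 1000 1000 2
# 200000 200000 400
```

The scan's work grows with the size of the whole collection, while the index lookup's work grows only with the number of documents that actually contain the query words.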
SQL is well suited to short text fields with limited contents that
can be indexed. Lucene is good for bigger full texts and very many