Nutch builds on Lucene Java to provide web search application software.
If you want to develop your own Java application that has integrated
indexing logic, using the "Lucene Java" *library* is probably right for
you. If you want an *application* which can crawl various types of
documents and build a Lucene index, then Nutch is probably right for you.
Even if you use Nutch, you can also use the underlying Lucene Java library
to directly access the indexes it builds.
: I am new to Lucene and Nutch.
: I am doing a project on an archiving web portal which allows individual users
: to index documents (from the file system) and to crawl websites and RSS feeds for
: Looking at the above requirement, I thought Lucene would be able to achieve it;
: however, I found out that Lucene does not have a crawler to crawl URLs.
: Then I came across Nutch, which is perfect for my latter requirement to fetch
: websites and RSS feeds. I also realised that Nutch allows me to
: crawl my file system as well...
: Well then, in my case, I guess I should be using the API from Nutch instead of
: From another discussion on Nabble:
: http://www.nabble.com/Integration-of-Nutch-td12016441.html#a12040333 :
: There is this advice to use Lucene to index the same index file that Nutch
: has created. But I thought that Nutch uses a webdb to store the returned
: crawl results? Anyway, from the thread mentioned above... why would one use
: Lucene if Nutch can perform all the local file system and web indexing and
: search functions?