Edit index structure


Matthias W.
Hi,
Is it possible to edit the index structure of Nutch?

I have the following problem:
The files will be indexed by Nutch, and the frontend will be implemented with Zend Framework 1.6.0 (Zend_Search_Lucene).
As far as I can tell, Zend_Search_Lucene doesn't support the Nutch index structure, so I can only read the title, url, digest code, tstamp, and score from the Nutch index, but I'm not able to read the digest itself or other fields.
Can I change which fields are stored in the index? If so, where?
Or are there other ways to solve this problem?

I've got an additional question concerning Nutch (version 0.9):
Does Nutch check the MIME type of files before indexing, or does it only check the file extension to select the matching parser?
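
To make the distinction in that second question concrete, here is a minimal Python sketch of the two approaches. It is not Nutch code and says nothing about what Nutch actually does; the function names and the small `MAGIC_NUMBERS` table are made up for illustration only.

```python
import mimetypes

# Hypothetical table of leading "magic" bytes, for illustration only.
# This is not Nutch's configuration.
MAGIC_NUMBERS = {
    b"%PDF-": "application/pdf",
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"PK\x03\x04": "application/zip",
}


def type_from_extension(filename):
    """Guess the MIME type from the file name alone (extension lookup)."""
    mime, _encoding = mimetypes.guess_type(filename)
    return mime


def type_from_content(data):
    """Guess the MIME type from the first bytes of the file content."""
    for magic, mime in MAGIC_NUMBERS.items():
        if data.startswith(magic):
            return mime
    return None  # unknown content


if __name__ == "__main__":
    # A PDF that was saved under a misleading ".txt" name.
    name = "report.txt"
    content = b"%PDF-1.4\n..."
    print("by extension:", type_from_extension(name))
    print("by content:  ", type_from_content(content))
```

With a misnamed file like the one above, the two checks disagree, which is exactly the case where it matters whether the parser is chosen by extension or by detected MIME type.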