Using Lucene to index metadata from TXT, HTML, PDF, etc. files


Aditya Gollakota
Hi Guys,


Just wondering how you would go about indexing metadata from files. I've
taken IndexHTML.java from the Lucene demo package and updated it with the
following:


DataInput input = new DataInputStream(new BufferedInputStream(new /* ... */));

Content content = /* ... */;

Reader contentReader = new ArrayFile.Reader(new LocalFileSystem(null),
        new File(f.getPath(), Content.DIR_NAME).toString(), null);

ParseData parseData = /* ... */;

Metadata metadata = parseData.getContentMeta();

// Metadata.KEYWORDS is the key name, not the value; look the value up with get().
doc.add(new Field("keywords", metadata.get(Metadata.KEYWORDS), Field.Store.YES,
        /* ... */));
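For HTML files, keywords and descriptions usually live in `<meta name="..." content="...">` tags. Below is a minimal, stdlib-only sketch of pulling those into name/value pairs before they reach Lucene. It swaps in a simple regular expression instead of Nutch's parser, assumes `name` appears before `content` inside each tag, and the names `MetaTagSketch` and `extractMeta` are hypothetical. Each resulting entry could then become a stored field, e.g. `doc.add(new Field(name, value, Field.Store.YES, Field.Index.TOKENIZED))` in Lucene 2.0.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MetaTagSketch {
    // Matches <meta name="..." content="..."> with name before content.
    // A rough heuristic, not a real HTML parser.
    private static final Pattern META = Pattern.compile(
            "<meta\\s+[^>]*?name\\s*=\\s*[\"']([^\"']*)[\"'][^>]*?content\\s*=\\s*[\"']([^\"']*)[\"']",
            Pattern.CASE_INSENSITIVE);

    /** Returns lower-cased meta names mapped to their trimmed content values. */
    public static Map<String, String> extractMeta(String html) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = META.matcher(html);
        while (m.find()) {
            result.put(m.group(1).toLowerCase(Locale.ROOT), m.group(2).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<html><head>"
                + "<meta name=\"Keywords\" content=\"lucene, nutch, metadata\">"
                + "<meta name=\"description\" content=\"Indexing demo\">"
                + "</head><body>Hello</body></html>";
        // Each entry would become one stored, searchable field in the index.
        for (Map.Entry<String, String> e : extractMeta(html).entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

Running `main` prints `keywords = lucene, nutch, metadata` and then `description = Indexing demo`.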


I'm using nutch-0.8.jar for the Metadata class, the other Nutch jars to
resolve the remaining dependencies, and Lucene 2.0.0.


When running this code, I get the following error:


A record version mismatch occurred. Expecting v1, found v118.
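The error text matches the message of Hadoop's `VersionMismatchException`, which a versioned record throws when the first byte it reads is not the version number it expects. A minimal stdlib-only sketch of that check follows; the class, constant, and method names are hypothetical, and `EXPECTED_VERSION = 1` is taken from the message rather than from Nutch's source. Since 118 is the ASCII code for `v`, one plausible reading (an assumption) is that the reader is being handed bytes that are not a serialized Nutch record, such as raw file text.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class VersionCheckSketch {
    static final byte EXPECTED_VERSION = 1; // hypothetical record version

    /** Reads the leading version byte the way a versioned record would. */
    static void readVersioned(DataInput in) throws IOException {
        byte found = in.readByte();
        if (found != EXPECTED_VERSION) {
            throw new IOException("A record version mismatch occurred. Expecting v"
                    + EXPECTED_VERSION + ", found v" + found);
        }
        // ...the record's remaining fields would be read here...
    }

    /** Returns "ok" on success, otherwise the mismatch message. */
    static String tryRead(byte[] bytes) {
        try {
            readVersioned(new DataInputStream(new ByteArrayInputStream(bytes)));
            return "ok";
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws IOException {
        // A stream written by a matching writer starts with the version byte.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeByte(EXPECTED_VERSION);
        out.flush();
        System.out.println(tryRead(buf.toByteArray()));

        // Raw text does not: its first character is taken as the "version".
        byte[] raw = "v...".getBytes(StandardCharsets.US_ASCII);
        System.out.println(tryRead(raw));
    }
}
```

Running `main` prints `ok` followed by `A record version mismatch occurred. Expecting v1, found v118`.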


Any help would be much appreciated.




Aditya Gollakota
Support Engineer | CustomWare Asia Pacific |
T: +61 2 9900 5742 | F: +61 2 9475 0100 | M: +61 405 033 951