IO exception while adding field in Parsedata contentmeta.

Saurabh Suman
Hi,
I am using Nutch 1.0. I want to add a field to the parseMeta of ParseData.
In org.apache.nutch.parse.html.HtmlParser, the original code already sets two fields:
                        metadata.set(Metadata.ORIGINAL_CHAR_ENCODING, encoding);
                        metadata.set(Metadata.CHAR_ENCODING_FOR_CONVERSION, encoding);
I added a third field:
                      metadata.set(Metadata.AGE, "23");

In org.apache.nutch.indexer.IndexerMapReduce, the method
    public void reduce(Text key, Iterator<NutchWritable> values,
                       OutputCollector<Text, NutchDocument> output, Reporter reporter)
        throws IOException
already adds two fields to the NutchDocument:

   NutchDocument doc = new NutchDocument();
    final Metadata metadata = parseData.getContentMeta();
 
    // add segment, used to map from merged index back to segment files
    doc.add("segment", metadata.get(Nutch.SEGMENT_NAME_KEY));

    // add digest, used by dedup
    doc.add("digest", metadata.get(Nutch.SIGNATURE_KEY));
   

There I added a third field, the one I set in HtmlParser, like this:
  doc.add("age", parseData.getParseMeta().get("age"));

After this change, I get the following exception at indexing time:

LinkDb: adding segment: file:/home/ithurs/nutch-1.0/crawl/segments/20090724193527
LinkDb: done
Indexer: starting
   Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
        at org.apache.nutch.indexer.Indexer.index(Indexer.java:72)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:152)


Please tell me:
(i) How can I get rid of this exception?
(ii) How can I add a new field to the ParseData parseMeta?
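
One thing that may be worth checking, offered as an assumption and not a confirmed diagnosis: the trace above only reports that the job failed and does not show the underlying cause. The field is set under the key Metadata.AGE (whose value is not shown in the snippets) but read back under the literal "age", so if the two differ the lookup could return null, and that null would then be passed to doc.add. The sketch below is self-contained and uses plain maps as stand-ins for Nutch's Metadata and NutchDocument; AGE_KEY and addIfPresent are hypothetical names, not part of Nutch. It only illustrates sharing one key constant between the two sides and skipping the field when no value is found.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-alone sketch: plain maps stand in for Nutch's Metadata and NutchDocument.
public class AgeFieldSketch {

    // Hypothetical shared key, so the parser side and the indexer side use the same name.
    static final String AGE_KEY = "age";

    // Adds the field only when a value was found; returns whether it was added.
    static boolean addIfPresent(Map<String, String> doc, String field, String value) {
        if (value == null) {
            return false; // nothing stored under that key, so skip the field
        }
        doc.put(field, value);
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> parseMeta = new HashMap<>(); // stand-in for parseData.getParseMeta()
        Map<String, String> doc = new HashMap<>();       // stand-in for NutchDocument

        // Parser side: store the value under the shared key.
        parseMeta.put(AGE_KEY, "23");

        // Indexer side: reading with the same key finds the value.
        boolean added = addIfPresent(doc, "age", parseMeta.get(AGE_KEY));

        // Reading with a differently spelled key finds nothing and is skipped.
        boolean addedWrongKey = addIfPresent(doc, "ageFromWrongKey", parseMeta.get("Age"));

        System.out.println(added + " " + addedWrongKey + " " + doc);
    }
}
```

In the real code the same idea would amount to using one constant for both the set call in HtmlParser and the get call in IndexerMapReduce, and checking the result for null before calling doc.add. Whether this is what actually makes the job fail here is not known from the post.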