Files are not generated in the index folder by the indexer while crawling the site http://www.traguiden.se (other sites work fine)

patil-2
Hello,

Please help me. I am using Nutch 0.9 with Lucene 2.2 and Hadoop 0.15.0.

I have commented out the dedup line, so I am able to crawl the site http://www.traguiden.se, but it is not indexed properly (other sites work properly). If I uncomment the dedup line, I get the exception below.

Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
        at org.apache.nutch.indexer.DeleteDuplicates.dedup(DeleteDuplicates.java:439)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:135)

Only a file named segments is created in the index folder; the files listed below are not generated when I crawl this site.

I need these files to be generated. Please help, I have been trying to solve this for nearly a week.


_0.fdt
_0.tis
_0.fdx
_0.prx
._0.fdt.crc
_0.tii
._0.fdx.crc
_0.nrm
._0.fnm.crc
._0.frq.crc
_0.fnm
_0.frq
._0.nrm.crc
._0.tii.crc
._0.tis.crc
._0.prx.crc
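
To make the symptom easier to report, here is a minimal sketch (not part of Nutch or Lucene) that lists which of the `_0.*` files above are absent from a directory. The class name `IndexFileCheck` and the default path `crawl/index` are assumptions for illustration only; the `.crc` files are ignored because they appear to be checksum companions of the data files.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class IndexFileCheck {
    // Extensions of the _0.* data files listed above (the .crc companions are ignored).
    static final String[] EXPECTED = {"fdt", "fdx", "fnm", "frq", "nrm", "prx", "tii", "tis"};

    // Returns the expected extensions for which no "_0.<ext>" entry exists in fileNames.
    public static List<String> missing(String[] fileNames) {
        Set<String> present = new HashSet<String>(Arrays.asList(fileNames));
        List<String> result = new ArrayList<String>();
        for (String ext : EXPECTED) {
            if (!present.contains("_0." + ext)) {
                result.add(ext);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical default path; pass the real index directory as the first argument.
        File dir = new File(args.length > 0 ? args[0] : "crawl/index");
        String[] names = dir.list(); // null if dir is missing or not a directory
        if (names == null) {
            System.out.println("Not a readable directory: " + dir);
            return;
        }
        System.out.println("Missing extensions: " + missing(names));
    }
}
```

Running it against the index folder described here, which contains only `segments`, should report all eight extensions as missing.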

A response with a solution would be appreciated.

Thanks
S Patil.