linkdb problem

linkdb problem

ramires
Hi, I got the latest Nutch release through SVN. I crawled and indexed some sites without problems. When I tried to extract links into the linkdb, I saw these lines:

cat linkdb/current/part-00000/data
SEQorg.apache.hadoop.io.Textorg.apache.nutch.crawl.Inlinks*org.apache.hadoop.io.compress.DefaultCodec [binary data]
cat /linkdb/current/part-00000/index
SEQorg.apache.hadoop.io.Text!org.apache.hadoop.io.LongWritable*org.apache.hadoop.io.compress.DefaultCodec [binary data]

Re: linkdb problem

Dennis Kubes-2
You are showing the cat output of the linkdb, which is composed of binary
files. What is your problem?
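The "binary files" point can be made concrete. The data and index files inside linkdb/current/part-00000 appear to be Hadoop SequenceFiles (a MapFile is such a data/index pair). The cat output starts with the SEQ magic bytes, followed by length-prefixed class names. The stray * and ! characters are one-byte lengths: 42 is the length of org.apache.hadoop.io.compress.DefaultCodec, and 33 is the length of org.apache.hadoop.io.LongWritable. The Python sketch below decodes only that readable header. It assumes header version 5 or 6 and class names shorter than 128 bytes. read_sequencefile_header and its helpers are hypothetical names, not part of Nutch or Hadoop. The sketch does not decode the compressed Inlinks records themselves; bin/nutch readlinkdb is the tool for that.

```python
import io
from typing import BinaryIO, Optional


def read_exact(stream: BinaryIO, n: int) -> bytes:
    # Read exactly n bytes, or fail if the header is truncated.
    data = stream.read(n)
    if len(data) != n:
        raise EOFError(f"expected {n} bytes, got {len(data)}")
    return data


def read_short_text(stream: BinaryIO) -> str:
    # Hadoop Text string: VInt length followed by UTF-8 bytes.
    # Assumption: the length is 0..127, so the VInt is a single byte equal to it.
    length = read_exact(stream, 1)[0]
    if length > 127:
        raise ValueError("multi-byte VInt length not handled in this sketch")
    return read_exact(stream, length).decode("utf-8")


def read_bool(stream: BinaryIO) -> bool:
    # Java's writeBoolean writes one byte (1 for true, 0 for false); non-zero reads as true.
    return read_exact(stream, 1) != b"\x00"


def read_sequencefile_header(raw: bytes) -> dict:
    # Hypothetical helper: decode the readable start of a SequenceFile header.
    stream = io.BytesIO(raw)
    if read_exact(stream, 3) != b"SEQ":
        raise ValueError("not a Hadoop SequenceFile (missing SEQ magic)")
    version = read_exact(stream, 1)[0]
    if version not in (5, 6):
        raise ValueError(f"header version {version} not handled in this sketch")
    key_class = read_short_text(stream)
    value_class = read_short_text(stream)
    compressed = read_bool(stream)
    block_compressed = read_bool(stream)
    # The codec class name is only written when compression is enabled.
    codec: Optional[str] = read_short_text(stream) if compressed else None
    # Metadata (version 6) and the 16-byte sync marker follow; not decoded here.
    return {
        "version": version,
        "key_class": key_class,
        "value_class": value_class,
        "compressed": compressed,
        "block_compressed": block_compressed,
        "codec": codec,
    }
```

You could pass the first few kilobytes of linkdb/current/part-00000/data to read_sequencefile_header. If the assumptions hold, it should report the Text, Inlinks and DefaultCodec names that are visible in the cat output, without the binary noise.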

Uygar BAYAR wrote:

> Hi, I got the latest Nutch release through SVN. I crawled and indexed some
> sites without problems. When I tried to extract links into the linkdb, I saw
> these lines:
>
> cat linkdb/current/part-00000/data
> SEQorg.apache.hadoop.io.Textorg.apache.nutch.crawl.Inlinks*org.apache.hadoop.io.compress.DefaultCodec [binary data]
> cat /linkdb/current/part-00000/index
> SEQorg.apache.hadoop.io.Text!org.apache.hadoop.io.LongWritable*org.apache.hadoop.io.compress.DefaultCodec [binary data]
>

Re: linkdb problem

ramires
The problem is that when I run
../bin/nutch readlinkdb ready/otomotiv/linkdb/ -dump alo
there is nothing in the output: cat alo/part-00000 prints nothing. I fetched 20,000 URLs without problems.

Dennis Kubes-2 wrote:
> You are showing the cat output of the linkdb, which is composed of binary
> files. What is your problem?
