I got something like this when trying to run Nutch in Eclipse


I got something like this when trying to run Nutch in Eclipse

nutnoob
crawl started in: crawl
rootUrlDir = urls
threads = 10
depth = 3
topN = 50
Injector: starting
Injector: crawlDb: crawl/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:363)
        at org.apache.nutch.crawl.Injector.inject(Injector.java:138)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:105)

What's the problem, and how do I solve it?
(I am just trying nutch-nightly.)

A text-based search engine

Bui Quang Hung
Hi,

I am looking for a text-based search engine on the Web.

I have to do some experiments with the HITS (Hypertext Induced Topic Selection)
algorithm of J. Kleinberg. These experiments require a text-based search
engine to obtain web pages that contain the input query terms. I think I
cannot use current link-based search engines such as Google, Yahoo, or MSN.
Furthermore, the results returned by these link-based search engines are too
good; I am afraid that I could not see the effectiveness of the HITS algorithm
by using them.

If you know of a text-based search engine on the Web, could you please
tell me?

Thank you very much in advance.

Best regards,
Hung.




--
No virus found in this outgoing message.
Checked by AVG Free Edition.
Version: 7.1.405 / Virus Database: 268.11.6/430 - Release Date: 8/28/2006
 


Re: I got something like this when trying to run Nutch in Eclipse

Renaud Richardet-3
In reply to this post by nutnoob
Is Nutch working from the command line?

If you want to know what went wrong, look at the logs (logs/hadoop.log)

Also, there's some info in the wiki at
http://wiki.apache.org/nutch/RunNutchInEclipse
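
As a rough illustration of the log check suggested above, here is a minimal Python sketch that pulls the suspicious lines out of logs/hadoop.log. It assumes a log4j-style log in which failures show up as ERROR or FATAL lines, or as lines mentioning an exception; the function name find_error_lines and the context parameter are made up for this example and are not part of Nutch or Hadoop.

```python
import re
from pathlib import Path

# Assumption: failures appear as ERROR/FATAL log levels or as lines
# that mention an exception (log4j-style output).
ERROR_PATTERN = re.compile(r"\b(ERROR|FATAL)\b|Exception")


def find_error_lines(log_text, context=2):
    """Return (line_number, line) pairs for matching lines plus up to
    `context` lines after each match (typically the stack trace)."""
    lines = log_text.splitlines()
    keep = set()
    for i, line in enumerate(lines):
        if ERROR_PATTERN.search(line):
            keep.update(range(i, min(i + context + 1, len(lines))))
    return [(i + 1, lines[i]) for i in sorted(keep)]


if __name__ == "__main__":
    log_path = Path("logs/hadoop.log")  # path mentioned in this thread
    if log_path.exists():
        text = log_path.read_text(errors="replace")
        for number, line in find_error_lines(text):
            print(f"{number}: {line}")
    else:
        print(f"{log_path} not found; run this from the Nutch working directory")
```

The underlying cause is usually logged just before the "Job failed!" exception, so the first match is often the most useful one.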

HTH,
Renaud


nutnoob wrote:

> crawl started in: crawl
> rootUrlDir = urls
> threads = 10
> depth = 3
> topN = 50
> Injector: starting
> Injector: crawlDb: crawl/crawldb
> Injector: urlDir: urls
> Injector: Converting injected urls to crawl db entries.
> Exception in thread "main" java.io.IOException: Job failed!
> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:363)
> at org.apache.nutch.crawl.Injector.inject(Injector.java:138)
> at org.apache.nutch.crawl.Crawl.main(Crawl.java:105)
>
> what's the problem and how to solve?
> (I just try nutch-nightly )
>  

--
Renaud Richardet
COO America
Wyona    -   Open Source Content Management   -   Apache Lenya
office +1 857 776-3195                  mobile +1 617 230 9112
renaud.richardet <at> wyona.com           http://www.wyona.com


Re: I got something like this when trying to run Nutch in Eclipse

Uroš Gruber-2
Renaud Richardet wrote:

> Is Nutch working from the command line?
>
> If you want to know what went wrong, look at the logs (logs/hadoop.log)
>
> Also, there's some info in the wiki at
> http://wiki.apache.org/nutch/RunNutchInEclipse
>
> HTH,
> Renaud
>
>
> nutnoob wrote:
>> crawl started in: crawl
>> rootUrlDir = urls
>> threads = 10
>> depth = 3
>> topN = 50
>> Injector: starting
>> Injector: crawlDb: crawl/crawldb
>> Injector: urlDir: urls
>> Injector: Converting injected urls to crawl db entries.
>> Exception in thread "main" java.io.IOException: Job failed!
>>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:363)
>>     at org.apache.nutch.crawl.Injector.inject(Injector.java:138)
>>     at org.apache.nutch.crawl.Crawl.main(Crawl.java:105)
>>
>> what's the problem and how to solve?
>> (I just try nutch-nightly )
Hi,

I think you forgot to set "plugin.folders" to the path where your plugins are located.

Check the *plugin dir not found* section at

http://wiki.apache.org/nutch/RunNutchInEclipse
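
To make the "plugin.folders" suggestion concrete, the following Python sketch reads the property from a Hadoop-style configuration file (such as conf/nutch-site.xml) and reports entries that are not existing directories. Several things here are assumptions for illustration: the example path is hypothetical, the helper names read_property and missing_plugin_dirs are invented, and relative entries are resolved against a base directory, which may differ from how Nutch itself locates plugins (for example via the classpath).

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical nutch-site.xml content; the path is a placeholder.
EXAMPLE_SITE_XML = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>plugin.folders</name>
    <value>/hypothetical/path/to/nutch/plugins</value>
  </property>
</configuration>"""


def read_property(config_xml, name):
    """Return the value of the named property, or None if it is absent."""
    root = ET.fromstring(config_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            return prop.findtext("value", default="").strip()
    return None


def missing_plugin_dirs(config_xml, base_dir="."):
    """Return the plugin.folders entries that are not directories.

    Returns None when the property is not set at all. Entries are assumed
    to be comma-separated; relative ones are resolved against base_dir.
    """
    value = read_property(config_xml, "plugin.folders")
    if value is None:
        return None
    folders = [f.strip() for f in value.split(",") if f.strip()]
    return [f for f in folders if not (Path(base_dir) / f).is_dir()]


if __name__ == "__main__":
    print(missing_plugin_dirs(EXAMPLE_SITE_XML))
```

An empty list means every configured folder exists. A non-empty list, or None, points at the kind of misconfiguration described in the wiki section mentioned above.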

regards

Uros


RE: A text-based search engine

Bui Quang Hung
In reply to this post by Bui Quang Hung
Hi,
I am afraid that my question is not clear.
My question is: do you know of any Web page repository on the Web that
satisfies the following two conditions?
- It contains at least 100 million pages.
- It provides a text-based ranking algorithm, so that we can obtain pages
that include the query terms.
Thank you very much in advance.
Best regards,
-----Original Message-----
From: Bui Quang Hung [mailto:[hidden email]]
Sent: Tuesday, August 29, 2006 3:00 PM
To: [hidden email]
Subject: A text-based search engine

Hi,

I am finding for a text-based search engine on the Web.

I have to do some experiments with HITS (Hypertext Induced Topic Seclection)
algorithm of J. Kleinberg. These experiments require a text-based search
engine to obtain web pages which include the inputted query terms. I think I
can not use current link-based search engine such as Google, Yahoo, MSN.
Furthermore, results outputted by these link-based search engines are too
good, I am afraid that I can not see the effectiveness of HITS algorithm by
using them.

If you know there is a text-based search engine on the Web, could you please
tell me.

Thank you very much in advanced.

Best regards,
Hung.






RE: A text-based search engine

Vishal Shah-3
Hello Hung,

  I don't know of any WWW search engine that does pure text-based ranking.
One way to satisfy your requirements using existing engines is to pick
a few results from each page of search results (1-10, 11-20, ...,
191-200). You also need to filter out the pages that don't contain the
query terms, because most search engines match on the anchor text of links
pointing to a page in addition to the page's actual content.

This way, you might get a good mix of good and bad results.
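
The sampling idea above can be sketched in Python as follows. This is only an illustration under stated assumptions: it takes the first few results of each page (the suggestion does not say which ones to pick), it uses plain case-insensitive substring matching to decide whether a page contains the query terms, and the names sample_ranks, contains_all_terms and build_root_set are invented for this example. Fetching the ranked results from a search engine is left out.

```python
def sample_ranks(total=200, page_size=10, per_page=2):
    """Pick the first `per_page` ranks from each results page
    (1-10, 11-20, ...), up to `total` results."""
    ranks = []
    for start in range(1, total + 1, page_size):
        end = min(start + page_size - 1, total)
        ranks.extend(range(start, min(start + per_page, end + 1)))
    return ranks


def contains_all_terms(page_text, query_terms):
    """True if every query term occurs in the page text (assumption:
    simple case-insensitive substring match, no tokenization)."""
    text = page_text.lower()
    return all(term.lower() in text for term in query_terms)


def build_root_set(results, query_terms, page_size=10, per_page=2):
    """results: list of (url, page_text) pairs ordered by rank, best first.

    Returns the URLs of the sampled results whose text actually contains
    the query terms, dropping pages matched only through anchor text.
    """
    picked = []
    for rank in sample_ranks(len(results), page_size, per_page):
        url, text = results[rank - 1]
        if contains_all_terms(text, query_terms):
            picked.append(url)
    return picked
```

With the default settings, sample_ranks() covers ranks 1, 2, 11, 12, and so on up to 191, 192, which gives 40 candidates spread over the first 200 results.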

Regards,
-vishal.

-----Original Message-----
From: Bui Quang Hung [mailto:[hidden email]]
Sent: Wednesday, August 30, 2006 12:57 AM
To: [hidden email]
Subject: RE: A text-based search engine

Hi,
I am afraid that my question is not clear.
My question is: Do you know any Web page repository on the Web which
satisfies the following two conditions:
- It contains at least 100 million pages.
- It provides a text-based ranking algorithm. We can obtain pages including
the query terms.
Thank you very much in advance.
Best regards,