|
Hi,
I'm a new Nutch user, currently on Nutch 0.8.1. When I try to start crawling according to the tutorial, I always get the following error:

Injector: starting
Injector: crawlDb: crawl2/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:357)
        at org.apache.nutch.crawl.Injector.inject(Injector.java:138)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:105)

In the log I found a more detailed description:

2007-05-14 09:32:57,977 INFO crawl.Injector - Injector: starting
2007-05-14 09:32:57,978 INFO crawl.Injector - Injector: crawlDb: crawl2/crawldb
2007-05-14 09:32:57,978 INFO crawl.Injector - Injector: urlDir: urls
2007-05-14 09:32:57,978 INFO crawl.Injector - Injector: Converting injected urls to crawl db entries.
2007-05-14 09:32:58,908 WARN mapred.LocalJobRunner - job_lzlk81
java.lang.RuntimeException: java.net.UnknownHostException: dhcppc0: dhcppc0
        at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:76)
        at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:89)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:77)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:91)
Caused by: java.net.UnknownHostException: dhcppc0: dhcppc0
        at java.net.InetAddress.getLocalHost(InetAddress.java:1308)
        at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:73)
        ... 3 more

At first I suspected that the error was caused by Tomcat not running properly, but after some checking I have confirmed that Tomcat is indeed running.

Could somebody let me know what I might be doing wrong here?

Cheers,
|
For some reason the Nutch process can't resolve the hosts. This could be due to an incorrect DNS setup on the machine, or to a firewall or proxy in place. See if you can ping one of the URLs (hosts) that you are trying to fetch.

Dennis Kubes
|
I have checked and confirmed that the hosts I'm trying to fetch are actually accessible (by pinging them and by loading the sites themselves). However, I still get the same error.

Any other suggestions?
|
Caused by: java.net.UnknownHostException: dhcppc0: dhcppc0
        at java.net.InetAddress.getLocalHost(InetAddress.java:1308)
        at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:73)

Could it be because I have Apache and Tomcat installed on the same host as Nutch, so that it cannot determine whether 'localhost' points to Apache or to Tomcat? Or does that not matter?

Both servers (Apache and Tomcat) are listening on their default ports, 80 and 8080.
|
If dhcppc0 is the host that you are on, you might want to check that your hosts file has the localhost line pointing to 127.0.0.1 and that dhcppc0 also points to 127.0.0.1. Something like this:

127.0.0.1 yourhost.domain.com yourhost localhost.localdomain localhost

Dennis Kubes
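The stack trace in this thread fails inside java.net.InetAddress.getLocalHost(), so one quick way to see whether the hosts-file advice above applies is to make the same call outside Nutch. The following is only a minimal sketch: the class name LocalHostCheck and the method describeLocalHost are made up for illustration and are not part of Nutch or Hadoop.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper, not part of Nutch or Hadoop.
public class LocalHostCheck {

    /** Returns a one-line description of how the JVM resolves the local host name. */
    static String describeLocalHost() {
        try {
            // Same JDK call that appears at the bottom of the stack trace.
            InetAddress local = InetAddress.getLocalHost();
            return "resolved: " + local.getHostName() + " -> " + local.getHostAddress();
        } catch (UnknownHostException e) {
            // On a machine whose own name has no hosts/DNS entry, this branch is taken.
            return "unresolved: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(describeLocalHost());
    }
}
```

If this prints a line starting with "unresolved:", the machine cannot resolve its own host name, which would match the UnknownHostException reported for dhcppc0.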
|
Hi Dennis,

Yes, dhcppc0 is the machine that Nutch is on, and yes, it is already pointing to 127.0.0.1. My hosts file currently looks like this:

127.0.0.1 loacalhost.localdomain localhost

However, I don't quite follow what you mean by "127.0.0.1 yourhost.domain.com yourhost localhost.localdomain localhost". What should I put for yourhost.domain.com? Is it dhcppc0?

Cheers,

Reza
|
It should look like this, but swap in your own domain for "domain.com". Try this and let me know if it works.

127.0.0.1 dhcppc0.domain.com dhcppc0 localhost.localdomain localhost

Dennis Kubes
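To make the difference between the two hosts-file lines in this thread concrete, here is a small sketch that checks whether a single hosts-file line maps an address to a given name. The class HostsLineCheck and its method mapsHost are hypothetical names used only for illustration; the parsing assumes the usual "address name [alias ...]" layout with an optional trailing # comment.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only.
public class HostsLineCheck {

    /** True if the hosts-file line maps the given address to the given host name. */
    static boolean mapsHost(String line, String address, String host) {
        // Drop any trailing comment, then split the rest on whitespace.
        int hash = line.indexOf('#');
        String body = (hash >= 0 ? line.substring(0, hash) : line).trim();
        if (body.isEmpty()) {
            return false;
        }
        String[] fields = body.split("\\s+");
        if (!fields[0].equals(address)) {
            return false;
        }
        // Every field after the address is a name or alias for it.
        return Arrays.stream(fields, 1, fields.length)
                     .anyMatch(f -> f.equalsIgnoreCase(host));
    }

    public static void main(String[] args) {
        String before = "127.0.0.1 loacalhost.localdomain localhost";
        String after = "127.0.0.1 dhcppc0.domain.com dhcppc0 localhost.localdomain localhost";
        System.out.println(mapsHost(before, "127.0.0.1", "dhcppc0")); // false
        System.out.println(mapsHost(after, "127.0.0.1", "dhcppc0"));  // true
    }
}
```

The original line never mentions dhcppc0, so a lookup of the machine's own name has nothing to match; the suggested line adds it as an alias of 127.0.0.1.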
|
Thanks Dennis, worked like a charm :)

Forgive me for going off on a tangent in this thread, but I don't understand which crawl directory the search engine fetches its search results from.

Let's say I ran the crawl from the root of the Nutch installation and put the crawl result in a directory called 'my.crawl'. I know that, when using the web interface, the search engine fetches its results from the 'crawl' directory under webapps. So how does the content of 'my.crawl' get copied to 'crawl'? Do I have to do it manually for every crawl?

Reza
|
Hi,
Check the "searcher.dir" configuration setting. Your webapp reads this setting and fetches results from that directory. If it is a relative path, then it is relative to where you started your webapp.

--
Doğacan Güney
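The point about relative paths can be sketched in plain Java. This is not Nutch's own lookup code, which is not shown in this thread; it only illustrates the general JVM behaviour that a relative directory name is resolved against the directory the process was started from. The class SearcherDirCheck and the method resolve are hypothetical names.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical illustration; Nutch's actual handling of searcher.dir is not reproduced here.
public class SearcherDirCheck {

    /** Returns the setting unchanged if absolute, otherwise resolved against workingDir. */
    static Path resolve(String setting, Path workingDir) {
        Path p = Paths.get(setting);
        return p.isAbsolute() ? p : workingDir.resolve(p).normalize();
    }

    public static void main(String[] args) {
        // The directory this JVM was started from.
        Path cwd = Paths.get("").toAbsolutePath();
        System.out.println(resolve("crawl", cwd));
        System.out.println(resolve("my.crawl", cwd));
    }
}
```

Under that assumption, pointing the setting at the absolute path of 'my.crawl' would avoid depending on where the servlet container happened to be started, and nothing would need to be copied after each crawl.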
