Tutorial followup - Nutch webapp not seeing stuff?

ohaya
Hi,

I'm still following the Tutorial, per earlier post, and I think that I've gotten past the earlier errors with the intranet crawl (it's still running), so I wanted to try to get the web app running.

I had Tomcat installed and working, so I deployed the nutch WAR file (I put it in <tomcat>/webapps), and I can see the nutch web app/search box when I go to http://myhost.com:8080/nutch-1.0.

But, it's not showing anything other than the search box.

I found this other tutorial:

http://zillionics.com/resources/articles/NutchGuideForDummies.htm

which says to modify the <tomcat>/webapps/nutch-1.0/WEB-INF/classes/nutch-site.xml to point to the nutch crawl directory.

The crawl command line that I used was:

bin/nutch crawl urls -dir crawl.test -depth 3 >& crawl.log

so, I set the nutch-site.xml file to:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>searcher.dir</name>
    <value>/opt/nutch-1.0/crawl.test</value>

  </property>

</configuration>

I then bounced Tomcat, but I still only get the search box, with nothing else, when I access http://myhost.com:8080/nutch-1.0.

Is there something else that I need to do to get the nutch web app to be able to see the data from the crawl?

Thanks!

Jim

Re: Tutorial followup - Nutch webapp not seeing stuff?

ohaya
Hi,

I noticed that in <tomcat>/webapps/nutch-1.0/WEB-INF/classes, there was a crawl-urlfilters.txt, which still had MY.DOMAIN in it, so I tried changing the parameter to "*apache.org" to match what was in the same file in /opt/nutch-1.0.

But, even after that, and bouncing Tomcat, I get just the search box on the nutch webpage.

I also checked catalina.out, and I saw these lines:

INFO: Deploying web application archive nutch-1.0.war
2009-07-14 08:23:27,687 INFO  NutchBean - creating new bean
2009-07-14 08:23:28,273 INFO  SearchBean - opening indexes in /opt/nutch-1.0/crawl.test/indexes

So, I checked in /opt/nutch-1.0, and there is no /opt/nutch-1.0/indexes file :(...

I'm guessing that the nutch web app is not showing anything because that "indexes" file or directory is missing, but I guess the question is why is it missing?  

As mentioned earlier, I am (still) running the intranet crawl, and by now, it's created 3 segments under /opt/nutch-1.0/crawl.test, but there is no "indexes".

What step am I missing?

Jim
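
[Editor's sketch] The log line above shows the web app opening <searcher.dir>/indexes, so a quick first check is whether that directory exists at all. The helper below is a minimal sketch, not an official Nutch tool: the function name crawl_dir_status is hypothetical, and it assumes a searchable crawl directory contains an "index" or "indexes" subdirectory.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): classify a crawl directory.
# Assumption: a finished crawl has an "index" or "indexes" subdirectory.
crawl_dir_status() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "missing"          # the path itself does not exist
    elif [ -d "$dir/indexes" ] || [ -d "$dir/index" ]; then
        echo "indexed"          # an index directory is present
    else
        echo "not-indexed"      # crawl data exists, but no index yet
    fi
}

# Example call with the path from this post:
# crawl_dir_status /opt/nutch-1.0/crawl.test
```

A result of "not-indexed" would be consistent with a crawl that is still running or that stopped before its indexing step.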





Re: Tutorial followup - Nutch webapp not seeing stuff?

ohaya
Hi,

I think that there must've been something messed up.

I tried running a new crawl:

bin/nutch crawl urls -dir crawl3.test -depth 2 >& crawl3.log

and I modified the nutch-site.xml file to point to the crawl3.test directory.

Then, after I bounced Tomcat, I could search successfully.

However, I then wanted to do a crawl with more depth, so I did:

bin/nutch crawl urls -dir crawl4.test -depth 3 >& crawl4.log

When I ran this same crawl before, it took a REALLY long time, but this time, it finished pretty quickly (a couple of minutes), which I thought was strange.

I modified nutch-site.xml and bounced Tomcat, and now, when I search, I only get 0 results again.

So, it seems like, when I set depth to 3 on the crawl, something is not working?

Also, I thought that if I access the nutch web app while the crawl is in progress, I'm supposed to get some kind of status page?

Instead, all I'm getting is the page with just the search box.

So, I'm wondering:  Does the nutch crawl leave some data somewhere, even when I use a different name for the "dir" parameter?  The reason I'm asking this is that the original crawl I used had "-dir crawl.test", and was "-depth 3", and it seems like running a "-depth 3" crawl, even with a different "-dir" is no longer working correctly?

Jim






Re: Tutorial followup - Nutch webapp not seeing stuff?

ohaya
Hi All,

I'm getting totally frustrated with this nutch web app :(.

I re-installed Nutch 1.0 completely.

I created the urls file in /opt/nutch-1.0

I added http.agent.name of "test1" and modified http.robots.agents in nutch-default.xml.

I modified /opt/nutch-1.0/conf/nutch-site.xml to:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
<name>http.agent.name</name>
<value>test1</value>
<description>HTTP 'User-Agent' request header. MUST NOT be empty -
please set this to a single word uniquely related to your
organization.
NOTE: You should also check other related properties:
http.robots.agents
http.agent.description
http.agent.url
http.agent.email
http.agent.version
and set their values appropriately.
</description>
</property>
<property>
<name>http.agent.description</name>
<value></value>
<description>Further description of our bot- this text is used in
the User-Agent header. It appears in parenthesis after the agent
name.
</description>
</property>
<property>
<name>http.agent.url</name>
<value></value>
<description>A URL to advertise in the User-Agent header. This will
appear in parenthesis after the agent name. Custom dictates that this
should be a URL of a page explaining the purpose and behavior of this
crawler.
</description>
</property>
<property>
<name>http.agent.email</name>
<value></value>
<description>An email address to advertise in the HTTP 'From' request
header and User-Agent header. A good practice is to mangle this
address (e.g. 'info at example dot com') to avoid spamming.
</description>
</property>
</configuration>

I ran nutch crawl:

bin/nutch crawl urls -dir crawl3.test -depth 2 >& crawl3.log

After that, I got in crawl3.test:

ls crawl3.test/
crawldb  index  indexes  linkdb  segments

I set <tomcat>/webapps/nutch-1.0/WEB-INF/classes/nutch-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>searcher.dir</name>
    <value>/opt/nutch-1.0/crawl3.test</value>

  </property>

</configuration>

I bounced Tomcat.

But, when I go to http://myhost.com:8080/nutch-1.0, I get only the search box, and when I search for anything, including "nutch" or "lucene", I get "0 of 0" :(...

Can anyone tell me what else I need to do to get the web app to show any search results?

Thanks,
Jim



Re: Tutorial followup - Nutch webapp not seeing stuff?

Doğacan Güney-3
On Tue, Jul 14, 2009 at 21:17, <[hidden email]> wrote:

> Can anyone tell me what else I need to do to get the web app to show any search results?
>

Check out your logs in both nutch and tomcat. Also see if there is
actually anything in your index directory (you may try reading it with Luke).


--
Doğacan Güney
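
[Editor's sketch] One way to act on the "check your logs" advice is to pull out the directory that the web app says it is opening and then look inside it. The function below is a sketch that relies only on the "opening indexes in ..." log line quoted earlier in this thread; the function name reported_index_dir and the log location in the example are assumptions.

```shell
#!/bin/sh
# Hypothetical helper: print the index directory named in the last
# "opening indexes in ..." line of a Tomcat log file.
# Prints nothing if the log has no such line.
reported_index_dir() {
    log=$1
    grep 'opening indexes in ' "$log" | tail -n 1 | sed 's/.*opening indexes in //'
}

# Example (log path is an assumption; adjust to your Tomcat install):
# d=$(reported_index_dir "$CATALINA_HOME/logs/catalina.out")
# ls -l "$d"
```

If the printed directory is missing or empty, the web app has nothing to search, whatever searcher.dir is set to.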

Re: Tutorial followup - Nutch webapp not seeing stuff?

ohaya


---- "Doğacan Güney" <[hidden email]> wrote:

> Check out your logs in both nutch and tomcat. Also see if there is
> actually anything in your index directory (you may try reading it with Luke).

Hi Doğacan,

I have checked all of the nutch and Tomcat logs (catalina.out, etc.), and there was nothing there.

BUT, I think that I may have just gotten an idea about why this was not working.

It looks like when I run the nutch crawl, the "index" and "indexes" directories are not created until the crawl is completely done.

[Is this normal nutch behavior???]

I was restarting Tomcat right after starting the nutch crawl, but at that point the "index" and "indexes" directories did not exist yet.

I just did a test where I ran a new crawl, let it finish completely, and THEN started Tomcat (pointing to the correct crawl directory), and that seemed to work.

I am in the process of doing another test crawl with more depth, and will post back after that.

Thanks!

Jim
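
[Editor's sketch] The ordering described above (finish the crawl first, then start Tomcat) can be scripted. The sketch below polls for the "indexes" directory and only then bounces Tomcat. The function name wait_for_indexes, the timeout values, and the CATALINA_HOME path are assumptions. Note that the directory appearing does not prove indexing has finished, so simply waiting for the crawl command to exit is the safer signal.

```shell
#!/bin/sh
# Hypothetical helper: wait until <crawl dir>/indexes exists.
# Arguments: crawl dir, timeout in seconds (default 3600),
#            poll interval in seconds (default 10).
# Returns 0 once the directory exists, 1 if the timeout is reached.
wait_for_indexes() {
    dir=$1
    timeout=${2:-3600}
    interval=${3:-10}
    waited=0
    while [ ! -d "$dir/indexes" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    return 0
}

# Example (paths are assumptions; adjust to your install):
# wait_for_indexes /opt/nutch-1.0/crawl3.test && {
#     "$CATALINA_HOME/bin/shutdown.sh"
#     "$CATALINA_HOME/bin/startup.sh"
# }
```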

Re: Tutorial followup - Nutch webapp not seeing stuff?

Alex McLintock
2009/7/14  <[hidden email]>:
> BUT, I think that I may have just gotten an idea about why this was not working.
>
> It looks like when I run the nutch crawl, the "index" and "indexes" directories are not being created until the crawl is completely done.
>
> [Is this normal nutch behavior???]


I believe that is correct. If you look at the whole-web version of
the commands (instead of just "crawl"), you will see that you have
to explicitly execute the index command at the end. If you are using
"crawl", then I assume it does the same thing itself.

Alex
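
[Editor's sketch] For readers who have not seen the step-by-step commands mentioned above, here is one round of them as recalled from the Nutch 1.0 tutorial. Treat it as a sketch and check it against the documentation for your version. The wrapper names run and crawl_one_round and the SEGMENT placeholder are hypothetical. With DRY_RUN=1 the commands are only printed, which makes it easy to see that indexing is a separate, final step.

```shell
#!/bin/sh
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# One round of the step-by-step crawl (hypothetical wrapper).
# Arguments: crawl directory, directory holding the seed URL list.
crawl_one_round() {
    crawl=$1
    urls=$2
    run bin/nutch inject "$crawl/crawldb" "$urls"
    run bin/nutch generate "$crawl/crawldb" "$crawl/segments"
    # Pick the newest segment; in a dry run none exists, so use a placeholder.
    segment=$(ls -d "$crawl"/segments/2* 2>/dev/null | tail -n 1)
    segment=${segment:-$crawl/segments/SEGMENT}
    run bin/nutch fetch "$segment"
    run bin/nutch updatedb "$crawl/crawldb" "$segment"
    run bin/nutch invertlinks "$crawl/linkdb" -dir "$crawl/segments"
    # Explicit indexing step: "indexes" is only written here, at the end.
    run bin/nutch index "$crawl/indexes" "$crawl/crawldb" "$crawl/linkdb" "$crawl"/segments/*
}

# Example: DRY_RUN=1 crawl_one_round crawl.test urls
```

This sketch stops at the index step; any deduplication or merging that the one-shot "crawl" command may perform afterwards is left out.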

Re: Tutorial followup - Nutch webapp not seeing stuff?

ohaya


Alex,

The reason that I asked if this was normal, is that when I was first researching for my nutch install, I thought that one of the tutorials I looked at said, at one point, go to the nutch URL and it'll show the status of the crawl.  I've been looking for that page that said that again, but haven't been able to find it, so maybe I was just imagining things :)...

Thanks,
Jim