Launch nutch from the web-application

Launch nutch from the web-application

Berlin Brown
I know this probably comes down to general Java and thread programming,
but has anyone tried launching a Nutch crawl from a web application
running in Tomcat?  Does anyone have code for doing that?  I guess one
could start a thread inside Tomcat's JVM and have it invoke one of the
'nutch' script commands.  Has anyone had any issues with this?
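
For illustration, the idea of starting a thread inside Tomcat's JVM that invokes the 'nutch' script might look roughly like the sketch below. It is only a sketch under assumptions: the class name NutchCrawlLauncher, its method names, and all paths are hypothetical, and the "crawl <urls> -dir <dir> -depth <n>" arguments follow the one-shot crawl command as the tutorial describes it, so check them against your Nutch version.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

/** Hypothetical helper: runs the nutch script as a child process, off the request thread. */
class NutchCrawlLauncher {

    /**
     * Builds the command line for a one-shot crawl.
     * Assumption: the script lives at <nutchHome>/bin/nutch and accepts
     * "crawl <urls> -dir <crawlDir> -depth <n>".
     */
    static List<String> buildCommand(String nutchHome, String urlsPath,
                                     String crawlDir, int depth) {
        String script = new File(new File(nutchHome, "bin"), "nutch").getPath();
        return Arrays.asList(script, "crawl", urlsPath,
                             "-dir", crawlDir, "-depth", String.valueOf(depth));
    }

    /**
     * Submits the crawl to a background executor so the servlet that
     * triggered it can return immediately. The Future yields the exit code.
     */
    static Future<Integer> launch(List<String> command, File logFile,
                                  ExecutorService executor) {
        return executor.submit(() -> {
            ProcessBuilder pb = new ProcessBuilder(command);
            pb.redirectErrorStream(true);   // merge stderr into stdout
            pb.redirectOutput(logFile);     // keep crawl output out of Tomcat's logs
            Process process = pb.start();
            return process.waitFor();       // blocks only the worker thread
        });
    }
}
```

A servlet could call launch(...) with a single-thread executor so that only one crawl runs at a time. Running the script as a separate process also keeps the crawl's memory use out of Tomcat's heap.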

--
Berlin Brown
http://www.newspiritcompany.com - newspirit technologies

Re: Launch nutch from the web-application

waterwheel
We've got a PHP front end for version 0.71 that starts and stops
crawling/indexing/fetching.  One button starts the entire process -
updatedb/create fetchlist/fetch/index - and repeats it over and over.  A
second button stops the process, unless a crawl is in progress, in which
case it stops after the current crawl is complete.
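
The start/stop behaviour described above could be sketched as follows. This is not the PHP code mentioned in this post; it is a hypothetical Java illustration in which CrawlCycleRunner and its methods are made-up names, and each stage (updatedb, create fetchlist, fetch, index) is assumed to be supplied by the caller as a Runnable.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch: repeats a list of crawl stages until a stop is requested. */
class CrawlCycleRunner implements Runnable {
    private final List<Runnable> stages;   // e.g. updatedb, fetchlist, fetch, index
    private final AtomicBoolean stopRequested = new AtomicBoolean(false);
    private volatile int completedCycles = 0;   // written only by the loop thread

    CrawlCycleRunner(List<Runnable> stages) {
        this.stages = stages;
    }

    /** The "stop" button: takes effect once the current cycle has finished. */
    void requestStop() {
        stopRequested.set(true);
    }

    int completedCycles() {
        return completedCycles;
    }

    /** The "start" button: run this on a background thread. */
    @Override
    public void run() {
        // The flag is checked only between cycles, so a cycle in progress
        // always runs all of its stages before the loop exits.
        while (!stopRequested.get()) {
            for (Runnable stage : stages) {
                stage.run();
            }
            completedCycles++;
        }
    }
}
```

Checking the flag only at the top of the loop is what gives the "stops after the current crawl is complete" behaviour.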

If that's of any use to you, let me know and I'll see if I can cut a copy
of the PHP code.

g.


Berlin Brown wrote:

> I know this probably goes back to more java and threaded coding.  But,
> has anyone tried launching a nutch crawl from Tomcat based from a
> web-application.  And do they have any code for doing that.  I guess
> one could launch a thread within the same JVM as Tomcat's and then
> invoke one of the 'nutch' script applications.  Anyone had any issues
> with this?
>

Re: Launch nutch from the web-application

lpitcher@redomains.com
I would appreciate a copy of the code, as I am using version 0.71.

Thanks,
Lawrence



On 5/12/06, Insurance Squared Inc. <[hidden email]> wrote:

>
> We've got a php front end for version 0.71 that starts and stops
> crawling/indexing/fetching.  One button starts the entire process -
> updatedb/create fetchlist/fetch/index - over and over.  A second button
> stops the process, unless a crawl is in progress at which point it stops
> after the current crawl is complete.
>
> If that's of any use to you let me know and I'll see if I can cut a copy
> of the php code.
>
> g.
>
>
> Berlin Brown wrote:
>
> > I know this probably goes back to more java and threaded coding.  But,
> > has anyone tried launching a nutch crawl from Tomcat based from a
> > web-application.  And do they have any code for doing that.  I guess
> > one could launch a thread within the same JVM as Tomcat's and then
> > invoke one of the 'nutch' script applications.  Anyone had any issues
> > with this?
> >
>

Re: Launch nutch from the web-application

Yuzo Kanomata
In reply to this post by Berlin Brown
We support a topic-specific portal where some privileged users seed our
crawls. These users submit URLs through a web page accessible from their
account (they also specify the depth of the crawls).

BTW, we use Apache to present this part of the portal to the user, so
the crawl submit button calls bash scripts to execute the actions. (Be
mindful that you may want to adjust your conf/crawl-urlfilter.txt to
include the domains selected for the crawl.) The scripts run almost the
way the Tutorial presents the whole-web crawl
<http://lucene.apache.org/nutch/tutorial.html#Whole-web%3A+Concepts>.
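
As a rough illustration of the crawl-urlfilter.txt adjustment, a small helper could turn the domains chosen for a crawl into filter lines. This is a hedged sketch: UrlFilterLines and its methods are hypothetical names, and the line format (a leading "+" for accepted patterns and a final "-." to skip everything else) is assumed from the sample filter file that ships with Nutch, so compare it with your own conf/crawl-urlfilter.txt.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: builds crawl-urlfilter.txt lines for selected domains. */
class UrlFilterLines {

    /** One accept rule per domain; dots are escaped so they match literally. */
    static String acceptLine(String domain) {
        return "+^http://([a-z0-9]*\\.)*" + domain.replace(".", "\\.") + "/";
    }

    /** Accept rules for every domain, followed by a catch-all reject rule. */
    static List<String> forDomains(List<String> domains) {
        List<String> lines = new ArrayList<>();
        for (String domain : domains) {
            lines.add(acceptLine(domain));
        }
        lines.add("-.");   // skip everything that no rule above accepted
        return lines;
    }
}
```

The resulting lines would be written into the filter file by whatever script the submit button calls, before the crawl starts.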

The way we have things set up, we restart Tomcat after each crawl,
because the newly indexed material will not show up otherwise. (We
record each privileged user's webdbs, crawls and segments so that we can
back out of a bad crawl or prune fairly well; we still have a dedup
problem.) We have slightly modified the JSP pages that came with Nutch to
present the web-based search page.

Crawling is only a small part of what we are doing, so there may be
better answers. We were in something of a hurry, and this has worked out
fairly well so far.

Yuzo

--On Friday, May 12, 2006 7:52 PM -0400 Berlin Brown
<[hidden email]> wrote:

> I know this probably goes back to more java and threaded coding.  But,
> has anyone tried launching a nutch crawl from Tomcat based from a
> web-application.  And do they have any code for doing that.  I guess
> one could launch a thread within the same JVM as Tomcat's and then
> invoke one of the 'nutch' script applications.  Anyone had any issues
> with this?
>
> --
> Berlin Brown
> http://www.newspiritcompany.com - newspirit technologies