Google Search on Nutch?

Google Search on Nutch?

Justin Hartman
I'm sorry but I have to ask this question - stupid as it may seem....

Why does the Nutch home page [1] have Google Search integrated into
the site when surely it should be using Nutch? What better a
demonstration of the Nutch system than the Nutch home page?
--
Regards
Justin Hartman
PGP Key ID: 102CC123

[1] http://lucene.apache.org/nutch/

Re: Google Search on Nutch?

Lukáš Vlček
Hi,

I am not actively involved in this project, so my answer may not be correct,
but I would say that integrating Google search into the Nutch website is a
much easier (and cheaper) way to go. You don't need disk resources to store
the Nutch index, and you don't need a dedicated admin to check and maintain
the crawler, etc.

You can search the web or the mailing lists for applications powered by Nutch
if you need references.

Lukas


Re: Google Search on Nutch?

kettle
Asked the same thing myself some time ago, but never got a response.
Thought it was a half-decent question though.


Re: Google Search on Nutch?

Andrzej Białecki-2
Josef Novak wrote:
> Asked the same thing myself some time ago, but never got a response.
> Thought it was a half-decent question though.

Yes, it's a valid question :) The truth is that there are too few
developers familiar enough with the Apache infrastructure to set it up.
Nutch is more than capable of doing this; all it takes is one person
familiar with the infrastructure and the nightly build process, with a
day or two to spare ...

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



Re: Google Search on Nutch?

Jim R. Wilson
Andrzej Bialecki wrote:
> ... familiar enough with Apache infrastructure in order to set it up

Could you expand a little on the experience needed?  Is this in regards to
tying Apache to Tomcat?  (I remember ages ago this used to be done with
something called "mod_jk" but who knows anymore).

Maybe someone already has a Nutch server pointed to the Nutch website and
restricted to that domain?

-- Jim


Re: Google Search on Nutch?

Andrzej Białecki-2
Jim Wilson wrote:
> Andrzej Bialecki wrote:
>> ... familiar enough with Apache infrastructure in order to set it up
>
> Could you expand a little on the experience needed?  Is this in
> regards to
> tying Apache to Tomcat?  (I remember ages ago this used to be done with
> something called "mod_jk" but who knows anymore).

No, I meant the infrastructure at apache.org: it takes a person (a
committer) who is familiar enough with both Nutch and the local
infrastructure at apache.org to set it up.

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



Re: Google Search on Nutch?

rhodebump
Not to be a wet towel, but I have never heard of or seen Apache running a
Java process on any of their servers.  Feel free to prove me wrong!

My idea to make this happen would be that some of us could "donate" the
use of a server/bandwidth to run Nutch.

What would be "super ultra cool" is if a few of us donated a dedicated
JVM to run the Nutch indexer/webapp and we could round-robin between
them.  I will be the first to volunteer my resources for this purpose.
I have a co-location and a commercial SDSL connection with 2 racks of
servers...
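
As a rough illustration of the round-robin idea above, here is a minimal Python sketch. It is not something posted in this thread: the host names are hypothetical placeholders, and a real setup would more likely rotate at the DNS or load-balancer level than in application code.

```python
from itertools import cycle

# Hypothetical addresses for volunteer-run Nutch search servers.
# None of these names come from the thread; they are placeholders.
DONATED_SERVERS = [
    "http://nutch-a.example.org:8080",
    "http://nutch-b.example.org:8080",
    "http://nutch-c.example.org:8080",
]


def round_robin(servers):
    """Return a function that hands out the next server on each call."""
    rotation = cycle(servers)
    return lambda: next(rotation)


next_server = round_robin(DONATED_SERVERS)

# Four consecutive picks; the fourth wraps around to the first server.
picks = [next_server() for _ in range(4)]
```

Each search request would be sent to whatever `next_server()` returns, so the load is spread evenly across the donated machines.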



Phillip




Re: Google Search on Nutch?

Nitin Borwankar
Phillip Rhodes wrote:

> Not to be a wet towel, but I have never heard of or seen Apache running a
> Java process on any of their servers.  Feel free to prove me wrong!
>
> My idea to make this happen would be that some of us could "donate"
> the use of a server/bandwidth to run Nutch.
>
> What would be "super ultra cool" is if a few of us donated a
> dedicated JVM to run the Nutch indexer/webapp and we could round-robin
> between them.  I will be the first to volunteer my resources for this
> purpose.  I have a co-location and a commercial SDSL connection with
> 2 racks of servers...


Glad to do the same - I have an instance on Amazon EC2.  I have already
experimented with single site indexing for some friends.

Nitin



--
Nitin Borwankar
Find, Learn, Act ....
Greener, the search engine for the planet
http://greener.com
[hidden email]
510-872-7066


Re: Google Search on Nutch?

Zaheed Haque
Follow the whole thread:

http://mail-archives.apache.org/mod_mbox/lucene-nutch-dev/200601.mbox/%3c43D7EC8A.9050506@...%3e

This is already in progress as far as I know. It is just not completed yet.

Cheers


Re: Google Search on Nutch?

Sean Dean-3
I don't have anything to back up my opinion, but I think the project in that specific thread was either halted or delayed indefinitely. It's been almost one year since that discussion started, and there wasn't much follow-up, unless it was all private.

If there is really a need for this to happen (and it looked like there was at the time), you might want to ask a Nutch developer; if they don't know, I'm sure they can forward your request directly to Doug.

 

Nutch zone (was Re: Google Search on Nutch?)

Thorsten Scherler-3
On Wed, 2007-01-03 at 14:39 +0200, Justin Hartman wrote:
> I'm sorry but I have to ask this question - stupid as it may seem....
>
> Why does the Nutch home page [1] have Google Search integrated into
> the site when surely it should be using Nutch?

See the source code of the page:
<meta content="Apache Forrest" name="Generator">
<meta name="Forrest-version" content="0.7">
<meta name="Forrest-skin-name" content="pelt">

Forrest does not yet have an explicit Nutch search interface;
however, Forrest supports searching against a Lucene index out of the box.

Furthermore, I just added a Solr plugin to Forrest, which is to say that
Forrest could easily be extended to use Nutch, but ...

> What better a
> demonstration of the Nutch system than the Nutch home page?

Well, on people.apache.org, where all websites of http://apache.org/ are
hosted, no Java is allowed, AFAIR.

However nutch could use their zone server to add a demonstration for the
community. Like e.g. http://lenya.zones.apache.org/#otherDemos or
http://forrest.zones.apache.org/, but I guess that is a dev topic.

HTH

salu2

--
thorsten

"Together we stand, divided we fall!"
Hey you (Pink Floyd)


Re: Nutch zone (was Re: Google Search on Nutch?)

Jim R. Wilson
> However nutch could use their zone server to add a demonstration for the
> community. Like e.g. http://lenya.zones.apache.org/#otherDemos or
> http://forrest.zones.apache.org/, but I guess that is a dev topic.

+1

Is there a reason not to use the zone server in this manner?

-- Jim
