database exchange of 2 nutches (hybridity of nutch with yacy)

thomasasta
Hi

Some quite interesting projects are out there:
http://search.wikia.com/wiki/Search_Wikia

I want to suggest another one here.

Nutch is used either to index specific pages for specific customers, or as an open source engine for the whole World Wide Web.

*Two* Nutch engines indexing the web independently make no sense.
It would be useful if all Nutch installations that index the web could be connected together and exchange their databases.

You all know www.yacy.net, the p2p search engine. I do not want to suggest the same for Nutch, only some interoperability between two Nutch nodes.

Is it possible to add / import the indexed database of Nutch A into Nutch B?

Such an import has to be done manually, but why not within a network?

If we have 5 Nutch engines in the world indexing the web (I am not talking about customer solutions for parts of an intranet), why not accumulate their indexes?
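
To make the idea of accumulating indexes concrete, here is a minimal sketch of merging two URL databases. It is only an illustration under stated assumptions: `UrlRecord`, its fields, and `merge_crawl_dbs` are hypothetical names and do not reflect Nutch's real on-disk format or API. When both sides know the same URL, the sketch keeps the record that was fetched most recently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UrlRecord:
    """Hypothetical per-URL entry; NOT Nutch's actual data structure."""
    url: str
    fetch_time: int  # seconds since epoch of the last fetch
    score: float


def merge_crawl_dbs(db_a, db_b):
    """Merge two url -> UrlRecord maps, keeping the most recently fetched record."""
    merged = dict(db_a)  # start from a copy of A so neither input is modified
    for url, rec in db_b.items():
        current = merged.get(url)
        # Take B's record if A does not know the URL or B's copy is fresher.
        if current is None or rec.fetch_time > current.fetch_time:
            merged[url] = rec
    return merged


# Two small example databases with one overlapping URL.
db_a = {
    "http://example.org/": UrlRecord("http://example.org/", 100, 1.0),
    "http://example.org/a": UrlRecord("http://example.org/a", 150, 0.5),
}
db_b = {
    "http://example.org/": UrlRecord("http://example.org/", 200, 0.8),
    "http://example.org/b": UrlRecord("http://example.org/b", 120, 0.3),
}
merged = merge_crawl_dbs(db_a, db_b)
```

A real exchange would also have to agree on URL normalization and on how to combine scores, which this sketch leaves out.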

I want to suggest a structure that is a hybrid with yacy.net.

Would it be possible to design a database structure that is usable by YaCy as well?

Then the Nutch index could also be spread to YaCy nodes and get a backup there, and other Nutch installations could then add the media indexed by YaCy to their databases.

So YaCy p2p would be the way to exchange and back up the databases of several Nutch installations, and each Nutch could back up to and exchange with YaCy nodes and with other Nutch engines.

I therefore think every Nutch should run a YaCy node as well, and the databases must be made interoperable.

Would this be possible?

Well, you know the emule-project.net file-sharing structure. Or take Gnutella with its ultrapeers. The eMule servers support collecting URLs/hashes, and eMule also has a p2p node system called Kademlia.

Would such a p2p engine structure be possible, with YaCy as the p2p node and Nutch as the ultrapeer, indexing on its own but also backing up its database to the p2p YaCy network and getting redundant URLs from the network?

Then see the wiki-search project at the link above.

As URLs get a human ranking (specifically, a page is ranked after it has been viewed with the YaCy bar), the Nutch database could also receive these human-ranked URLs through the database exchange.

Any idea whether a common database structure is possible, and whether Nutch could implement a YaCy node to hold connections to YaCy's DHT network, so that Nutch could (also) be a YaCy node? As both are Java, this should work?
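
For readers unfamiliar with DHTs, the following is a rough sketch of how a Kademlia-style network decides which peers hold a given URL: peers and URLs are hashed into the same 160-bit key space, and a URL belongs to the peers whose keys are nearest to it by XOR distance. This is a generic illustration, not YaCy's actual protocol; the peer names and the helpers `key_of` and `closest_peers` are assumptions made up for the example.

```python
import hashlib


def key_of(text: str) -> int:
    """Map a string to a 160-bit integer key using SHA-1."""
    return int.from_bytes(hashlib.sha1(text.encode("utf-8")).digest(), "big")


def closest_peers(url: str, peer_names, k: int = 2):
    """Return the k peers whose keys are nearest to the URL's key by XOR distance."""
    target = key_of(url)
    # XOR of two keys, read as an integer, is the Kademlia distance metric.
    return sorted(peer_names, key=lambda peer: key_of(peer) ^ target)[:k]


# Hypothetical mixed network of Nutch and YaCy nodes.
peers = ["nutch-a", "nutch-b", "yacy-1", "yacy-2", "yacy-3"]
responsible = closest_peers("http://example.org/page.html", peers)
```

Storing each URL on the k nearest peers rather than on a single one is what provides the redundancy (the "backup") asked about above.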

You are welcome to subscribe to the yacy.net forums as well, to play around with the node and toolbar and with the human ranking, which is already implemented but still needs development.

Thanks for any collaboration ideas.
tom

Re: [Search-l] database exchange of 2 nutches (hybridity of nutch with yacy)

Toufeeq Hussain
Hi,

On 1/2/07, [hidden email] <[hidden email]> wrote:

> *Two* Nutch engines indexing the web make no sense.
> It would be useful, if all Nutch - indexing the web - can be connected together and perform a database exchange.
>
> Well you all know www.yacy.net - the p2p search engine - I do not want to suggest for nutch the same, but some interoperability of two nutch nodes.

I guess Lucene-Hadoop has pretty much the feature set you are
looking for.

http://wiki.apache.org/lucene-hadoop/
http://wiki.apache.org/nutch/NutchHadoopTutorial

-Toufeeq
--
blog @ http://toufeeq.net

Re: database exchange of 2 nutches (hybridity of nutch with yacy)

Zaheed Haque
Hi:

I am not sure p2p principles are good for web search, where result
speed is the number 1 concern, i.e. if your search engine is facing
consumers. However, a corporate environment, i.e. one where various
corporate locations run their own Nutch installations and share an
index via a common interface, could use p2p principles. Then again,
just transferring all the indexes to a single place is also a
compelling alternative.

In my view, yes, p2p adds flexibility, but it also adds "tons of
complexity in terms of operations", which I would prefer not to deal
with :-)

However, if there were a viable business model where you could use
Nutch in conjunction with Amazon S3 and EC2, where an organization
offers the crawling service and those wishing to use parts or all of
the index pay a small fee, yes, that would be nice. I suppose
soon enough we will see Yahoo offering such a service.

Cheers
Zaheed




Re: database exchange of 2 nutches (hybridity of nutch with yacy)

thomasasta
Hi

You mean computers do the work and the indexing, and users have to pay to get information? I believe that information should be free to every generation, not only in the horizontal generation.
Who has the right to crawl the websites of Africa? Is Yahoo financing the website owners as well?
Bread and games, and information, free for the masses, as Caesar said.
That's why we should use p2p, to have strong solidarity in helping each other.

But that's philosophy.

See yacy.net and try installing a test node. The results are there within a few seconds, and it is quite a cool experience to see the peers returning results.

I guess communicating with your wife is not much speedier...

;-)

-------- Original Message --------
Date: Tue, 2 Jan 2007 10:44:28 +0100
From: "Zaheed Haque" <[hidden email]>
To: [hidden email]
Subject: Re: database exchange of 2 nutches (hybridity of nutch with yacy)

> Hi:
>
> I am not sure p2p principles is good for web search.. where results
> speed is number 1 concern. i.e. if your search engine is facing
> consumers. However in a corporate environment i.e. various
> corp.locations runs their own nutch installation and share index via a
> common interface could use p2p principles then again just transferring
> all the index to a single place is also compelling alternative.
>
> In my view yes p2p ads flexibility but also adds "tons of complexity
> in terms of operations" which I would prefer not to deal with :-)
>
> However if there was a via-able business model where you could use
> Nutch in conjunction with Amazon S3 and EC2 where an organization
> offers the crawling service and those wishing to use parts or all of
> the index would pay a small fee .. yes that would be nice.. I suppose
> soon enough we will see Yahoo offering such service..
>
> Cheers
> Zaheed