Distributed installation

Distributed installation

Chetan Sahasrabudhe
Hello,

I am planning to set up a distributed Nutch installation.
Can anyone point me to a document or link that explains the steps?

Regards
Chetan
_______________________________

Tel +91-20-5652 5000 ext 2513

KPITCummins Infosystems Limited
Hinjwadi
Pune INDIA
_______________________________

 


---------------------------------
This message contains the information that may be privileged and is  the property of the KPIT Cummins Infosystems LTD.It is intended only for the person to whom it is addressed. If you are not intended recipient, you are not authorized to read, print , retain copy, disseminate, distribute, or use this message or any part thereof. If you receive this message in error, please notify the sender immediately and delete all copies of this message. KPIT Cummins does not accept any liability for virus infected mails.

Re: Distributed installation

Stefan Groschupf-2
Try:
http://wiki.media-style.com/display/nutchDocu/setup+multiple+search+sever

On 17.05.2005 at 10:03, Chetan Sahasrabudhe wrote:

> Hello,
>
> I am planning to set up a distributed Nutch installation.
> Can anyone point me to a document or link that explains the steps?
>
-----------information technology-------------------
company:     http://www.media-style.com
forum:           http://www.text-mining.org
blog:             http://www.find23.net


Re: Distributed installation

Giovanni Novelli
There is something wrong with that link
(http://wiki.media-style.com/display/nutchDocu/setup+multiple+search+sever):

Internal Server Error
The server encountered an internal error or misconfiguration and was
unable to complete your request.

Please contact the server administrator, [hidden email] and
inform them of the time the error occurred, and anything you might
have done that may have caused the error.

More information about this error may be available in the server error log.

Apache/1.3.26 Server at wiki.media-style.com Port 80

On 5/18/05, Stefan Groschupf <[hidden email]> wrote:

> try:
> http://wiki.media-style.com/display/nutchDocu/setup+multiple+search+sever

Re: Distributed installation

luti
Dear Users!

First, sorry for my bad English.
I read Stefan's great documentation at
http://wiki.media-style.com/display/nutchDocu/.
I set up a frontend (P4, 3 GB RAM, Tomcat 5.5.7, Java 1.4.08) with 3
backends holding 12 million pages (4 million per backend; AMD64, 4 GB RAM,
64-bit Linux with JDK 1.5_03).

When I start using it with 3-5 queries/sec, after 1-2 minutes the
frontend stops answering requests.
In the Tomcat manager status page I see many busy threads (150 and
increasing, now 241), all in stage 'S' (Service).

Backend usage (top): 40-60% CPU.
Frontend usage: 5% CPU.

Do you have any idea what the problem is?

Best Regards,
    Ferenc



Re: Distributed installation

luti
If I keep pressing the F5 (refresh) key for 1 minute in IE, the situation is the
same as in my previous mail. The backends' CPU usage stays at 95% for a long time.

Re: Distributed installation

Stefan Groschupf-2
In reply to this post by luti
I have noticed similar behavior.
I guess the backend servers are not answering fast enough.
I was thinking about having multiple search server groups with
identical content and then querying the groups in round-robin style.
What do people think about this idea?

It is already easy to set up multiple Tomcat instances that use different search
servers and simply split the traffic by adding 2 or n IPs to your DNS for
the same domain.
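
The round-robin idea could be sketched roughly as follows. This is only an illustration in Python, not Nutch code: the class name, the host names and the port are hypothetical, and a real frontend would also need thread safety and handling for groups that are down.

```python
from itertools import cycle


class SearchGroupRotation:
    """Hand out search server groups in round-robin order (illustrative only)."""

    def __init__(self, groups):
        if not groups:
            raise ValueError("at least one search server group is required")
        # Each group is a list of (host, port) pairs that hold identical content.
        self._rotation = cycle([list(group) for group in groups])

    def next_group(self):
        """Return the group that should serve the next query."""
        return next(self._rotation)


# Hypothetical hosts and port, two groups with the same index content.
rotation = SearchGroupRotation([
    [("search-a1", 9000), ("search-a2", 9000)],
    [("search-b1", 9000), ("search-b2", 9000)],
])

# First host of the group chosen for each of four consecutive queries.
picks = [rotation.next_group()[0][0] for _ in range(4)]
print(picks)
```

Each query goes to the next group in turn, so with two groups every group sees about half of the traffic. The DNS approach does a similar split one level higher, between whole Tomcat frontends.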


Stefan

On 18.05.2005 at 16:59, [hidden email] wrote:

> When I start using it with 3-5 queries/sec, after 1-2 minutes the
> frontend stops answering requests.
> In the Tomcat manager status page I see many busy threads (150
> and increasing, now 241), all in stage 'S' (Service).


RE: Distributed installation

Chetan Sahasrabudhe
In reply to this post by Chetan Sahasrabudhe
That link points to the same page that I referred to when setting up the distributed installation.

The installation and the backend run are working great.
The problem is that when I try to perform a search through my Tomcat interface,
I end up with the following error on my remote installation:

050519 113934 19 Server handler on 9000: getSummary(20050310044452/13, test)
050519 113934 12 Server handler on 9000: getSummary(20050310044452/191, test)
050519 113934 21 Server handler on 9000: getSummary(20050310044452/384, test)
050519 113934 18 Server handler on 9000: getSummary(20050310044452/1b8, test)
050519 113934 17 Server handler on 9000: getSummary(20050310044452/2e5, test)
050519 113935 14 found resource common-terms.utf8 at file:/E:/nutch-0.6/conf/common-terms.utf8
050519 113935 22 Server connection on port 9000 from 10.10.16.110 caught: java.lang.RuntimeException: Unknown op code: 10
java.lang.RuntimeException: Unknown op code: 10
        at net.nutch.searcher.DistributedSearch$Param.readFields(DistributedSearch.java:110)
        at net.nutch.ipc.Server$Connection.run(Server.java:120)
050519 113935 22 Server connection on port 9000 from 10.10.16.110: exiting

What might be the reason?

Regards
Chetan



Deleting a site from the nutch db/segments

quovadis
Hi there

Is there any way to delete a specific crawled domain
from the database/segments?

Thanks
quo

Re: Deleting a site from the nutch db/segments

luti
From the segments:
bin/nutch prune segments -queries qry.txt -output deleted.txt
where qry.txt contains:
url:domain?com

and deleted.txt will contain the list of deleted URLs.
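
If several domains have to be removed, qry.txt could be generated with a small script. The following Python sketch is only an illustration: the helper names are made up, it assumes the query file takes one query per line, and it replaces every dot with '?' (the single-character wildcard in Lucene query syntax) to mirror the url:domain?com example.

```python
def domain_to_prune_query(domain):
    """Build a query line such as 'url:domain?com' from 'domain.com'."""
    domain = domain.strip()
    if not domain:
        raise ValueError("empty domain")
    # '?' stands in for each dot, following the example in this post.
    return "url:" + domain.replace(".", "?")


def write_query_file(domains, path):
    """Write one query per line (assumed file format) and return the lines."""
    lines = [domain_to_prune_query(d) for d in domains]
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n")
    return lines


# Hypothetical domains, for illustration only.
for name in ["domain.com", "example.org"]:
    print(domain_to_prune_query(name))
```

Calling write_query_file(["domain.com"], "qry.txt") would then produce the file passed to the -queries option. Since '?' matches any single character, it is worth checking deleted.txt afterwards to see exactly which URLs were removed.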
