Re: Distributed installation

15 messages

Re: Distributed installation

Stefan Groschupf-2
I notice similar behavior.
I guess the backend servers are not answering fast enough.
I have been thinking about having multiple search server groups with
identical content and then querying the groups round-robin style.
What do people think about this idea?

It is already easy to set up multiple Tomcat instances that use different
search servers and simply split the traffic by adding 2 or n IPs to your
DNS for the same domain.
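As an editorial sketch of the group idea (the class name is hypothetical; this is not existing Nutch code), round-robin selection over identical groups can look like:

```java
import java.util.List;

/**
 * Round-robin selector over groups of identical search servers.
 * A sketch only: Nutch has no such class; names are illustrative.
 */
public class SearchGroupSelector {
    private final List<String> groups; // each entry names one group of search servers
    private int next = 0;

    public SearchGroupSelector(List<String> groups) {
        this.groups = groups;
    }

    /** Pick the next group in round-robin order (thread-safe). */
    public synchronized String nextGroup() {
        String g = groups.get(next);
        next = (next + 1) % groups.size();
        return g;
    }
}
```

DNS round robin gives the same effect one layer up: publishing 2 or n A records for the domain spreads browsers across the Tomcat frontends before any selection code runs.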


Stefan

On 18.05.2005 at 16:59, [hidden email] wrote:

> Dear Users!
>
> First of all, sorry for my bad English.
> I read Stefan's great documentation at
> http://wiki.media-style.com/display/nutchDocu/.
> I set up a frontend (P4, 3 GB RAM, Tomcat 5.5.7, Java 1.4.08) with
> 3 backends holding 12 million pages (4 million per backend; AMD64,
> 4 GB RAM, 64-bit Linux with JDK 1.5_03).
>
> When I use it at 3-5 queries/sec, after 1-2 minutes the
> frontend stops answering requests.
> In the Tomcat manager/status I see many busy threads (150,
> and increasing; now 241), all in stage 'S' (Service).
>
> Backend CPU usage (top): 40-60%.
> Frontend CPU usage: 5%.
>
> Do you have any idea what the problem is?
>
> Best Regards,
>    Ferenc

---------------------------------------------------------------
company:        http://www.media-style.com
forum:        http://www.text-mining.org
blog:            http://www.find23.net



Re: Distributed installation

luti
Dear Stefan,

Thanks for your fast answer.
I think there are some general 'security' holes in Nutch, e.g. if I make
queries with hitsPerPage=10000, or if a user holds down the F5 key in IE
for a long time.

In my situation the problem is Google-style 'paginating' (pages 1-10):
if isTotalIsExact() returns false, I re-search with hitsPerPage * 10.
I think I will set maxHitsPerSite to 0 for a week, and I will try
to rethink how to reprogram the 'paginating'.

Thanks, Ferenc

Stefan Groschupf wrote:

> [..]


Re: Re: Distributed installation

Stefan Groschupf-2
Ferenc,
you can fix this easily. Just hardcode hitsPerPage in the JSP and
count the queries per IP to limit them.
I notice Google does not answer queries if the HTTP headers are not
correct; the agent identification must be correct as well.
Stefan
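A minimal sketch of the per-IP counting idea (illustrative class and method names, not a Nutch API): keep a fixed time window and a counter per IP, and refuse requests past the limit.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Fixed-window per-IP query limiter. A sketch of the "count the queries
 * per IP" suggestion above; names are illustrative, not Nutch API.
 */
public class IpThrottle {
    private final int maxPerWindow;
    private final long windowMillis;
    private final Map<String, long[]> counts = new HashMap<>(); // ip -> {windowStart, count}

    public IpThrottle(int maxPerWindow, long windowMillis) {
        this.maxPerWindow = maxPerWindow;
        this.windowMillis = windowMillis;
    }

    /** Returns true if a request from this IP at time 'now' should be served. */
    public synchronized boolean allow(String ip, long now) {
        long[] e = counts.get(ip);
        if (e == null || now - e[0] >= windowMillis) {
            counts.put(ip, new long[] { now, 1 }); // start a fresh window
            return true;
        }
        return ++e[1] <= maxPerWindow;
    }
}
```

In a servlet this would be consulted with `request.getRemoteAddr()` before calling the search bean.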
On 19.05.2005 at 08:58, [hidden email] wrote:

> [..]


Re: Distributed installation

Piotr Kosiorowski
In reply to this post by Stefan Groschupf-2
Hello Stefan,

I already wrote a component that implements this round-robin searching
functionality some time ago, but right now it does not work correctly
with the latest Nutch SVN code; I have plans to update it.
It was done inside a modified NutchBean: it selected the group of
servers to be used for a particular request in round-robin fashion, and
in case of failure it moved the current server group to an inactive pool
and retried using another group.
A separate recovery thread checked from time to time whether the
inactive pool contained any groups and tried to recover them.
So in addition to load-balancing all requests among a cluster of search
server groups, it also provided fault tolerance (automatic detection
of inactive nodes, with recovery).
Our plan was to use two or more Tomcat servers, each with a NutchBean
configured to use all search server groups. This removes the single
point of failure during search.

So if there is enough interest, I can downgrade it to JDK 1.4 (I
am using java.util.concurrent) and send it as a patch.
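A rough sketch of the scheme described above (hypothetical names; the real component lives inside a modified NutchBean and is not reproduced here):

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Active/inactive pools of search server groups with round-robin
 * selection, sketching the design described above. Illustrative names,
 * not the actual patch.
 */
public class FailoverPool {
    private final List<String> active = new ArrayList<>();
    private final List<String> inactive = new ArrayList<>();
    private int next = 0;

    public FailoverPool(List<String> groups) {
        active.addAll(groups);
    }

    /** Round-robin over currently active groups; null when all are down. */
    public synchronized String nextGroup() {
        if (active.isEmpty()) return null;
        return active.get(next++ % active.size());
    }

    /** A search against this group failed: move it to the inactive pool. */
    public synchronized void markFailed(String group) {
        if (active.remove(group)) inactive.add(group);
    }

    /** The recovery thread found the group responding again: reactivate it. */
    public synchronized void markRecovered(String group) {
        if (inactive.remove(group)) active.add(group);
    }
}
```

The recovery thread would periodically probe each group in `inactive` and call `markRecovered` on success.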

Regards,
Piotr



Stefan Groschupf wrote:

> [..]


Re: Distributed installation

Stefan Groschupf-2
> So if there is enough interest for it I can downgrade it to JDK 1.4  
> (I am using java.util.concurrent) and send it as a patch.

Sounds great; I for one am very interested!
:-)

Thanks,
Stefan

Re: Distributed installation

Andrzej Białecki-2
In reply to this post by Piotr Kosiorowski
Piotr Kosiorowski wrote:
> Hello Stefan,
>
> I have already written a component that implements this round robin
> searching functionality some time ago - but right now it is not working
[..]

>
> So if there is enough interest for it I can downgrade it to JDK 1.4 (I
> am using java.util.concurrent) and send it as a patch.

That would be very nice! concurrent-1.3.4.jar is already a part of the
Nutch distribution - I don't remember how different it is from JDK5, but
this should ease the porting effort...



--
Best regards,
Andrzej Bialecki
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


Re: Re: Distributed installation

luti
In reply to this post by Stefan Groschupf-2
I am also interested in your patch.

Thanks for it:
    Ferenc

> So if there is enough interest for it I can downgrade it to JDK 1.4
> (I am using java.util.concurrent) and send it as a patch.

>
> Sounds very great, at least I have a lot of interest!!!
> :-)
>
> Thanks,
> Stefan



Re: Re: Distributed installation

luti
In reply to this post by Stefan Groschupf-2
Stefan,
thanks for the suggestions.
I fixed the 'paginating' in my source: I no longer query all the "pages"
(1-10), only the first. If the user clicks a page that has no hits, I
forward back to the last page.
I use servlets with Velocity, not JSP. I completely rewrote the JSP
pages, made some optimizations, etc.
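The fallback described here can be sketched as a small helper (hypothetical; the posted servlet implements the same logic inline):

```java
/**
 * Clamps a requested result offset to the last non-empty page, mirroring
 * the "forward back to the last page" fix described above. A sketch, not
 * the actual servlet code.
 */
public class Paging {
    /**
     * @param requestedStart offset the user asked for
     * @param hitsPerPage    results per page
     * @param returned       number of hits the backend actually returned
     * @return a start offset that falls on a page with hits
     */
    public static int clampStart(int requestedStart, int hitsPerPage, int returned) {
        if (returned > requestedStart) return requestedStart; // requested page has hits
        int start = returned - (returned % hitsPerPage);      // first hit of the last page
        if (start == returned) start -= hitsPerPage;          // 'returned' is a page boundary
        return Math.max(start, 0);
    }
}
```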

Stefan Groschupf wrote:

> [..]


Please help: Tomcat problem, Paginating with optimization (Like google)

luti
In reply to this post by luti
Dear Users,

I have the following problem:

I use Tomcat 5.5.7 on port 80.
There is no problem at 1-2 queries / 10 sec.

With more users (5-6 queries/sec), the frontend runs for e.g. 30 minutes
at 1-2% CPU load, and then the load goes up to 60-90%.
The query rate does not increase at that point (e.g. in the last test it
was 3 queries / 5 sec at the critical moment). During this critical time
the backends' CPU usage is at most 10-20%.
On the frontend, the Tomcat manager shows many threads in 'service'
status for a long time (e.g. 586 sec).
After 10 minutes, the bean.search times in catalina.out increase to
10-40 sec, and after a few more minutes there are many
'NullPointerException' messages.

I rewrote the JSP pages as servlets.
Here is the source of my Search.java, with 'Google'-style paginating. The
source code is optimized (minimal object creation, sb.append, etc.):

Beginning of java code
------------------------------------------------------------------------------
package org.nutch;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.ServletException;
import org.apache.velocity.exception.*;
import org.apache.velocity.Template;
import org.apache.velocity.app.Velocity;
import org.apache.velocity.context.Context;
import org.apache.velocity.servlet.VelocityServlet;
import net.nutch.html.Entities;
import java.util.*;
import net.nutch.searcher.*;
import net.nutch.plugin.*;
import java.io.*;
import java.net.*;

public class Search extends VelocityServlet {

    private NutchBean bean;

    public void init() throws ServletException {
        try {
            bean = NutchBean.get(this.getServletContext());
        } catch (IOException e) {
            throw new ServletException(e);
        }
    }

    private static final String getStringParameter(String name, String charset, HttpServletRequest req) {
        String value = req.getParameter(name);
        if (value == null) {
            value = "";
        }
        try {
            value = new String( value.getBytes("8859_1"), charset );
        } catch (Exception e) {;}
        return value;
    }

    public Template handleRequest( HttpServletRequest request,
    HttpServletResponse response,
    Context par_tartalom ) throws java.io.IOException, ServletException {
        Template loc_template = null;

        if (bean == null) { return loc_template; }

        int start = 0;          // first hit to display
        String startString = request.getParameter("start");
        if (startString != null) {
            start = Integer.parseInt(startString);
            if (start < 0) {
                start = 0;
            }
        }
        int hitsPerPage = 10;          // number of hits to display
        String hitsString = request.getParameter("hitsPerPage");
        if (hitsString != null) {
            hitsPerPage = Integer.parseInt(hitsString);
        }

        // get the character encoding to use when interpreting request values
        String charset = request.getParameter("charset");
        if (charset == null) {
            charset = "UTF8";
        }

        // get query from request
        String queryString = getStringParameter("query", charset, request);

        try {
            StringBuffer sb = new StringBuffer();
            StringBuffer sb2 = new StringBuffer();
            StringBuffer sb3 = new StringBuffer();

            // Query string for html
            String htmlQueryString = Entities.encode(queryString);
            String htmlQueryStringISO = URLEncoder.encode(queryString, "ISO-8859-2");
            String htmlQueryStringUTF = URLEncoder.encode(queryString, "UTF8");

            // Get more parameters
            if (hitsPerPage >100) { // No more hitsPerPage than 100
                hitsPerPage = 10;
            }
            int hitsPerSite = 2;          // max hits per site
            String hitsPerSiteString = request.getParameter("hitsPerSite");
            if (hitsPerSiteString != null) {
                hitsPerSite = Integer.parseInt(hitsPerSiteString);
            }

            Query query = Query.parse(queryString);

            int hitsLength = 0;
            int end = 0;
            int length = 0;
            log(sb.append("Query: ").append(request.getRemoteAddr()).append(" - ").append(queryString).toString());
            long startTime = System.currentTimeMillis();
            Hits hits = bean.search(query, start + hitsPerPage, hitsPerSite);
            hitsLength = (int)hits.getTotal();
            int length2 = hits.getLength();
            par_tartalom.put("TIME", String.valueOf((System.currentTimeMillis()-startTime) / 1000.0));
            if (length2 <= start) { // If there are no hits after 'start'
                start = length2 - (length2 % hitsPerPage);
                if (length2 == start) {
                    start = start - hitsPerPage;
                    if (start < 0) start = 0;
                }
                end = length2;
            } else { // If after 'start' there are some hits
                end = length2 < start+hitsPerPage ? length2 : start+hitsPerPage;
            }
            length = end-start;
            Hit[] show = hits.getHits(start, end-start);
            HitDetails[] details = bean.getDetails(show);
            String[] summaries = bean.getSummary(details, query);
            sb.setLength(0);
            log(sb.append("Query: ").append(request.getRemoteAddr()).append(" - ").append(queryString)
                    .append(" - Total hits: ").append(hits.getTotal())
                    .append(" - Time: ").append(System.currentTimeMillis()-startTime).toString());

            par_tartalom.put("START", new Long((end==0)?0:(start+1))); // Start of hits
            par_tartalom.put("END", new Long(end)); // End of hits
            par_tartalom.put("CNT", new Long(hits.getTotal())); // Count of hits
            par_tartalom.put("QRY", htmlQueryString); // UTF8
            par_tartalom.put("QRY2", htmlQueryStringISO); // ISO charset

            // ******************************************************
            // List Hits
            // ******************************************************
            sb.setLength(0);
            sb2.setLength(0);
            sb3.setLength(0);
            sb3.append("?idx=");
            Hit hit = null;
            HitDetails detail = null;
            String title = null;
            String url = null;
            String summary = null;
            for (int i = 0; i < length; i++) { // display the hits
                hit = show[i];
                detail = details[i];
                title = detail.getValue("title");
                url = detail.getValue("url");
                summary = summaries[i];
                sb3.setLength(5);
                sb3.append( hit.getIndexNo() ).append("&id=").append( hit.getIndexDocNo() );

                if (title == null || title.equals("")) { // use url for docs w/o title
                    title = url;
                }
                // ... Same with search.jsp ...
            }
            if (length2 <= start + hitsPerPage && hitsLength != 0) { // fewer than hitsPerPage remain
                sb.append("<span style=\"FONT-WEIGHT: bold; color: black;\">This is the last page.</span><br><br>");
                hitsLength = length2; // paginating length
            }
            par_tartalom.put("LIST", sb.toString());

            // ******************************************************
            // Paginating
            // ******************************************************
            int pageNumber = 0;
            sb.setLength(0);
            sb2.setLength(0);
            sb2.append("&hitsPerPage=");
            sb2.append(hitsPerPage);
            sb3.setLength(0);
            sb3.append("<a href=\"Keres?query=").append(htmlQueryStringUTF).append("&start=");

            // Prev (<<)
            sb.append("<td width=60 class=pages align=right>");
            if (start>0) {
                long prevStart = start-hitsPerPage;
                prevStart = prevStart > 0 ? prevStart : 0;
                sb.append(sb3); // query
                sb.append(prevStart); // start
                sb.append(sb2); // others
                sb.append("\" class=pages><font class=arrow><<</font> << </a>");
            }
            sb.append("</td><td class=pages width=250 align=center>");

            if (hitsLength > hitsPerPage ) { // If there are more pages
                if (start >= 9 * hitsPerPage) { // from page 10 (from 90)
                    int startPageNumber = start-(4 * hitsPerPage);
                    if (startPageNumber < 0) startPageNumber = 0;
                    pageNumber = (startPageNumber-1) / hitsPerPage+1;
                    for (int i = startPageNumber; i < hitsLength; i = i + hitsPerPage) {
                        pageNumber++;
                        if (start == i) {
                            sb.append("<font class=active_nr><b>");
                            sb.append(pageNumber);
                            sb.append("</b>&nbsp;</font>");
                        } else {
                            sb.append(sb3);// query
                            sb.append(i); // start
                            sb.append(sb2); // others
                            sb.append("\" class=inactive_nr>");
                            sb.append(pageNumber);
                            sb.append("</a>&nbsp;");
                        }
                        if (i >= startPageNumber+8*hitsPerPage) {
                            break;
                        }
                    }
                } else { // more than 9 pages
                    for (int i = 0; i < hitsLength; i = i + hitsPerPage) {
                        pageNumber++;
                        if (start == i) {
                            sb.append("<font class=active_nr><b>");
                            sb.append(pageNumber);
                            sb.append("</b>&nbsp;</font>");
                        } else {
                            sb.append(sb3);// query
                            sb.append(i);  // start
                            sb.append(sb2); // others
                            sb.append("\" class=inactive_nr>");
                            sb.append(pageNumber);
                            sb.append("</a>&nbsp;");
                        }
                        if (pageNumber >= 10) {
                            break;
                        }
                    }
                }
            }
            sb.append("</td><td width=100 class=pages>");

            // next (>>)
            if (hitsLength > start+hitsPerPage) {
                if (end <hitsLength) {
                    sb.append(sb3); // query
                    sb.append(end); // start
                    sb.append(sb2); // others
                    sb.append("\" class=pages> >> <font class=arrow>>></font></a>");
                }
            }
            sb.append("</td>");

            par_tartalom.put("PAGES", sb.toString());

            loc_template = Velocity.getTemplate("search.vm");
//        } catch (ArrayIndexOutOfBoundsException ae){
//            log(request.getRemoteAddr()+" - forwarding - "+queryString);
//        }
        } catch (Exception e) {
            log(e.toString()+" - " + request.getRemoteAddr()+" - "+queryString);
            //            throw new ServletException(e);
        }
        return loc_template;
    }
}
------------------------------------------------------------------------------
End of java code.

Have you any idea what is the possible problem source?

Best Regards,
    Ferenc

Re: Distributed installation

Piotr Kosiorowski
In reply to this post by Andrzej Białecki-2
Hello,
So it looks like my load-balancing solution is interesting to someone
besides me, so I will try to port it in the next week or two.

There are two more things left:
1) Currently, when one of the servers in a group fails to respond, it is
ignored and search results are returned without the data from that
server's segments (see org.apache.nutch.ipc.Client.call(Writable[]
params, InetSocketAddress[] addresses)). This behavior is a problem for
the load-balancing solution: because a second request by the same user
can be handled by a different set of servers, we can get duplicated
search results (when the user wants to see the next page of results) or
missing information (when the user wants to see an explanation or cached
data).
In my opinion, in a load-balancing solution such a situation should be
considered an error, and another set of servers should be used to handle
the user query.
To support both the new and old behavior, some changes to the RPC and
Client classes are required. I used to override an inherited method in
DistributedClient.Client, but because of recent changes it is no longer
possible to do that without affecting RPC and Client in the ipc package.

2) As the behavior of the system will change, the old version of the
search code should probably also be supported. I have to investigate how
to handle both versions of the code at the same time without creating a
mess.

I will post my first results for comments as soon as I make it work.
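Point 1's proposed behavior can be sketched as follows (illustrative names; this is not the Nutch ipc API): a failing group contributes nothing, and the whole query is retried against the next group instead of returning partial results.

```java
import java.util.List;

/**
 * Retry a search against successive server groups instead of silently
 * returning partial results, as proposed in point 1 above. Names are
 * illustrative; this is not the Nutch ipc API.
 */
public class FailoverSearch {
    /** One search attempt against a whole group of servers. */
    public interface GroupSearch {
        Object search(String group) throws Exception;
    }

    /** Try each group in order; a failing group is skipped entirely
     *  rather than contributing partial results. Null if all fail. */
    public static Object searchAny(List<String> groups, GroupSearch fn) {
        for (String g : groups) {
            try {
                return fn.search(g);
            } catch (Exception e) {
                // this group is down; fall through and try the next one
            }
        }
        return null;
    }
}
```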

Regards,
Piotr

Andrzej Bialecki wrote:

> [..]


Re: Distributed installation

Doug Cutting-2
Piotr Kosiorowski wrote:
> 2)As behavior of the system will change probably old version of search
> code should also be supported - I have to investigate how to handle both
> versions of the code at the same time and not to create a mess.

I think this would be a great addition. Do not worry too much about
compatibility with old versions of Nutch. We're still pre-1.0.

Doug

Re: Please help: Tomcat problem, Paginating with optimization (Like google)

Olaf Thiele
In reply to this post by luti
Hi Ferenc,
does the error happen only with your own servlet, or
did you also experience it with the standard interface?

Furthermore, you report NullPointerExceptions. Where exactly
do they happen? The printStackTrace() output should tell you.
That would be a good starting point.

Hope this helps,
Olaf


On 5/23/05, [hidden email] <[hidden email]> wrote:

> Dear Users,
>
> I have the following problem:
>
> I use Tomcat 5.5.7 on port 80.
> There is no problem with 1-2 queries / 10 secs.
>
> When I use it with more users (5-6 queries / sec), the frontend  working
> for e.g. 30 minutes on CPU load with 1-2 %, and after it it go up to 60-90%.
> The queries is not to increasses in this time (e.g. in the last test it
> was 3 queries / 5 sec in the critical point). In this critical time the
> backends CPU usages are max. 10-20%.
> On the frontend in the Tomcat  manager  there are  many threads with
> 'service status' with long time  (e.g.: 586 sec ).
> After 10 minutes in the catalina.out  there are the bean.search time i
> sincreasses over 10-40 sec, and after some minutes there are many
> messages with 'NullPointerException'.
>
> I rewrite the jsp pages to the servlets.
> There is my source of Search.java, there is a 'google' paginating. The
> source code is optimized (minimal object creating, sb.append, etc.):
> package org.nutch;
> Beginning of java code
> ------------------------------------------------------------------------------
> import javax.servlet.http.HttpServletRequest;
> import javax.servlet.http.HttpServletResponse;
> import javax.servlet.ServletException;
> import org.apache.velocity.exception.*;
> import org.apache.velocity.Template;
> import org.apache.velocity.app.Velocity;
> import org.apache.velocity.context.Context;
> import org.apache.velocity.servlet.VelocityServlet;
> import net.nutch.html.Entities;
> import java.util.*;
> import net.nutch.searcher.*;
> import net.nutch.plugin.*;
> import java.io.*;
> import java.net.*;
>
> public class Search extends VelocityServlet {
>
>     private NutchBean bean;
>
>     public void init() throws ServletException {
>         try {
>             bean = NutchBean.get(this.getServletContext());
>         } catch (IOException e) {
>             throw new ServletException(e);
>         }
>     }
>
>     private static final String getStringParameter(String name, String
> charset, HttpServletRequest req) {
>         String value = req.getParameter(name);
>         if (value == null) {
>             value = "";
>         }
>         try {
>             value = new String( value.getBytes("8859_1"), charset );
>         } catch (Exception e) {;}
>         return value;
>     }
>
>     public Template handleRequest( HttpServletRequest request,
>     HttpServletResponse response,
>     Context par_tartalom ) throws java.io.IOException, ServletException {
>         Template loc_template = null;
>
>         if (bean == null) { return loc_template; }
>
>         int start = 0;          // first hit to display
>         String startString = request.getParameter("start");
>         if (startString != null) {
>             try {
>                 start = Integer.parseInt(startString);
>             } catch (NumberFormatException e) { // malformed 'start' falls back to 0
>                 start = 0;
>             }
>             if (start < 0) {
>                 start = 0;
>             }
>         }
>         int hitsPerPage = 10;          // number of hits to display
>         String hitsString = request.getParameter("hitsPerPage");
>         if (hitsString != null) {
>             try {
>                 hitsPerPage = Integer.parseInt(hitsString);
>             } catch (NumberFormatException e) { // malformed value: keep the default
>             }
>         }
>
>         // get the character encoding to use when interpreting request
> values
>         String charset = request.getParameter("charset");
>         if (charset == null) {
>             charset = "UTF8";
>         }
>
>         // get query from request
>         String queryString = getStringParameter("query", charset, request);
>
>         try {
>             StringBuffer sb = new StringBuffer();
>             StringBuffer sb2 = new StringBuffer();
>             StringBuffer sb3 = new StringBuffer();
>
>             // Query string for html
>             String htmlQueryString = Entities.encode(queryString);
>             String htmlQueryStringISO = URLEncoder.encode(queryString,
> "ISO-8859-2");
>             String htmlQueryStringUTF = URLEncoder.encode(queryString,
> "UTF8");
>
>             // Get more parameters
>             if (hitsPerPage > 100) { // reset to the default if more than 100 requested
>                 hitsPerPage = 10;
>             }
>             int hitsPerSite = 2;          // max hits per site
>             String hitsPerSiteString = request.getParameter("hitsPerSite");
>             if (hitsPerSiteString != null) {
>                 hitsPerSite = Integer.parseInt(hitsPerSiteString);
>             }
>
>             Query query = Query.parse(queryString);
>
>             int hitsLength = 0;
>             int end = 0;
>             int length = 0;
>             log(sb.append("Query: ").append(request.getRemoteAddr())
>                 .append(" - ").append(queryString).toString());
>             long startTime = System.currentTimeMillis();
>             Hits hits = bean.search(query, start + hitsPerPage,
> hitsPerSite);
>             hitsLength = (int)hits.getTotal();
>             int length2 = hits.getLength();
>             par_tartalom.put("TIME",
> String.valueOf((System.currentTimeMillis()-startTime) / 1000.0) );
>             if (length2 <= start) { // no hits at or after 'start'
>                 start = length2 - (length2 % hitsPerPage);
>                 if (length2 == start) {
>                     start = start - hitsPerPage;
>                     if (start < 0) start = 0;
>                 }
>                 end = length2;
>             } else { // If after 'start' there are some hits
>                 end = length2 < start+hitsPerPage ? length2 :
> start+hitsPerPage;
>             }
>             length = end-start;
>             Hit[] show = hits.getHits(start, end-start);
>             HitDetails[] details = bean.getDetails(show);
>             String[] summaries = bean.getSummary(details, query);
>             sb.setLength(0);
>             log(sb.append("Query: ").append(request.getRemoteAddr())
>                 .append(" - ").append(queryString)
>                 .append(" - Total hits: ").append(hits.getTotal())
>                 .append(" - Time: ").append(System.currentTimeMillis() - startTime)
>                 .toString());
>
>             par_tartalom.put("START", new Long((end==0)?0:(start+1)) );
> // Start of hits
>             par_tartalom.put("END", new Long(end)); // End of hits
>             par_tartalom.put("CNT", new Long(hits.getTotal())); // Count
> of hits
>             par_tartalom.put("QRY", htmlQueryString); // UTF8
>             par_tartalom.put("QRY2", htmlQueryStringISO); // ISO charset
>
>             // ******************************************************
>             // List Hits
>             // ******************************************************
>             sb.setLength(0);
>             sb2.setLength(0);
>             sb3.setLength(0);
>             sb3.append("?idx=");
>             Hit hit = null;
>             HitDetails detail = null;
>             String title = null;
>             String url = null;
>             String summary = null;
>             for (int i = 0, j; i < length; i++) { // display the hits
>                 hit = show[i];
>                 detail = details[i];
>                 title = detail.getValue("title");
>                 url = detail.getValue("url");
>                 summary = summaries[i];
>                 sb3.setLength(5);
>                 sb3.append( hit.getIndexNo() ).append("&id=").append(
> hit.getIndexDocNo() );
>
>                 if (title == null || title.equals("")) { // use url for
> docs w/o title
>                     title = url;
>                 }
>                 ... Same with search.jsp ...
>             }
>             if (length2 <= start + hitsPerPage && hitsLength != 0) {
>                 // If fewer hits than hitsPerPage remain, this is the last page
>                 sb.append("<span style=\"FONT-WEIGHT: bold; color:
> black;\">This is the last page.</span><br><br>");
>                 hitsLength = length2; // paginating length
>             }
>             par_tartalom.put("LIST", sb.toString());
>
>             // ******************************************************
>             // Paginating
>             // ******************************************************
>             int pageNumber = 0;
>             sb.setLength(0);
>             sb2.setLength(0);
>             sb2.append("&hitsPerPage=");
>             sb2.append(hitsPerPage);
>             sb3.setLength(0);
>             sb3.append("<a
> href=\"Keres?query=").append(htmlQueryStringUTF).append("&start=");
>
>             // Prev (<<)
>             sb.append("<td width=60 class=pages align=right>");
>             if (start>0) {
>                 long prevStart = start-hitsPerPage;
>                 prevStart = prevStart > 0 ? prevStart : 0;
>                 sb.append(sb3); // query
>                 sb.append(prevStart); // start
>                 sb.append(sb2); // others
>                 sb.append("\" class=pages><font class=arrow><<</font> <<
> </a>");
>             }
>             sb.append("</td><td class=pages width=250 align=center>");
>
>             if (hitsLength > hitsPerPage ) { // If there are more pages
>                 if (start >= 9 * hitsPerPage) { // from page 10 (from 90)
>                     int startPageNumber = start-(4 * hitsPerPage);
>                     if (startPageNumber < 0) startPageNumber = 0;
>                     pageNumber = (startPageNumber-1) / hitsPerPage+1;
>                     for (int i = startPageNumber; i < hitsLength; i = i
> + hitsPerPage) {
>                         pageNumber++;
>                         if (start == i) {
>                             sb.append("<font class=active_nr><b>");
>                             sb.append(pageNumber);
>                             sb.append("</b>&nbsp;</font>");
>                         } else {
>                             sb.append(sb3);// query
>                             sb.append(i); // start
>                             sb.append(sb2); // others
>                             sb.append("\" class=inactive_nr>");
>                             sb.append(pageNumber);
>                             sb.append("</a>&nbsp;");
>                         }
>                         if (i >= startPageNumber+8*hitsPerPage) {
>                             break;
>                         }
>                     }
>                 } else { // within the first 9 pages
>                     for (int i = 0; i < hitsLength; i = i + hitsPerPage) {
>                         pageNumber++;
>                         if (start == i) {
>                             sb.append("<font class=active_nr><b>");
>                             sb.append(pageNumber);
>                             sb.append("</b>&nbsp;</font>");
>                         } else {
>                             sb.append(sb3);// query
>                             sb.append(i);  // start
>                             sb.append(sb2); // others
>                             sb.append("\" class=inactive_nr>");
>                             sb.append(pageNumber);
>                             sb.append("</a>&nbsp;");
>                         }
>                         if (pageNumber >= 10) {
>                             break;
>                         }
>                     }
>                 }
>             }
>             sb.append("</td><td width=100 class=pages>");
>
>             // next (>>)
>             if (hitsLength > start+hitsPerPage) {
>                 if (end <hitsLength) {
>                     sb.append(sb3); // query
>                     sb.append(end); // start
>                     sb.append(sb2); // others
>                     sb.append("\" class=pages> >> <font
> class=arrow>>></font></a>");
>                 }
>             }
>             sb.append("</td>");
>
>             par_tartalom.put("PAGES", sb.toString());
>
>             loc_template = Velocity.getTemplate("search.vm");
> //        } catch (ArrayIndexOutOfBoundsException ae){
> //            log(request.getRemoteAddr()+" - forwarding - "+queryString);
> //        }
>         } catch (Exception e) {
>             log(e.toString()+" - " + request.getRemoteAddr()+" -
> "+queryString);
>             //            throw new ServletException(e);
>         }
>         return loc_template;
>     }
> }
> ------------------------------------------------------------------------------
> End of java code.
>
> Do you have any idea what the source of the problem might be?
>
> Best Regards,
>     Ferenc
>
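The sliding page-window logic in the quoted code (up to 10 page links, with the current page kept roughly centered once past page 10) can be sketched in isolation; `PageWindow` and `windowStart` are hypothetical names, not part of the original servlet:

```java
// Sketch of the pagination window from the quoted Search.java (assumed names).
public class PageWindow {
    /** First hit offset of the leftmost page link shown. */
    static int windowStart(int start, int hitsPerPage) {
        if (start >= 9 * hitsPerPage) {          // past page 10: center the window
            // show four pages to the left of the current one
            return Math.max(start - 4 * hitsPerPage, 0);
        }
        return 0;                                // pages 1-9: window starts at page 1
    }

    public static void main(String[] args) {
        // page 1 (start=0): window begins at page 1
        System.out.println(windowStart(0, 10));    // prints 0
        // page 12 (start=110): window begins four pages earlier, at page 8
        System.out.println(windowStart(110, 10));  // prints 70
    }
}
```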


--

<SimpleHuman gender="male">
   <Physical name="Olaf Thiele" />
   <Virtual adress="http://www.olafthiele.de" />
</SimpleHuman>

Re: Re: Please help: Tomcat problem, Paginating with optimization (Like google)

luti
Dear Olaf,

Thanks for the answer. I found the following:
If I use Tomcat or Resin alone, the server always breaks down under heavy
query load. In the Tomcat manager I found many threads stuck in
'S' status for a long time.
I analyzed these threads with the 'netstat -anp' command at the Linux
prompt and found the corresponding connections in 'CLOSE_WAIT' state. This
means the client never acknowledged the close after the server sent its
answer (e.g. the browser was closed before the full answer arrived).
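This diagnosis can be reproduced with standard tools; a sketch (the sample netstat-style lines below are made up, and netstat column layout varies between systems):

```shell
# Count sockets stuck in CLOSE_WAIT; demonstrated here on two canned lines
printf 'tcp 0 0 10.0.0.1:8080 10.0.0.2:51234 CLOSE_WAIT\ntcp 0 0 10.0.0.1:8080 10.0.0.3:51235 ESTABLISHED\n' |
  awk '$6 == "CLOSE_WAIT"' | wc -l
# On a live box: netstat -anp | awk '$6 == "CLOSE_WAIT"' | wc -l
```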

My solution:
I put an Apache2 with jk2 in front of Tomcat, and the problem is
solved. With Apache2 the response times are also better than with Tomcat alone.
Before, the CPU usage was 50-90%; now it is only 1-10%. I let Apache
cache the static content (gif, html, css, etc. - I have many such items),
and I think Apache also serves static content better than Resin
or Tomcat.
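For reference, fronting Tomcat with Apache via the JK connector looks roughly like this. This is a minimal sketch in classic mod_jk syntax rather than jk2 (jk2's workers2.properties format differs), assuming Tomcat's AJP connector on the default port 8009; the worker name `nutch_worker` and the paths are made up:

```
# httpd.conf (assumed paths) -- load the connector and route /search to Tomcat
LoadModule jk_module modules/mod_jk.so
JkWorkersFile conf/workers.properties
JkMount /search/* nutch_worker

# workers.properties -- one AJP worker pointing at the local Tomcat
#   worker.list=nutch_worker
#   worker.nutch_worker.type=ajp13
#   worker.nutch_worker.host=localhost
#   worker.nutch_worker.port=8009
```

With this split, Apache holds the slow client connections and serves the static files, so Tomcat threads are freed as soon as the response is handed off.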

Regards,
    Ferenc

Olaf Thiele wrote:

>Hi Ferenc,
>does the error happen only with your own servlet, or
>did you experience it with the standard interface too?
>
>Furthermore, you report NullPointers. Where exactly
>do they happen? The stack trace (printStackTrace) should tell you;
>that would be a good starting point.
>
>Hope this helps
>Olaf
>
>
>  
>


Re: Re: Please help: Tomcat problem, Paginating with optimization (Like google)

Byron Miller-2
I use Resin with Apache 1.3.x as well. I've never had
any luck running Resin/Tomcat by themselves.

I have also had great luck using Squid as a
proxy/caching server in front of both. That helped
boost queries per second nicely by keeping much of the
TCP overhead off the JVM/Apache.
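A Squid accelerator setup from that era is only a few lines; a minimal sketch, assuming Squid 2.5-style `httpd_accel` directives and Tomcat listening on port 8080 of the same host (Squid 2.6+/3.x replaced these with `http_port ... accel` plus `cache_peer ... originserver`):

```
# squid.conf sketch (Squid 2.5-style accelerator mode; host/port are assumptions)
http_port 80
httpd_accel_host 127.0.0.1       # origin server (Tomcat/Resin)
httpd_accel_port 8080
httpd_accel_single_host on       # always forward to the one origin
httpd_accel_with_proxy off       # act as an accelerator only, not a general proxy
```

Cacheable responses (static files, repeated queries) are then served from Squid's cache without ever reaching the JVM.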

--- "[hidden email]" <[hidden email]>
wrote:

> Dear Olaf,
>
> Thanks for the answer. I found the following:
> If I use Tomcat or Resin alone, the server always
> breaks down under heavy query load. In the Tomcat
> manager I found many threads stuck in 'S' status
> for a long time.
> I analyzed these threads with the 'netstat -anp'
> command at the Linux prompt and found the
> corresponding connections in 'CLOSE_WAIT' state.
> This means the client never acknowledged the close
> after the server sent its answer (e.g. the browser
> was closed before the full answer arrived).
>
>


Re: Re: Please help: Tomcat problem, Paginating with optimization (Like google)

luti
Thanks again for the Squid idea; it solved my problem. I will use Squid
on other web servers too, since it dramatically decreases CPU load.

Byron Miller wrote:

>I use Resin with Apache 1.3.x as well. I've never had
>any luck running Resin/Tomcat by themselves.
>
>I have also had great luck using Squid as a
>proxy/caching server in front of both. That helped
>boost queries per second nicely by keeping much of the
>TCP overhead off the JVM/Apache.
>
>
>
>
>  
>