Threads in Solr


Threads in Solr

Evgeniy Strokin
Hello,
I'm overriding the getFacetInfo(...) method of the standard request handler (BTW: thanks for making faceting a separate method :-))
What I need is to run the original query several times, each time with a filter query that I generate from the result of the original query. Below is part of my code.
I was thinking maybe I could run those queries in parallel, in separate threads, rather than one by one. But it appears that this takes longer than running the queries one by one.
Do you have any idea why? Do you think running those queries in separate threads is a good idea in general? Are SolrIndexSearcher and SimpleFacets thread-safe?

Thank you
Gene

------------------------------------------------------------------
    protected NamedList getFacetInfo(SolrQueryRequest req,
                                     SolrQueryResponse rsp,
                                     DocSet mainSet) {
        SimpleFacets f = new SimpleFacets(req.getSearcher(),
                mainSet,
                req.getParams());
        NamedList facetInfo = f.getFacetCounts();
////////////////// This is custom code for multi facets
..................
........ Truncated
..................
                    for (int i = 0; i < shortFld.size(); i++) {
                        SolrQueryParser qp = new SolrQueryParser(s.getSchema(), null);
                        Query q = qp.parse(shortFldName + ":" + shortFld.getName(i));
                        filters.add(q);
                        DocListAndSet matrixRes = s.getDocListAndSet(query, filters, null, 0, 0, flags);
                        NamedList matr = new SimpleFacets(req.getSearcher(),
                                matrixRes.docSet,
                                req.getParams()).getFacetCounts();
                        facetFields.add(shortFld.getName(i), matr.get("facet_fields"));
                       
                        filters.remove(q);
                    }
..................
........ Truncated
..................
        return facetInfo;
    }

Re: Threads in Solr

hossman

: I was thinking maybe I could run those queries in parallel, in separate
: threads, rather than one by one. But it appears that this takes longer
: than running the queries one by one.

: Do you have any idea why? Do you think running those queries in
: separate threads is a good idea in general? Are SolrIndexSearcher and
: SimpleFacets thread-safe?

SolrIndexSearcher is thread-safe ... SimpleFacets should be thread-safe,
but i won't swear to it off the top of my head. without seeing exactly
how you set up your threads, it's hard to guess ... in general multiple
threads are only useful if you are I/O bound, or have hardware that can
take advantage of parallelization (i.e. multiple cores).

but it's also possible that things take just as long because all of your
threads wind up computing the same DocSets at the same time -- or block on
generating the same FieldCache arrays at the same time.
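To illustrate that last point, here is a minimal, hypothetical sketch (not Solr's or Lucene's actual cache code) of a per-key cache where the first thread to ask for a key builds the value and any thread asking for the same key meanwhile waits for it. It uses `ConcurrentHashMap.computeIfAbsent`, which applies the loader at most once per key. With eight threads requesting the same stand-in "DocSet", only one computation happens, but the other seven gain nothing: they simply block until it is done.

```java
import java.util.BitSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical compute-once cache: one thread builds each value, concurrent callers wait for it. */
class ComputeOnceCache<K, V> {
    private final ConcurrentMap<K, V> map = new ConcurrentHashMap<>();
    private final Function<K, V> loader;
    private final AtomicInteger loads = new AtomicInteger();

    ComputeOnceCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        // computeIfAbsent runs the loader at most once per key; other callers
        // for the same key block until that single computation finishes.
        return map.computeIfAbsent(key, k -> {
            loads.incrementAndGet();
            return loader.apply(k);
        });
    }

    int loadCount() {
        return loads.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for an expensive DocSet: bit i is set if doc i "matches" the key.
        ComputeOnceCache<String, BitSet> cache = new ComputeOnceCache<>(key -> {
            BitSet bits = new BitSet(1_000);
            for (int doc = 0; doc < 1_000; doc++) {
                if (doc % key.length() == 0) bits.set(doc);
            }
            return bits;
        });
        Thread[] workers = new Thread[8];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> cache.get("color:red"));
            workers[i].start();
        }
        for (Thread t : workers) t.join();
        System.out.println("loads = " + cache.loadCount()); // prints "loads = 1"
    }
}
```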



-Hoss


Re: Threads in Solr

Evgeniy Strokin
In reply to this post by Evgeniy Strokin
Yes, I am computing the same DocSet. Could that be the problem? Is there any way to solve it?
In general, in each thread I run the same query and add a different filter query.



----- Original Message ----
From: Chris Hostetter <[hidden email]>
To: Solr User <[hidden email]>
Sent: Monday, February 25, 2008 2:19:02 AM
Subject: Re: Threads in Solr



Re: Threads in Solr

hossman
: Yes, I am computing the same DocSet. Could that be the problem? Is there any way to solve it?
: In general, in each thread I run the same query and add a different filter query.

it's not necessarily a problem, it's just that you may not get much
benefit from parallelization if all of the worker threads are doing the
same work simultaneously.

but like i said: without knowing exactly what your threading code looks
like, it's hard to guess what might be wrong (and even if i was looking
right at your multithreaded code, it wouldn't necessarily be obvious to
me, my multi-threading knowledge is mediocre) and it's still not clear if
you are testing on hardware that can actually take advantage of
parallelization.


-Hoss


Re: Threads in Solr

Evgeniy Strokin
In reply to this post by Evgeniy Strokin
I'm running my tests on a server with four dual-core CPUs. I was expecting a good improvement from the multithreaded solution, but it is 10 times slower. Here is how I run those threads; I think I'm doing something wrong, please advise:
 
------------------------------------------
............. code truncated .............
 
public class MultiFacetRequestHandler extends StandardRequestHandler {

    protected NamedList getFacetInfo(SolrQueryRequest req,
                                     SolrQueryResponse rsp,
                                     DocSet mainSet) {
        SimpleFacets f = new SimpleFacets(req.getSearcher(),
                mainSet,
                req.getParams());
        NamedList facetInfo = f.getFacetCounts();
////////////////// This is custom code for multi facets
        SolrParams p = req.getParams();
        String fl = p.get(SolrParams.FL);
        int flags = 0;
        if (fl != null)
            flags |= SolrPluginUtils.setReturnFields(fl, rsp);
        Query query = QueryParsing.parseQuery(p.required().get(SolrParams.Q),
                p.get(SolrParams.DF), p, req.getSchema());
        try {
                NamedList facetFields = (NamedList) facetInfo.get("facet_fields");
                if (facetFields.size() == 2) {
                    String shortFldName = facetFields.getName(0);
                    NamedList shortFld = (NamedList) facetFields.getVal(0);
                    NamedList longFld = (NamedList) facetFields.getVal(1);
                    if (shortFld.size() > longFld.size()) {
                        shortFld = longFld;
                        shortFldName = facetFields.getName(1);
                    }
                    List<Query> filters = SolrPluginUtils.parseFilterQueries(req);
                    if (filters == null) filters = new LinkedList<Query>();
                    SolrIndexSearcher s = req.getSearcher();
                    Vector<Thread> threads = new Vector<Thread>();
                    Thread thread;
                    for (int i = 0; i < shortFld.size(); i++) {
                        SolrQueryParser qp = new SolrQueryParser(s.getSchema(), null);
                        Query q = qp.parse(shortFldName + ":\"" + shortFld.getName(i)+"\"");
                        List<Query> fltrs=new LinkedList<Query>();
                        fltrs.addAll(filters);
                        fltrs.add(q);
                        thread = new Thread(makeRunnable(s,query,fltrs,flags,p,shortFld.getName(i),facetFields));
                        threads.add(thread);
                        thread.start();
                    }
                    for (Thread thread1 : threads) {
                        thread1.join();
                    }
                }
        } catch (Exception e) {
            SolrException.logOnce(SolrCore.log, "Exception in multi faceting", e);
        }
///////////////////////////////////////////////
        return facetInfo;
    }
 
    public Runnable makeRunnable(final SolrIndexSearcher s, final Query query, final List<Query> filters, final int flags, final SolrParams p, final String shrtName, final NamedList facetFields) {
        return new Runnable() {
            public void run() {
                try{
                    DocListAndSet matrixRes = s.getDocListAndSet(query, filters, null, 0, 0, flags);
                    NamedList matr = new SimpleFacets(s,matrixRes.docSet,p).getFacetCounts();
                    facetFields.add(shrtName, matr.get("facet_fields"));
                }catch (Exception e){
                     SolrException.logOnce(SolrCore.log, "Exception in multi faceting", e);
                }
            }
        };
    }
............. code truncated .............
}
 

 



----- Original Message ----
From: Chris Hostetter <[hidden email]>
To: [hidden email]
Sent: Tuesday, February 26, 2008 2:55:36 AM
Subject: Re: Threads in Solr


Re: Threads in Solr

Hausherr, Jens-2
It has been some time since I last worked with the Lucene index directly, but AFAIK the Lucene index is not thread-safe by default, which means it is probably wrapped in some synchronization layer.

Regarding the bad performance, I can only guess at some items to examine:

1) Every thread performs a complete query.
2) If one query takes time "t", then "n" threads will take at most "n*t" in total.
3) If your threads hit a synchronized method, they are likely to queue at the synchronization barrier, which might lead to "n*t" execution time.
4) The join statements at the end of your code snippet ensure that your request handler continues only after all threads have completed.
5) Vectors are synchronized; you might not need a Vector to store your threads (as far as the code snippet is concerned, at least; I see no concurrent access to the threads here).
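Points 2 and 3 can be written as a rough, back-of-the-envelope timing model. It assumes n independent queries that each take time t when run alone, p cores, and a per-thread cost c for creating and scheduling a thread (c is an assumption; the thread gives no figure):

```latex
\begin{aligned}
T_{\text{seq}}  &= n\,t \\
T_{\text{par}}  &\approx \left\lceil \tfrac{n}{p} \right\rceil t + n\,c
  && \text{(independent work, no contention)} \\
T_{\text{lock}} &\approx n\,t + n\,c
  && \text{(every query serialized on one lock)}
\end{aligned}
```

On this model, even complete serialization only brings the threaded version back to roughly T_seq plus thread overhead, so a slowdown of ten times suggests the threads are also doing extra work (for example, recomputing results that fell out of a cache), not just waiting on each other.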

Personally, I think that to profit from parallelization it would be necessary to segment the index so that disjoint queries can be performed; I do not know whether Solr or Lucene already supports this feature...
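The partitioning idea above (split the documents into disjoint parts so each thread works on its own slice) can be sketched without Solr. This is a hypothetical in-memory model, not a Solr or Lucene feature: each document's value for one field sits in a `String[]`, the array is split into disjoint ranges, each range is counted on a pool thread with no shared state, and the partial counts are merged at the end.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: facet counting over disjoint document ranges ("segments"). */
class SegmentedFacetCounter {
    /** Counts the field values of docs [from, to); touches no shared state. */
    static Map<String, Integer> countRange(String[] fieldValues, int from, int to) {
        Map<String, Integer> counts = new HashMap<>();
        for (int doc = from; doc < to; doc++) {
            counts.merge(fieldValues[doc], 1, Integer::sum);
        }
        return counts;
    }

    /** Splits docs into `segments` disjoint ranges, counts them in parallel, merges the partial counts. */
    static Map<String, Integer> count(String[] fieldValues, int segments) {
        ExecutorService pool = Executors.newFixedThreadPool(segments);
        try {
            int size = Math.max(1, (fieldValues.length + segments - 1) / segments);
            List<Future<Map<String, Integer>>> parts = new ArrayList<>();
            for (int from = 0; from < fieldValues.length; from += size) {
                final int start = from;
                final int end = Math.min(from + size, fieldValues.length);
                parts.add(pool.submit(() -> countRange(fieldValues, start, end)));
            }
            // Merge in the calling thread: the only point where partial results meet.
            Map<String, Integer> total = new HashMap<>();
            for (Future<Map<String, Integer>> part : parts) {
                part.get().forEach((value, n) -> total.merge(value, n, Integer::sum));
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while counting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("segment count failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        String[] color = {"red", "blue", "red", "green", "red", "blue", "red"};
        System.out.println(new TreeMap<>(count(color, 3))); // {blue=2, green=1, red=4}
    }
}
```

Because the ranges do not overlap, no locking is needed while counting; the only coordination is the final merge.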

/Jens

-----Original Message-----
From: Evgeniy Strokin [mailto:[hidden email]]
Sent: Tuesday, February 26, 2008 16:57
To: [hidden email]
Subject: Re: Threads in Solr


Re: Threads in Solr

hossman
In reply to this post by Evgeniy Strokin

: I'm running my tests on a server with four dual-core CPUs. I was expecting
: a good improvement from the multithreaded solution, but it is 10 times
: slower. Here is how I run those threads; I think I'm doing
: something wrong, please advise:

As i said, i'm not much of a threads expert, but the one piece of advice i
do remember from someone else is that Thread instantiation is expensive,
and it's better to use Executor pools.
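A minimal sketch of the Executor advice, using only `java.util.concurrent`: a fixed-size pool runs one task per filter value, and `invokeAll` returns the futures in submission order. The names `PooledFacetRunner` and `runAll` are hypothetical, and the `work` function stands in for "run the filtered query and compute facet counts". Collecting results from the futures in the calling thread also means only one thread writes to the result map; in the posted code every worker calls `facetFields.add` on the same NamedList, which as far as I know is not synchronized.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Hypothetical helper: runs one task per filter value on a bounded thread pool. */
class PooledFacetRunner {
    static <R> Map<String, R> runAll(List<String> filterValues, Function<String, R> work, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<R>> tasks = new ArrayList<>();
            for (String value : filterValues) {
                tasks.add(() -> work.apply(value)); // each task only computes; it writes nothing shared
            }
            // invokeAll blocks until every task is done; futures come back in submission order.
            List<Future<R>> futures = pool.invokeAll(tasks);
            Map<String, R> results = new LinkedHashMap<>();
            for (int i = 0; i < futures.size(); i++) {
                results.put(filterValues.get(i), futures.get(i).get()); // only this thread writes
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for facet tasks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("facet task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        // Stand-in for "facet counts for one filter value": here just the value's length.
        Map<String, Integer> out = runAll(List.of("red", "green", "blue"), String::length, cores);
        System.out.println(out); // {red=3, green=5, blue=4}
    }
}
```

In a real request handler the pool would more likely be created once and reused across requests, rather than built per call as here. This only removes thread-creation cost; it does not remove the repeated main-query work discussed in the rest of this reply.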

independent of why the multithreaded version is slower than the single
threaded version, however, a couple of things jump out at me...

1) there's no reason i can think of to use SolrQueryParser, the facet
values are already the indexed form so you can just make TermQueries
directly

2) you are calling getDocListAndSet even though you only need the DocSet
part to give to SimpleFacets ... just using getDocSet should be faster.

3) the code as written re-executes the main query using a "List<Query>
filters" that is unique for each thread because it adds one new filter ...
ultimately all you care about is the DocSet, so instead of re-executing
that main query over and over, you can just find the DocSet for that
single new filter, compute its intersection with mainSet (which has
already factored in the other filters), and give that to SimpleFacets.
something like this (single threaded) should work...

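  ... // you already have: s, p, mainSet, facetFields, shortFld, shortFldName
  // Term is org.apache.lucene.index.Term, TermQuery is org.apache.lucene.search.TermQuery
  for (int i = 0; i < shortFld.size(); i++) {
    // facet values are already in indexed form, so no query parser is needed
    Query q = new TermQuery(new Term(shortFldName, shortFld.getName(i)));
    // docs matching q AND mainSet: a cached filter DocSet plus a set intersection
    DocSet d = s.getDocSet(q, mainSet);
    NamedList tmp = new SimpleFacets(s, d, p).getFacetCounts();
    facetFields.add(shortFld.getName(i), tmp.get("facet_fields"));
  }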

...as long as you don't ask Solr to compute the DocList for all of those
permutations (since you don't need them anyway) everything should either
already be in the filterCache, or be a set intersection and should be
crazy freaking fast.
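The intersect-instead-of-requery idea can be modeled with `java.util.BitSet` standing in for a DocSet. This is an illustration under assumptions: the class name, the `String[]` field arrays, and `docsWith` are hypothetical, and in Solr the filter's DocSet would come from the searcher and its filterCache. Per facet value it costs one clone, one `and()`, and a count over the surviving documents, with no re-execution of the main query.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model of "DocSet for one filter, intersected with mainSet, then facet". */
class IntersectThenCount {
    /** Docs whose value in `field` equals `value` (stand-in for a cached filter DocSet). */
    static BitSet docsWith(String[] field, String value) {
        BitSet bits = new BitSet(field.length);
        for (int doc = 0; doc < field.length; doc++) {
            if (field[doc].equals(value)) bits.set(doc);
        }
        return bits;
    }

    /** Facet counts of `otherField`, restricted to mainSet AND (shortField == value). */
    static Map<String, Integer> countFor(BitSet mainSet, String[] shortField, String value, String[] otherField) {
        BitSet d = (BitSet) mainSet.clone(); // don't mutate the shared main set
        d.and(docsWith(shortField, value));  // set intersection instead of re-running the main query
        Map<String, Integer> counts = new TreeMap<>();
        for (int doc = d.nextSetBit(0); doc >= 0; doc = d.nextSetBit(doc + 1)) {
            counts.merge(otherField[doc], 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] shortField = {"x", "y", "x", "x", "y", "x"};
        String[] otherField = {"a", "a", "b", "a", "b", "b"};
        BitSet mainSet = new BitSet();
        mainSet.set(0, 4); // docs 0..3 matched the main query and its filters
        for (String value : new String[] {"x", "y"}) {
            System.out.println(value + " -> " + countFor(mainSet, shortField, value, otherField));
        }
        // prints "x -> {a=2, b=1}" then "y -> {a=1}"
    }
}
```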

In fact: i'm wondering if the slowdown in performance you were seeing was
because the parallel execution was causing cache evictions ... without
changing any code, if you start up solr, hit your custom request
handler one time, and then look at the stats page, do you see a non-zero
value for "evictions" in any of the caches? ... is the number higher or
lower when you do the same test with your multi-threaded version?
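To see why an undersized cache can hurt so much, here is a small, hypothetical LRU cache (built on `LinkedHashMap` in access order, not Solr's own cache class) that counts evictions the way the stats page reports them. When more filters are requested in rotation than the cache can hold, every insert evicts something and nothing is ever reused.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical LRU cache that counts evictions; single-threaded model only. */
class CountingLruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxSize;
    private long evictions;

    CountingLruCache(int maxSize) {
        super(16, 0.75f, true); // access order: the least recently used entry is the eldest
        this.maxSize = maxSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        boolean evict = size() > maxSize;
        if (evict) evictions++;
        return evict;
    }

    long evictions() {
        return evictions;
    }

    public static void main(String[] args) {
        CountingLruCache<String, String> cache = new CountingLruCache<>(4);
        // Hypothetical workload: 6 distinct filters, each requested twice in round-robin order.
        for (int round = 0; round < 2; round++) {
            for (int f = 0; f < 6; f++) {
                String key = "filter" + f;
                if (cache.get(key) == null) cache.put(key, "docset for " + key);
            }
        }
        System.out.println("evictions = " + cache.evictions()); // prints "evictions = 8"
    }
}
```

With a capacity of 6 or more, the same workload reports 0 evictions and the second round is all hits. The class is not thread-safe; it only models the counting.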

having caches that are too small might be the full explanation of why the
threaded version is slower, but like i said: you should be able to get a
lot of speedups just by ditching the DocList method.



-Hoss