ParallelMultiSearcher and docFreq


ParallelMultiSearcher and docFreq

Yura Smolsky-2
Hello.

Here is the situation. I have a ParallelMultiSearcher object
initialized with two or more RemoteSearchables.

I run a PrefixQuery search on a keyword field, say "link". When I
search with just the letter "w" (link:w*), I expect about 5k results.

As I understand it, when I search through ParallelMultiSearcher, the
query is rewritten first. So my prefix query is rewritten into
"link:wordlist.com link:web.com ..." with about 2-3k terms. Then, as far
as I can tell from debugging, ParallelMultiSearcher sends a separate
docFreq request to the RemoteSearchables for each of those terms (2-3k
calls). These docFreq calls take about 95% of the total search time.
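To make the cost concrete, here is a small Java simulation of the two call patterns. `CountingRemote` and `DocFreqCallCount` are hypothetical names: the class only stands in for a remote index and is not Lucene's RemoteSearchable API. It counts round trips when each rewritten term is looked up separately on every remote, versus one batch lookup per remote.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical simulation: CountingRemote stands in for one remote index and is
// NOT Lucene's RemoteSearchable; it only counts how many round trips it receives.
public class DocFreqCallCount {

    static class CountingRemote {
        private final Map<String, Integer> freqs;
        int calls = 0; // number of simulated round trips

        CountingRemote(Map<String, Integer> freqs) {
            this.freqs = freqs;
        }

        // One round trip per term, like the calls seen while debugging.
        int docFreq(String term) {
            calls++;
            return freqs.getOrDefault(term, 0);
        }

        // One round trip for the whole term array, like docFreqs(Term[]).
        int[] docFreqs(String[] terms) {
            calls++;
            int[] result = new int[terms.length];
            for (int i = 0; i < terms.length; i++) {
                result[i] = freqs.getOrDefault(terms[i], 0);
            }
            return result;
        }
    }

    // Sums document frequencies with a separate call per term per remote.
    static int[] sumPerTerm(CountingRemote[] remotes, String[] terms) {
        int[] totals = new int[terms.length];
        for (int i = 0; i < terms.length; i++) {
            for (CountingRemote r : remotes) {
                totals[i] += r.docFreq(terms[i]);
            }
        }
        return totals;
    }

    // Sums the same frequencies with a single batch call per remote.
    static int[] sumBatched(CountingRemote[] remotes, String[] terms) {
        int[] totals = new int[terms.length];
        for (CountingRemote r : remotes) {
            int[] dfs = r.docFreqs(terms);
            for (int i = 0; i < terms.length; i++) {
                totals[i] += dfs[i];
            }
        }
        return totals;
    }

    static int totalCalls(CountingRemote[] remotes) {
        int total = 0;
        for (CountingRemote r : remotes) {
            total += r.calls;
        }
        return total;
    }

    public static void main(String[] args) {
        // 2,500 rewritten terms, roughly the size reported for link:w*
        String[] terms = new String[2500];
        for (int i = 0; i < terms.length; i++) {
            terms[i] = "w" + i + ".com";
        }
        CountingRemote[] perTerm = { new CountingRemote(new HashMap<>()), new CountingRemote(new HashMap<>()) };
        CountingRemote[] batched = { new CountingRemote(new HashMap<>()), new CountingRemote(new HashMap<>()) };
        sumPerTerm(perTerm, terms);
        sumBatched(batched, terms);
        System.out.println("per-term round trips: " + totalCalls(perTerm)); // 5000
        System.out.println("batched round trips: " + totalCalls(batched)); // 2
    }
}
```

With two remotes, the per-term pattern costs terms × remotes round trips, while the batched pattern costs one round trip per remote regardless of how many terms the prefix expands to.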

I see that RemoteSearchable has a docFreqs method, but it is not
being used.

Is there any way to avoid these repeated docFreq calls?

If I have misunderstood something, please tell me what's wrong.

Thanks.

--
Yura Smolsky,
http://altervisionmedia.com/



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: ParallelMultiSearcher and docFreq

Yura Smolsky-2
Hello, Yura.

Does anyone understand my email? Maybe my English is too bad...

Thanks.


RE: ParallelMultiSearcher and docFreq

Haines, Ronald C. (LNG-DAY)
In reply to this post by Yura Smolsky-2
I understand, because I've experienced it. I think the answer is to
'parallelize' the docFreq process and/or make use of
docFreqs(Term[]). By passing an array of Terms, you avoid one call per
term per remote and make just a single docFreqs call per remote.

You might have to extend ParallelMultiSearcher and create a threaded
docFreq method.
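A minimal sketch of the suggestion above, combining both ideas: each remote receives one batch request, and all remotes are queried concurrently through an ExecutorService. `ParallelDocFreqs` and `BatchDocFreqSource` are hypothetical names; the interface only stands in for a remote searchable and is not Lucene's API. In Lucene itself this logic would presumably live in a ParallelMultiSearcher subclass, as suggested above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of a "threaded docFreq" that also batches terms.
public class ParallelDocFreqs {

    // Hypothetical stand-in for a remote index that supports a batch lookup.
    interface BatchDocFreqSource {
        int[] docFreqs(String[] terms) throws Exception;
    }

    // Sends one batch request to every source concurrently and sums the results per term.
    static int[] aggregate(List<BatchDocFreqSource> sources, String[] terms) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, sources.size()));
        try {
            List<Future<int[]>> pending = new ArrayList<>();
            for (BatchDocFreqSource source : sources) {
                pending.add(pool.submit(() -> source.docFreqs(terms)));
            }
            int[] totals = new int[terms.length];
            for (Future<int[]> f : pending) {
                int[] dfs = f.get(); // waits for that source's single round trip
                for (int i = 0; i < terms.length; i++) {
                    totals[i] += dfs[i];
                }
            }
            return totals;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        BatchDocFreqSource a = terms -> new int[] {3, 1};
        BatchDocFreqSource b = terms -> new int[] {2, 0};
        int[] totals = aggregate(Arrays.asList(a, b), new String[] {"web.com", "wordlist.com"});
        System.out.println(totals[0] + " " + totals[1]); // prints "5 1"
    }
}
```

Batching bounds the number of round trips to one per remote; threading then means total latency is roughly that of the slowest remote rather than the sum of all of them.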

-----Original Message-----
From: Yura Smolsky [mailto:[hidden email]]
Sent: Friday, September 15, 2006 8:54 AM
To: [hidden email]
Subject: Re: ParallelMultiSearcher and docFreq


Re[2]: ParallelMultiSearcher and docFreq

Yura Smolsky-2
Hello, Ronald.

What I have found is that nothing except createWeight uses the
docFreqs(Term[]) method. Maybe I need to parallelize it, but there is
something I don't understand.

When is MultiSearcher.createWeight() called? I ask because it is the only
method that uses docFreqs, and it builds a HashMap of the terms' document
frequencies. Is this method used when the query is rewritten inside
ParallelMultiSearcher?
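The HashMap idea described here can be sketched as follows, assuming each searchable answers one batch request: document frequencies are summed across searchables once, stored in a map, and later per-term lookups are served from the map without a remote call. `DfCache` and its nested interface are hypothetical illustrations, not Lucene's actual createWeight code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: fetch document frequencies once (one batch call per
// searchable), keep them in a HashMap, and answer per-term lookups locally.
public class DfCache {

    // Hypothetical stand-in for a searchable's batch lookup; not Lucene's interface.
    interface BatchDocFreqSource {
        int[] docFreqs(String[] terms);
    }

    private final Map<String, Integer> dfMap = new HashMap<>();

    DfCache(List<BatchDocFreqSource> sources, String[] terms) {
        int[] totals = new int[terms.length];
        for (BatchDocFreqSource s : sources) {
            int[] dfs = s.docFreqs(terms); // one round trip per source
            for (int i = 0; i < terms.length; i++) {
                totals[i] += dfs[i];
            }
        }
        for (int i = 0; i < terms.length; i++) {
            dfMap.put(terms[i], totals[i]);
        }
    }

    // Per-term lookup served from the map, with no remote call.
    int docFreq(String term) {
        Integer df = dfMap.get(term);
        if (df == null) {
            throw new IllegalArgumentException("term not cached: " + term);
        }
        return df;
    }

    public static void main(String[] args) {
        BatchDocFreqSource a = terms -> new int[] {3, 1};
        BatchDocFreqSource b = terms -> new int[] {2, 0};
        DfCache cache = new DfCache(Arrays.asList(a, b), new String[] {"web.com", "wordlist.com"});
        System.out.println(cache.docFreq("web.com")); // prints 5
    }
}
```

Failing loudly on an uncached term is a deliberate choice in this sketch: it makes any code path that still needs a per-term lookup outside the precomputed set visible instead of silently falling back to a remote call.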

Also, since this method calls docFreqs on the RemoteSearchables, I
should be seeing docFreqs(Term[]) calls arrive at the RemoteSearchable
objects, but I do not. Can somebody explain this?

And where do all those individual docFreq calls come from?

Thanks.
