Combining search steps without re-searching

classic Classic list List threaded Threaded
15 messages Options
Reply | Threaded
Open this post in threaded view
|

Combining search steps without re-searching

Fernando Mato Mira
Hello,

  We think we would have a problem if we try to use lucene because we
do search combinations which might have hundreds of steps, so creating
a combined query and executing again each time might be a problem.
  What would entail overhauling Lucene to do search combinations by
taking advantage of the results already generated?

Thanks

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Combining search steps without re-searching

Erik Hatcher
Please elaborate.


On Aug 28, 2006, at 6:21 AM, Fernando Mato Mira wrote:

> Hello,
>
>  We think we would have a problem if we try to use lucene because we
> do search combinations which might have hundreds of steps, so creating
> a combined query and executing again each time might be a problem.
>  What would entail overhauling Lucene to do search combinations by
> taking advantage of the results already generated?
>
> Thanks
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Combining search steps without re-searching

Chuck Williams-2
In reply to this post by Fernando Mato Mira
I presume your search steps are anded, as in typical drill-downs?

From  a Lucene standpoint, each sequence of steps is a BooleanQuery of
required clauses, one for each step.  To add a step, you extend the
BooleanQuery with a new clause.  To not re-evaluate the full query,
you'd need some query that regenerated the results of the prior step
more efficiently than BooleanQuery.  For example, if you happened to
generate the entire result set for each step, presumably not feasible,
then the results might be cached for regeneration.  Assuming you cannot
generate the entire result set, it's not obvious to me how having
partially generated S1 and ... Sn-1 will help you generate S1 and ... Sn
any faster.

You will already get the the benefit of OS caching with Lucene as it
stands.  You might find further caching extension to the query types you
use to be a performance gain that achieves what you want.  You might
also consider some kind of query optimization by extending the rewrite()
methods.

Chuck


Fernando Mato Mira wrote on 08/28/2006 12:21 AM:

> Hello,
>
>  We think we would have a problem if we try to use lucene because we
> do search combinations which might have hundreds of steps, so creating
> a combined query and executing again each time might be a problem.
>  What would entail overhauling Lucene to do search combinations by
> taking advantage of the results already generated?
>
> Thanks
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Combining search steps without re-searching

Andrzej Białecki-2
Chuck Williams wrote:
> I presume your search steps are anded, as in typical drill-downs?
>
> >From  a Lucene standpoint, each sequence of steps is a BooleanQuery of
> required clauses, one for each step.  To add a step, you extend the
> BooleanQuery with a new clause.  To not re-evaluate the full query,
>  

... umm, guys, wouldn't a series of QueryFilter's work much better in
this case? If some of the clauses are repeatable, then filtering results
through a cached BitSet in such filtered query would work nicely, right?

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Combining search steps without re-searching

Chuck Williams-2


Andrzej Bialecki wrote on 08/28/2006 09:19 AM:

> Chuck Williams wrote:
>> I presume your search steps are anded, as in typical drill-downs?
>>
>> >From  a Lucene standpoint, each sequence of steps is a BooleanQuery of
>> required clauses, one for each step.  To add a step, you extend the
>> BooleanQuery with a new clause.  To not re-evaluate the full query,
>>  
>
> ... umm, guys, wouldn't a series of QueryFilter's work much better in
> this case? If some of the clauses are repeatable, then filtering
> results through a cached BitSet in such filtered query would work
> nicely, right?
>
If the possible initial steps comprise a small finite set, I could see
that as a winner.  In my app for instance, the drill-down selectors are
dynamic and drawn from a large set of possibilities.  It's hard to see
how any small set of filters would be much of a benefit.  A large set of
filters would consume too much space.  For a 10 million document node at
1.25 megabytes per filter even a couple hundred filters adds up to
something significant.

As I understand things, filters take considerably more time to initially
create but then can more than make this up through repetitive use.  So
they are a winner iff there are a small number of specific steps that
are frequently and disproportionately used.

Chuck


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Combining search steps without re-searching

eks dev
you are right Chuck, it depends... Filters are great for fields with small cardinality (majority of terms in normal collection) or things that are sorted (assuming Paul's patch gets commited so we do not use BitSet and we could use less memory hungry structures like interval lists :) With BitSet, paradoxically it makes sense to use them for high freq. terms to save memory

Hi commiters, any chance of getting rid of BitSet in Filter? Can somebody guide what else needs to be done to have it commited, we have a pair of hands to help...

----- Original Message ----
From: Chuck Williams <[hidden email]>
To: [hidden email]
Sent: Monday, 28 August, 2006 10:51:40 PM
Subject: Re: Combining search steps without re-searching



Andrzej Bialecki wrote on 08/28/2006 09:19 AM:

> Chuck Williams wrote:
>> I presume your search steps are anded, as in typical drill-downs?
>>
>> >From  a Lucene standpoint, each sequence of steps is a BooleanQuery of
>> required clauses, one for each step.  To add a step, you extend the
>> BooleanQuery with a new clause.  To not re-evaluate the full query,
>>  
>
> ... umm, guys, wouldn't a series of QueryFilter's work much better in
> this case? If some of the clauses are repeatable, then filtering
> results through a cached BitSet in such filtered query would work
> nicely, right?
>
If the possible initial steps comprise a small finite set, I could see
that as a winner.  In my app for instance, the drill-down selectors are
dynamic and drawn from a large set of possibilities.  It's hard to see
how any small set of filters would be much of a benefit.  A large set of
filters would consume too much space.  For a 10 million document node at
1.25 megabytes per filter even a couple hundred filters adds up to
something significant.

As I understand things, filters take considerably more time to initially
create but then can more than make this up through repetitive use.  So
they are a winner iff there are a small number of specific steps that
are frequently and disproportionately used.

Chuck


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]





---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Combining search steps without re-searching

Paul Elschot
On Monday 28 August 2006 23:17, eks dev wrote:
> you are right Chuck, it depends... Filters are great for fields with small
cardinality (majority of terms in normal collection) or things that are
sorted (assuming Paul's patch gets commited so we do not use BitSet and we
could use less memory hungry structures like interval lists :) With BitSet,
paradoxically it makes sense to use them for high freq. terms to save memory

Also, filters loose scoring values of earlier clauses that are filtered for
drilling down.
Adding required clauses to a boolean query and searching this
query combines the scoring values of all clauses.

>
> Hi commiters, any chance of getting rid of BitSet in Filter? Can somebody
guide what else needs to be done to have it commited, we have a pair of hands
to help...

One more here,

Thanks,
Paul Elschot

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Combining search steps without re-searching

Fernando Mato Mira
In reply to this post by Chuck Williams-2
On 8/28/06, Chuck Williams <[hidden email]> wrote:
> I presume your search steps are anded, as in typical drill-downs?

Not necessarily. The result set can also be enlarged by OR very often.
We would also need to add more span clauses to span queries, besides
combining them
with boolean queries.

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Combining search steps without re-searching

Doug Cutting
In reply to this post by eks dev
eks dev wrote:
> Hi commiters, any chance of getting rid of BitSet in Filter? Can somebody guide what else needs to be done to have it commited, we have a pair of hands to help...

I'm looking at:

http://issues.apache.org/jira/browse/LUCENE-584

And it doesn't yet look like a no-brainer to commit.  It would be easier
to evaluate if there was a single patch file that could be applied to
the Lucene root directory.  I also still see some FIXME comments.

Doug



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Combining search steps without re-searching

Paul Elschot
On Wednesday 30 August 2006 01:33, Doug Cutting wrote:
> eks dev wrote:
> > Hi commiters, any chance of getting rid of BitSet in Filter? Can somebody
guide what else needs to be done to have it commited, we have a pair of hands
to help...
>
> I'm looking at:
>
> http://issues.apache.org/jira/browse/LUCENE-584

Thanks.

>
> And it doesn't yet look like a no-brainer to commit.  It would be easier

Splitting off Matcher from Scorer does need some care...

> to evaluate if there was a single patch file that could be applied to
> the Lucene root directory.  I also still see some FIXME comments.

At the moment I don't remember what the FIXME's are about, so I'll
need a bit of time getting back into it.
Once these are settled, I'll try and make a single patch, unless
someone prefers to have a single patch right away.
So far I've had no conflicts with this in my working copy.

Regards,
Paul Elschot

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Combining search steps without re-searching

eks dev
Paul,
my offer is valid, please shout if and where you need some help, test cases... not toooo skilled with deep Lucene internals, but could help at least in API view...

......
>At the moment I don't remember what the FIXME's are about, so I'll
>need a bit of time getting back into it.
>Once these are settled, I'll try and make a single patch, unless
>someone prefers to have a single patch right away.
>So far I've had no conflicts with this in my working copy.

Regards,
Paul Elschot

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]





---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

LUCENE-584, was "Combining search steps without re-searching"

Paul Elschot
On Wednesday 30 August 2006 21:08, eks dev wrote:
> Paul,
> my offer is valid, please shout if and where you need some help, test
cases... not toooo skilled with deep Lucene internals, but could help at
least in API view...

Well, I just posted a single patch file, and I'd like to know whether this
patch applies cleanly. The patch itself has 841 lines and affects 11 files,
so be careful, perhaps to the point of starting a new working copy.

I have a few other changes in my working copy,
and I think these do not interfere, but I'm not 100% sure.

There are javadoc errors from other places at the moment,
so I hope I got the javadocs right in the patch.

http://issues.apache.org/jira/browse/LUCENE-584

Regards,
Paul Elschot

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: LUCENE-584, was "Combining search steps without re-searching"

eks dev
svn "appy patch" complains "The file search/Searchable.java was found twice!..."  I tried to appy patch from   "/src/java/org/apache/lucene"  on current trunk...

----- Original Message ----
From: Paul Elschot <[hidden email]>
To: [hidden email]
Sent: Wednesday, 30 August, 2006 9:49:41 PM
Subject: LUCENE-584, was "Combining search steps without re-searching"

On Wednesday 30 August 2006 21:08, eks dev wrote:
> Paul,
> my offer is valid, please shout if and where you need some help, test
cases... not toooo skilled with deep Lucene internals, but could help at
least in API view...

Well, I just posted a single patch file, and I'd like to know whether this
patch applies cleanly. The patch itself has 841 lines and affects 11 files,
so be careful, perhaps to the point of starting a new working copy.

I have a few other changes in my working copy,
and I think these do not interfere, but I'm not 100% sure.

There are javadoc errors from other places at the moment,
so I hope I got the javadocs right in the patch.

http://issues.apache.org/jira/browse/LUCENE-584

Regards,
Paul Elschot

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]





---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: LUCENE-584, was "Combining search steps without re-searching"

Paul Elschot
On Wednesday 30 August 2006 22:40, eks dev wrote:
> svn "appy patch" complains "The file search/Searchable.java was found
twice!..."  I tried to appy patch from   "/src/java/org/apache/lucene"  on
current trunk...

That is because the file list I used contains search/Searchable.java twice.
It should have contained an added TestSortedVIntList in src/test instead of
the extra Searchable.java.
I'll regenerate the patch from the trunk directory and post it once more..
Sorry about the mistake...

Regards,
Paul Elschot


>
> ----- Original Message ----
> From: Paul Elschot <[hidden email]>
> To: [hidden email]
> Sent: Wednesday, 30 August, 2006 9:49:41 PM
> Subject: LUCENE-584, was "Combining search steps without re-searching"
>
> On Wednesday 30 August 2006 21:08, eks dev wrote:
> > Paul,
> > my offer is valid, please shout if and where you need some help, test
> cases... not toooo skilled with deep Lucene internals, but could help at
> least in API view...
>
> Well, I just posted a single patch file, and I'd like to know whether this
> patch applies cleanly. The patch itself has 841 lines and affects 11 files,
> so be careful, perhaps to the point of starting a new working copy.
>
> I have a few other changes in my working copy,
> and I think these do not interfere, but I'm not 100% sure.
>
> There are javadoc errors from other places at the moment,
> so I hope I got the javadocs right in the patch.
>
> http://issues.apache.org/jira/browse/LUCENE-584
>
> Regards,
> Paul Elschot
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: LUCENE-584, was "Combining search steps without re-searching"

Mike Klaas
In reply to this post by Paul Elschot
On 8/30/06, Paul Elschot <[hidden email]> wrote:
>
> Well, I just posted a single patch file, and I'd like to know whether this
> patch applies cleanly. The patch itself has 841 lines and affects 11 files,
> so be careful, perhaps to the point of starting a new working copy.

FWIW, I usually check out a fresh copy of trunk/ and apply my own
patch to it to test this.

-Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]