Span query performance issue

Span query performance issue

yahootintin.11533894
Hi,

I'm comparing SpanNearQuery with PhraseQuery and seeing about an 8x
performance difference on Linux. Is SpanNearQuery really doing 8x as much work?

I'm considering diving into the code if that result sounds unusual to people.
But if it's really doing that much more work, I won't spend time optimizing
something that can't get much faster.

Thanks.


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: Span query performance issue

jian chen
Hi,

I think a span query in general has to do more work than a simple phrase
query. A phrase query, in its simplest form, just tries to find terms
that are adjacent to each other. A span query, meanwhile, does not require
the terms to be adjacent; other words may sit in between.

Therefore, I think it is reasonable for a span query to be slower than a
phrase query. That said, a span query is far more powerful than a phrase query.
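
To make the distinction concrete, here is a minimal sketch in plain Java. It is a toy model over term positions within one document and uses no Lucene classes; the class name, method names, and the "maxGap" notion of slop are assumptions made for illustration and do not reproduce Lucene's own slop definitions.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: sorted term positions inside a single document.
class AdjacentVsNearSketch {

    // Exact phrase "a b": count the places where b directly follows a.
    static int countExact(int[] aPos, int[] bPos) {
        int i = 0, j = 0, hits = 0;
        while (i < aPos.length && j < bPos.length) {
            int want = aPos[i] + 1;          // the only position of b that can match
            if (bPos[j] < want) {
                j++;
            } else if (bPos[j] > want) {
                i++;
            } else {
                hits++;
                i++;
                j++;
            }
        }
        return hits;
    }

    // Ordered proximity: for each a, take the nearest following b and accept it
    // if at most maxGap other words sit in between. Each match is reported as
    // {begin, end} with end exclusive.
    static List<int[]> findNear(int[] aPos, int[] bPos, int maxGap) {
        List<int[]> spans = new ArrayList<>();
        int j = 0;
        for (int a : aPos) {
            while (j < bPos.length && bPos[j] <= a) {
                j++;                         // skip b positions at or before a
            }
            if (j < bPos.length && bPos[j] - a - 1 <= maxGap) {
                spans.add(new int[] {a, bPos[j] + 1});
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        int[] a = {1, 5, 10};
        int[] b = {2, 8, 20};
        System.out.println("exact matches: " + countExact(a, b));
        System.out.println("near matches (maxGap=2): " + findNear(a, b, 2).size());
    }
}
```

In this toy model the exact case only has to compare a single integer per candidate, while the proximity case has to produce a begin and an end for every match, which is the extra generality being discussed.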

Jian

On 25 Jun 2005 00:00:18 -0000, [hidden email]
<[hidden email]> wrote:

> Hi,
>
> I'm comparing SpanNearQuery with PhraseQuery and seeing about an 8x
> performance difference on Linux. Is SpanNearQuery really doing 8x as much work?
>
> I'm considering diving into the code if that result sounds unusual to people.
> But if it's really doing that much more work, I won't spend time optimizing
> something that can't get much faster.
>
> Thanks.

Re: Span query performance issue

Paul Elschot
On Saturday 25 June 2005 04:26, jian chen wrote:

> Hi,
>
> I think a span query in general has to do more work than a simple phrase
> query. A phrase query, in its simplest form, just tries to find terms
> that are adjacent to each other. A span query, meanwhile, does not require
> the terms to be adjacent; other words may sit in between.
>
> Therefore, I think it is reasonable for a span query to be slower than a
> phrase query. That said, a span query is far more powerful than a phrase query.
>
> Jian
>
> On 25 Jun 2005 00:00:18 -0000, [hidden email]
> <[hidden email]> wrote:
> > Hi,
> >
> > I'm comparing SpanNearQuery with PhraseQuery and seeing about an 8x
> > performance difference on Linux. Is SpanNearQuery really doing 8x as much work?
> >
> > I'm considering diving into the code if that result sounds unusual to people.
> > But if it's really doing that much more work, I won't spend time optimizing
> > something that can't get much faster.

The main difference is in the extra generality of Spans over positions.
Spans have a begin position and an end position.
Matching two Spans for the terms of a phrase requires testing both
their begin positions and their end positions, even though for a single
term the two differ only by a constant.
Spans also carry around their current document number, and this may
involve some more redundancy when finding the matches
within a single document.
Also, for exact matches (zero slop) PhraseQuery uses a separate scorer
that takes full advantage of the special case.
So, when the generality of Spans is not needed, one should always
try to use a PhraseQuery.
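
The redundancy described above can be sketched in a few lines of plain Java. The Span class here is a hypothetical minimal record invented for illustration, not Lucene's Spans interface, and the matching rule is an assumed simplification of ordered proximity matching.

```java
// Hypothetical minimal span record: document number, begin, and exclusive end.
class SpanVsPositionSketch {

    static final class Span {
        final int doc;
        final int start;
        final int end;

        Span(int doc, int start, int end) {
            this.doc = doc;
            this.start = start;
            this.end = end;
        }

        // A single term always covers exactly one position, so end = start + 1.
        static Span ofTerm(int doc, int position) {
            return new Span(doc, position, position + 1);
        }
    }

    // Position-only test for an exact two-term phrase: one comparison.
    static boolean adjacentPositions(int firstPos, int secondPos) {
        return secondPos == firstPos + 1;
    }

    // General test on spans: document, begin, and end are all examined.
    // For term spans the start comparison is implied by the end comparison,
    // which is the kind of redundant work a position-only scorer can skip.
    static boolean nearInOrder(Span first, Span second, int slop) {
        return first.doc == second.doc
                && first.start < second.start
                && first.end <= second.start
                && second.start - first.end <= slop;
    }

    public static void main(String[] args) {
        Span a = Span.ofTerm(0, 3);
        Span b = Span.ofTerm(0, 4);
        System.out.println(adjacentPositions(3, 4));
        System.out.println(nearInOrder(a, b, 0));
    }
}
```

Both checks accept the same adjacent pair, but the span version touches three fields per side where the position version touches one.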

I'm not surprised that SpanNearQuery is slower than PhraseQuery,
and I'd expect a factor of 3-4 between them. A factor of 8 might indicate
that there is some room for improvement in the span package.
(I'd expect the CellQueue in NearSpans to be the bottleneck.)
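
For readers wondering why a queue could dominate the cost, the following is a generic sketch of priority-queue-based proximity matching in plain Java. It is not the NearSpans code and has not been checked against it; the class name, method name, and data layout are assumptions. It only shows the general technique of repeatedly advancing whichever term stream currently starts first.

```java
import java.util.PriorityQueue;

// Generic illustration: find the smallest window that contains one position
// from every sorted position list, using a priority queue keyed on position.
class QueueWindowSketch {

    // Returns {start, end} with end exclusive, or null if there is no window.
    static int[] smallestWindow(int[][] lists) {
        if (lists.length == 0) {
            return null;
        }
        // Queue entries are {position, listIndex, indexWithinList}.
        PriorityQueue<int[]> queue =
                new PriorityQueue<>((x, y) -> Integer.compare(x[0], y[0]));
        int max = Integer.MIN_VALUE;
        for (int k = 0; k < lists.length; k++) {
            if (lists[k].length == 0) {
                return null;
            }
            queue.add(new int[] {lists[k][0], k, 0});
            max = Math.max(max, lists[k][0]);
        }
        int[] best = null;
        while (true) {
            int[] min = queue.poll();            // heap operation on every step
            if (best == null || max + 1 - min[0] < best[1] - best[0]) {
                best = new int[] {min[0], max + 1};
            }
            int next = min[2] + 1;
            if (next == lists[min[1]].length) {
                return best;                     // one list is exhausted
            }
            int pos = lists[min[1]][next];
            max = Math.max(max, pos);
            queue.add(new int[] {pos, min[1], next}); // another heap operation
        }
    }

    public static void main(String[] args) {
        int[][] lists = {{1, 10, 20}, {4, 11}, {12, 30}};
        int[] w = smallestWindow(lists);
        System.out.println("[" + w[0] + ", " + w[1] + ")");
    }
}
```

Every candidate position costs a removal and an insertion on the heap, so with only two or three terms the queue bookkeeping can easily outweigh the comparisons themselves. Whether that is what actually happens in NearSpans would need profiling.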

Regards,
Paul Elschot

