On Saturday 25 June 2005 04:26, jian chen wrote:
> Hi,
>
> I think a Span query in general has to do more work than a simple Phrase
> query. A Phrase query, in its simplest form, just tries to find all
> terms that are adjacent to each other. The terms of a Span query, by
> contrast, need not be adjacent; other words may appear in between.
>
> Therefore, I think it is natural for a Span query to be slower than a
> Phrase query. That said, a Span query is far more powerful than a Phrase query.
>
> Jian
>
> On 25 Jun 2005 00:00:18 -0000, [hidden email] <[hidden email]> wrote:
> > Hi,
> >
> > I'm comparing SpanNearQuery to PhraseQuery results and noticing about
> > an 8x difference on Linux. Is a SpanNearQuery doing 8x as much work?
> >
> >
> > I'm considering diving into the code if the results sound unusual to
> > people. But if it's really doing that much more work, I won't spend time
> > optimizing something that can't get much faster.
The main difference is in the extra generality of Spans over positions.
Spans have a begin position and an end position.
Matching two Spans for the terms of a phrase requires testing both
their begin positions and their end positions, even though they differ
only by a constant for the same term.
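
To make the difference concrete, here is a minimal sketch in plain Java. It
is not the Lucene code: the class SpanVsPositionSketch, its Span type and
both method names are made up for illustration, and only two terms within a
single document are handled. The position version compares one int per term;
the span version reads the begin and the end of both spans, although for a
single term the end is always the begin plus one.

```java
final class SpanVsPositionSketch {
    /** Hypothetical span over word positions [start, end). */
    static final class Span {
        final int start;
        final int end;
        Span(int start, int end) { this.start = start; this.end = end; }
    }

    /** Position model: count places where a position of b directly follows one of a. */
    static int countAdjacentByPosition(int[] a, int[] b) {
        int j = 0, count = 0;
        for (int p : a) {
            while (j < b.length && b[j] <= p) j++;   // skip b positions at or before p
            if (j == b.length) break;
            if (b[j] == p + 1) count++;              // one int compared per term
        }
        return count;
    }

    /** Wrap sorted term positions as spans; for a single term end == start + 1. */
    static Span[] termSpans(int[] positions) {
        Span[] spans = new Span[positions.length];
        for (int i = 0; i < positions.length; i++) {
            spans[i] = new Span(positions[i], positions[i] + 1);
        }
        return spans;
    }

    /** Span model: count a-spans followed, in order and within maxSlop, by the next b-span. */
    static int countOrderedNear(Span[] a, Span[] b, int maxSlop) {
        int j = 0, count = 0;
        for (Span x : a) {
            while (j < b.length && b[j].start < x.end) j++;  // skip b spans starting before x ends
            if (j == b.length) break;
            Span y = b[j];
            int matchLength = y.end - x.start;                       // uses x.start and y.end
            int termLength = (x.end - x.start) + (y.end - y.start);  // uses all four fields
            if (matchLength - termLength <= maxSlop) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        int[] a = {0, 5, 9};
        int[] b = {1, 7, 10};
        System.out.println(countAdjacentByPosition(a, b));                   // 2
        System.out.println(countOrderedNear(termSpans(a), termSpans(b), 0)); // 2
        System.out.println(countOrderedNear(termSpans(a), termSpans(b), 1)); // 3
    }
}
```

With zero slop both methods find the same matches, but the span version
touches twice as many fields per candidate to get there.
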
Spans also carry around their current document number, and this may
involve some more redundant work when finding the matches
within a single document.
Also, for exact matches (zero slop) PhraseQuery uses a separate scorer
that takes full advantage of the special case.
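
One way to exploit the zero slop case, sketched here in plain Java under my
own assumptions rather than copied from the PhraseQuery scorer, is to
subtract each term's offset within the phrase from its positions. An exact
match is then simply a position that all terms have in common, so the work
reduces to intersecting sorted int lists. The class ExactPhraseSketch and
the method countExact are hypothetical names.

```java
final class ExactPhraseSketch {
    /**
     * Count exact occurrences of a phrase within one document.
     * positions[k] holds the sorted word positions of the k-th phrase term.
     */
    static int countExact(int[][] positions) {
        int n = positions.length;
        if (n == 0) return 0;
        int[] idx = new int[n];   // current index into each term's position list
        int count = 0;
        while (true) {
            // The largest adjusted head position is the next possible match point.
            int target = Integer.MIN_VALUE;
            for (int k = 0; k < n; k++) {
                if (idx[k] == positions[k].length) return count;
                target = Math.max(target, positions[k][idx[k]] - k);
            }
            // Advance every term to the target; any term that overshoots rules it out.
            boolean allEqual = true;
            for (int k = 0; k < n; k++) {
                while (idx[k] < positions[k].length && positions[k][idx[k]] - k < target) {
                    idx[k]++;
                }
                if (idx[k] == positions[k].length) return count;
                if (positions[k][idx[k]] - k != target) allEqual = false;
            }
            if (allEqual) {
                count++;
                for (int k = 0; k < n; k++) idx[k]++;   // move past this match
            }
        }
    }

    public static void main(String[] args) {
        // Three-term phrase; only the occurrence at positions 3, 4, 5 is exact.
        int[][] positions = { {3, 10, 20}, {4, 11, 15}, {5, 16, 22} };
        System.out.println(countExact(positions));   // 1
    }
}
```

No begin/end pairs and no slop arithmetic are needed here, which is the kind
of saving a dedicated exact scorer can make.
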
So, when the generality of the Spans is not needed, one should always
try to use a PhraseQuery.
I'm not surprised that SpanNearQuery is slower than PhraseQuery,
and I'd expect a factor of 3-4 between them. A factor of 8 might indicate that
there is some room for improvement in the span package.
(I'd expect the CellQueue in NearSpans to be the bottleneck.)
Regards,
Paul Elschot