SpanFirstQuery and SpanNotQuery

Chris Hostetter-3

I'm looking at SpanQueries as I work on new test cases for LUCENE-557, and
I'm confused by the implementation of SpanFirstQuery.getSpans().

In the anonymous Spans instance returned, start() and end() are always
the start() and end() of the inner SpanQuery for the current doc --
shouldn't the start() be "0", indicating that the span starts at the
beginning of the document?

Because of the current implementation, there doesn't seem to be any way to
use SpanNot with SpanFirst to ensure that a given SpanQuery X always
occurs in the first N positions, and that SpanQuery Z does not appear
before it.

In other words, this...
   int N = ...
   SpanQuery X =...;
   SpanQuery Z =...;
   SpanQuery first = new SpanFirstQuery(X,N);
   SpanQuery q = new SpanNotQuery(first, Z);

...seems to be functionally equivalent to...
   int N = ...
   SpanQuery X =...;
   //SpanQuery Z =...;
   SpanQuery first = new SpanFirstQuery(X,N);
   SpanQuery q = first;

...is that how it's supposed to work, or am I misunderstanding something?

Here's a small addition to "TestBasics" which shows how I was
expecting things to work...

  public void testSpanNotFirst() throws Exception {
    SpanTermQuery term40 = new SpanTermQuery(new Term("field", "forty"));
    SpanFirstQuery first = new SpanFirstQuery(term40, 5);
    SpanTermQuery hun = new SpanTermQuery(new Term("field", "hundred"));
    SpanNotQuery not = new SpanNotQuery(first, hun);

    checkHits(not, new int[]{40,41,42,43,44,45,46,47,48,49});

  }


...but the results contain every number with a 4 in the tens digit
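
A toy model may make the mismatch easier to see. The sketch below is not Lucene code: the class SpanFirstNotSketch, its Span type, and the fromZero switch are all made up for illustration. It assumes that a "first" query keeps inner spans whose end falls within the first N positions, and that a "not" query drops include spans that overlap an exclude span. With the inner span reported unchanged, "forty" in "one hundred forty" never overlaps "hundred", so the document still matches; if the span were reported as starting at 0, it would overlap and be dropped.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of span positions. Illustrative only; none of this is Lucene code. */
public class SpanFirstNotSketch {

    /** Half-open range [start, end) of token positions in one document. */
    static final class Span {
        final int start;
        final int end;
        Span(int start, int end) { this.start = start; this.end = end; }
    }

    /** One single-token span per occurrence of the term. */
    static List<Span> term(String[] tokens, String term) {
        List<Span> out = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equals(term)) out.add(new Span(i, i + 1));
        }
        return out;
    }

    /**
     * Keeps inner spans that end within the first n positions.
     * fromZero=false reports the inner span unchanged (the behaviour described above);
     * fromZero=true reports it as starting at position 0 (the hypothetical alternative).
     */
    static List<Span> first(List<Span> inner, int n, boolean fromZero) {
        List<Span> out = new ArrayList<>();
        for (Span s : inner) {
            if (s.end <= n) out.add(new Span(fromZero ? 0 : s.start, s.end));
        }
        return out;
    }

    /** Drops every include span that overlaps at least one exclude span. */
    static List<Span> not(List<Span> include, List<Span> exclude) {
        List<Span> out = new ArrayList<>();
        for (Span inc : include) {
            boolean overlaps = false;
            for (Span exc : exclude) {
                if (inc.start < exc.end && exc.start < inc.end) { overlaps = true; break; }
            }
            if (!overlaps) out.add(inc);
        }
        return out;
    }

    /** True if "forty within the first 5 positions, not overlapping hundred" matches the text. */
    static boolean matches(String text, boolean fromZero) {
        String[] tokens = text.split(" ");
        List<Span> firstSpans = first(term(tokens, "forty"), 5, fromZero);
        return !not(firstSpans, term(tokens, "hundred")).isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(matches("one hundred forty", false)); // true: spans do not overlap
        System.out.println(matches("one hundred forty", true));  // false: [0,3) overlaps "hundred"
        System.out.println(matches("forty one", false));         // true
        System.out.println(matches("forty one", true));          // true
    }
}
```

Under this model, the test above would pass only with the hypothetical fromZero behaviour, which is the behaviour the test was written to expect.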



-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: SpanFirstQuery and SpanNotQuery

Chris Hostetter-3

After consulting Lucene in Action (LIA), I find the following sentence at
the end of 5.4.2 "Finding spans at the beginning of a field" ...

        The resulting span matches are the same as the original SpanQuery
        spans, ...

...which indicates that it is working as documented -- but it remains to
be seen if that's what was intended or not.

In any case, it raises the question: how would someone go about building a
SpanQuery where the intent is:

   SpanQuery X must be in the first N positions
      AND
   SpanQuery Z must not come before SpanQuery X

The best I can figure would be something like...

   SpanQuery q = new SpanNotQuery(
       new SpanFirstQuery(X, N),
       new SpanFirstQuery(new SpanNearQuery(new SpanQuery[]{Z, (SpanQuery) X.clone()},
                                            N-1, true),
                          N-1));

...but this doesn't seem to work at the moment because of LUCENE-560.
Even if it did work, it seems kind of kludgy. Does anyone have a better
suggestion?
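
For reference, the intended semantics can be written down as a plain position check, independent of any query classes. This is only a sketch of one reading of the intent ("Z must not come before X" is taken to mean that no Z span starts before the matching X span); the class FirstWithoutPriorSketch and its matches method are hypothetical names, not part of Lucene.

```java
/** Toy position check. Illustrative only; this is not a Lucene query. */
public class FirstWithoutPriorSketch {

    /**
     * Each span is an int pair {start, end} over token positions, end exclusive.
     * Returns true if some X span ends within the first n positions and no
     * Z span starts before that X span.
     */
    static boolean matches(int[][] xSpans, int[][] zSpans, int n) {
        for (int[] x : xSpans) {
            if (x[1] > n) continue;              // X is not within the first n positions
            boolean zComesFirst = false;
            for (int[] z : zSpans) {
                if (z[0] < x[0]) { zComesFirst = true; break; }
            }
            if (!zComesFirst) return true;       // this X span satisfies both conditions
        }
        return false;
    }

    public static void main(String[] args) {
        int[][] none = {};
        // "forty one": X at [0,1), no Z at all
        System.out.println(matches(new int[][]{{0, 1}}, none, 5));                  // true
        // "one hundred forty": Z at [1,2) comes before X at [2,3)
        System.out.println(matches(new int[][]{{2, 3}}, new int[][]{{1, 2}}, 5));   // false
        // Z at [1,2) comes after X at [0,1)
        System.out.println(matches(new int[][]{{0, 1}}, new int[][]{{1, 2}}, 5));   // true
    }
}
```

Any span-based construction for this intent could be checked against a predicate like this one in a test.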




-Hoss

