Running query against a single document

Aurélien MAZOYER-2
Hi,

We would like to know whether there is a way to test a query against a document
without creating an index. We thought we could use the Lucene highlighter
component to achieve this, but it doesn't seem to work as expected with complex
queries.
For example, we created a SpanQuery (+spanFirst(field:saint, 1)
+spanNear([field:saint, field:quentin], 0, true)) and tested it against
two documents:
D1={field=eglise saint quentin}
D2={field=saint quentin deladadoupa}
We expected to get these results from the highlighter:
D1 eglise saint quentin
D2 <B>saint</B> <B>quentin</B> deladadoupa
But we got
eglise <B>saint</B> <B>quentin</B> for D1, which is unexpected from our
perspective because D1 doesn't match our SpanQuery.
Do you know whether this approach is sound, or whether we should use some
other way to achieve this?
FYI, we use Lucene 6.5.1.
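
To make the expected behaviour concrete, here is a minimal plain-Java sketch of how the two required clauses are meant to evaluate on D1 and D2. It is a toy model, not Lucene code: the class and method names are hypothetical, and it assumes simple whitespace tokenization with positions starting at 0, so that spanFirst(term, 1) only accepts a term in the first position.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model of the two span clauses from the question; this is not Lucene code. */
final class SpanSemantics {

    /** spanFirst(term, end): some occurrence of term must end at or before position end. */
    static boolean spanFirst(List<String> tokens, String term, int end) {
        // A single term at position i covers the span [i, i + 1), so its end is i + 1.
        for (int i = 0; i < tokens.size() && i + 1 <= end; i++) {
            if (tokens.get(i).equals(term)) {
                return true;
            }
        }
        return false;
    }

    /** spanNear([first, second], 0, true): second must directly follow first. */
    static boolean spanNearAdjacent(List<String> tokens, String first, String second) {
        for (int i = 0; i + 1 < tokens.size(); i++) {
            if (tokens.get(i).equals(first) && tokens.get(i + 1).equals(second)) {
                return true;
            }
        }
        return false;
    }

    /** Both clauses are required (+), so the document matches only if both hold. */
    static boolean matches(String text) {
        List<String> tokens = Arrays.asList(text.split("\\s+"));
        return spanFirst(tokens, "saint", 1)
            && spanNearAdjacent(tokens, "saint", "quentin");
    }

    public static void main(String[] args) {
        System.out.println("D1 matches: " + matches("eglise saint quentin"));      // false
        System.out.println("D2 matches: " + matches("saint quentin deladadoupa")); // true
    }
}
```

In this model D1 satisfies the spanNear clause but fails spanFirst, which is why no highlighting is expected for it.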

Thank you for your help,

Regards,

Aurelien and Andrey
Tchiota GMBH

Re: Running query against a single document

Tom Mortimer
Hi,

Have you considered using MemoryIndex
<https://lucene.apache.org/core/6_5_1/memory/org/apache/lucene/index/memory/MemoryIndex.html>
?

cheers,
Tom


tel +44 8700 118334 : mobile +44 7876 741014 : skype tommortimer


On Fri, 21 Sep 2018 at 13:58, Aurélien MAZOYER <[hidden email]>
wrote:

> [...]

Re: Running query against a single document

Erick Erickson
bq. We would like to know if there is a way to test a query against a document
without creating an index. We were thinking that maybe we could use lucene
highlighter component
to achieve this,

I don't really understand this at all. How are you using the
highlighter component without creating an index? Custom code?

But that aside, there are dozens, if not hundreds, of examples of this
in the Solr test code. You could write a Solr JUnit test, which
is "just some Java code", and run that.

To execute this within the test framework, you have two options:
1> from the top level, "ant -Dtestcase=custom_test test", which takes a
long time to run
2> from solr/core, "ant -Dtestcase=custom_test test-nocompile". Your
code must already be compiled for this to work, of course.

BTW, if you skip all that and just use a Solr instance, one very
useful trick is to use &debug=true together with the explainOther parameter
(https://lucene.apache.org/solr/guide/6_6/common-query-parameters.html).
That will show you exactly how the doc was
scored _whether or not_ it would have been returned by the primary query.

Best,
Erick
On Fri, Sep 21, 2018 at 6:16 AM Tom Mortimer <[hidden email]> wrote:

> [...]

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: Running query against a single document

Aurélien MAZOYER-2
Hi Tom and Erick,
Thank you very much for your answers.

@Tom: Yes, we have considered MemoryIndex. But as far as I understand, we
would have to create a MemoryIndex containing a single document every
time we want to test our query against a document. I think we'll have
to run some tests to be sure that this is efficient.
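
As a starting point for such tests, here is a minimal sketch of matching one document at a time with a single reused MemoryIndex. It assumes lucene-core, lucene-memory and lucene-analyzers-common 6.5.1 on the classpath; the class name SingleDocMatcher and the choice of WhitespaceAnalyzer are illustrative assumptions, not something taken from this thread.

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.memory.MemoryIndex;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.spans.SpanFirstQuery;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

/** Hypothetical helper: tests one query against one document text at a time. */
final class SingleDocMatcher {
    private final MemoryIndex index = new MemoryIndex();
    private final Analyzer analyzer = new WhitespaceAnalyzer(); // assumed analyzer
    private final Query query;

    SingleDocMatcher(Query query) {
        this.query = query;
    }

    /** Returns true if the query matches the given field text. */
    boolean matches(String field, String text) {
        index.reset();                          // drop the previous document, keep buffers
        index.addField(field, text, analyzer);
        return index.search(query) > 0.0f;      // a score of 0.0 means no match
    }

    /** The query from the original question: both span clauses are required. */
    static Query saintQuentinQuery(String field) {
        SpanQuery saint = new SpanTermQuery(new Term(field, "saint"));
        SpanQuery quentin = new SpanTermQuery(new Term(field, "quentin"));
        return new BooleanQuery.Builder()
            .add(new SpanFirstQuery(saint, 1), BooleanClause.Occur.MUST)
            .add(new SpanNearQuery(new SpanQuery[] {saint, quentin}, 0, true),
                 BooleanClause.Occur.MUST)
            .build();
    }

    public static void main(String[] args) {
        SingleDocMatcher matcher = new SingleDocMatcher(saintQuentinQuery("field"));
        System.out.println("D1: " + matcher.matches("field", "eglise saint quentin"));
        System.out.println("D2: " + matcher.matches("field", "saint quentin deladadoupa"));
    }
}
```

Calling reset() between documents reuses the same MemoryIndex instance instead of allocating a new one each time, which may reduce the per-document cost; whether that is fast enough for our volume still needs to be measured.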
@Erick:
We use this piece of code to run the highlighter directly on a TokenStream
created from a text string (fieldTextValue):

QueryScorer queryScorer = new QueryScorer(luceneQuery);
TokenStream stream =
    TokenSources.getTokenStream(fieldName, fieldTextValue, analyzer);
Highlighter highlighter =
    new Highlighter(new SimpleHTMLFormatter(), queryScorer);
TextFragment[] frag =
    highlighter.getBestTextFragments(stream, fieldTextValue, true, 1000);

It seems to work pretty well for some queries, but I am afraid it works on
a per-token basis and doesn't consider the context (the adjacent terms) when
deciding whether a term is part of the match.
The Lucene explainer would fully address our needs, but as far as I know it
is not very efficient in terms of performance. We will test it as well.
We could combine it with Tom's suggestion: load the document into a
MemoryIndex and then run the explainer on that index.
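
A rough sketch of that combination, under the same assumptions as before (Lucene 6.5.1 with the memory and analyzers-common modules, WhitespaceAnalyzer, and a hypothetical class name SingleDocExplainer): the searcher created from a MemoryIndex sees exactly one document, with doc id 0, so explain() can be asked about that id directly.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.memory.MemoryIndex;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Explanation;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.spans.SpanFirstQuery;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

/** Hypothetical helper: explains how a query scores a single in-memory document. */
final class SingleDocExplainer {

    static Explanation explain(Query query, String field, String text) {
        MemoryIndex index = new MemoryIndex();
        index.addField(field, text, new WhitespaceAnalyzer()); // assumed analyzer
        IndexSearcher searcher = index.createSearcher();
        try {
            return searcher.explain(query, 0); // the single document has doc id 0
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** The query from the original question: both span clauses are required. */
    static Query saintQuentinQuery(String field) {
        SpanQuery saint = new SpanTermQuery(new Term(field, "saint"));
        SpanQuery quentin = new SpanTermQuery(new Term(field, "quentin"));
        return new BooleanQuery.Builder()
            .add(new SpanFirstQuery(saint, 1), BooleanClause.Occur.MUST)
            .add(new SpanNearQuery(new SpanQuery[] {saint, quentin}, 0, true),
                 BooleanClause.Occur.MUST)
            .build();
    }

    public static void main(String[] args) {
        Query query = saintQuentinQuery("field");
        for (String text : new String[] {"eglise saint quentin", "saint quentin deladadoupa"}) {
            Explanation explanation = explain(query, "field", text);
            System.out.println(text + " -> match=" + explanation.isMatch());
            System.out.println(explanation);
        }
    }
}
```

Explanation.isMatch() gives the yes/no answer, and the printed explanation shows which clause failed when there is no match.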

Aurelien and Andrey
Tchiota GMBH

On Fri, 21 Sep 2018 at 16:57, Erick Erickson <[hidden email]>
wrote:

> [...]