Get matching fields from a BooleanQuery


Get matching fields from a BooleanQuery

Frederik Van Hoyweghen
Hey everyone,

To start, we are using Lucene 4.3.

To search, we prepare several queries and combine these into a BooleanQuery.
What we are looking for is a way to determine on which specific fields a
certain document matched.
For example, I create 2 queries: one to search in the "Name" field, and
another to search in the "Description" field.

Combining these into a BooleanQuery and running it will return the matching
documents,
but we'd like to know for each document returned whether there was a match
in the Name field or in the Description field.

It seems to me that something like the highlighter would need to know this
too, but highlighting isn't a goal currently. I've also looked at
IndexSearcher.explain(), but the docs say that this is as expensive as
running the query against the entire index, so I'd obviously like to avoid
running the same queries multiple times :).

Kind regards,
Frederik

Re: Get matching fields from a BooleanQuery

Adrien Grand
Hi Frederik,

Using explain should be fine for that use case, since you will only apply it
to the top hits. Otherwise, you could use the low-level search APIs to do
this. It would look something like this if you want to find which query
among `queries` matches document `docID` (I did not check that it compiles,
but it should give the idea):

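// Sketch only (not compiled). Class names and method signatures differ between
// Lucene versions; Lucene 4.3 names the leaf context AtomicReaderContext.
List<Query> queries;
int docID;
IndexSearcher searcher;
List<LeafReaderContext> leaves = searcher.getIndexReader().leaves();
// Find the segment (leaf) that contains the top-level docID.
LeafReaderContext leaf = leaves.get(ReaderUtil.subIndex(docID, leaves));
// Convert the top-level docID into a segment-local one.
int leafDocID = docID - leaf.docBase;
for (Query query : queries) {
  Weight weight = searcher.createNormalizedWeight(query);
  Scorer scorer = weight.scorer(leaf);
  // scorer() may return null when no document in this segment matches.
  boolean matches = scorer != null && scorer.advance(leafDocID) == leafDocID;
}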
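
The docID arithmetic in the snippet above can be illustrated without Lucene. The following is a minimal plain-Java sketch, not Lucene's implementation: it assumes each segment is described only by its docBase (the top-level ID of its first document), that the docBases are sorted, start at 0 and contain no duplicates (no empty segments), and it uses hypothetical segment sizes. The class and method names are made up for the illustration.

```java
import java.util.Arrays;

class LeafLookupSketch {
    /** Returns the index of the segment that contains the top-level docID. */
    static int subIndex(int docID, int[] docBases) {
        int pos = Arrays.binarySearch(docBases, docID);
        // Exact hit: docID is the first document of that segment.
        // Otherwise binarySearch returns -(insertionPoint) - 1, and the
        // containing segment is the one just before the insertion point.
        return pos >= 0 ? pos : -pos - 2;
    }

    public static void main(String[] args) {
        int[] docBases = {0, 100, 250}; // hypothetical: three segments
        int docID = 180;                // top-level ID, as found in search results
        int leafIndex = subIndex(docID, docBases);
        int leafDocID = docID - docBases[leafIndex]; // segment-local ID
        System.out.println("leaf=" + leafIndex + " leafDocID=" + leafDocID);
        // prints "leaf=1 leafDocID=80"
    }
}
```

The segment-local ID is what gets passed to the scorer, which is why the snippet subtracts leaf.docBase before calling advance().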

On Mon, 19 June 2017 at 11:24, Frederik Van Hoyweghen <
[hidden email]> wrote:


RE: Get matching fields from a BooleanQuery

Ranganath B N
Hi Adrien,

   Using the Explanation object, how do you find out which fields of a document matched a query? In my case, I tried a DisjunctionMaxQuery with the tie-breaker multiplier set to
0.01f. What is the meaning of this argument? After searching with the DisjunctionMaxQuery,
I called the searcher's explain method with this query and a hit document. I used the
getDescription() and toString() methods and got the following output:

"
explan getdesc=max plus 0.01 times others of:
explan tostring=8.773912 = max plus 0.01 times others of:
  8.773912 = sum of:
    8.773863 = weight(text2:abababababa11#abc10#abc16#a16#2 in 1066) [BM25Similarity], result of:
      8.773863 = score(doc=1066,freq=1.0 = termFreq=1.0
), product of:
        8.8049755 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:
          1.0 = docFreq
          10000.0 = docCount
        0.99646646 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:
          1.0 = termFreq=1.0
          1.2 = parameter k1
          0.75 = parameter b
          28.2 = avgFieldLength
          28.444445 = fieldLength
    4.9819588E-5 = weight(text2:regular in 1066) [BM25Similarity], result of:
      4.9819588E-5 = score(doc=1066,freq=1.0 = termFreq=1.0
), product of:
        4.999625E-5 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:
          10000.0 = docFreq
          10000.0 = docCount
        0.99646646 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:
          1.0 = termFreq=1.0
          1.2 = parameter k1
          0.75 = parameter b
          28.2 = avgFieldLength
          28.444445 = fieldLength"
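
The first line of the output above, "max plus 0.01 times others of", describes how DisjunctionMaxQuery combines the scores of its matching clauses: it takes the highest clause score and adds the tie-breaker times the sum of the remaining ones. A tie-breaker of 0 keeps only the best clause, and a tie-breaker of 1 sums all of them. The following is a small plain-Java sketch of that arithmetic, not Lucene code; the class and method names are hypothetical, and the two-clause scores are invented for illustration.

```java
class DisMaxScoreSketch {
    /** max + tieBreaker * (sum of the other sub-scores); 0 if nothing matched. */
    static double disMaxScore(double[] subScores, double tieBreaker) {
        if (subScores.length == 0) {
            return 0.0;
        }
        double max = subScores[0];
        double sum = 0.0;
        for (double s : subScores) {
            max = Math.max(max, s);
            sum += s;
        }
        return max + tieBreaker * (sum - max);
    }

    public static void main(String[] args) {
        // Only one clause matched, as in the output above: the score is just that clause's score.
        System.out.println(disMaxScore(new double[] {8.773912}, 0.01));
        // Hypothetical case with two matching clauses: the weaker one contributes 1% of its score.
        System.out.println(disMaxScore(new double[] {3.0, 1.0}, 0.01));
    }
}
```

In the output above only a single clause appears under the "max plus 0.01 times others of" line, so the tie-breaker has no visible effect on that score.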





How can I determine the matching fields from an Explanation object?


Regards,
Ranganath B. N.


-----Original Message-----
From: Adrien Grand [mailto:[hidden email]]
Sent: Thursday, June 22, 2017 12:55 PM
To: [hidden email]
Subject: Re: Get matching fields from a BooleanQuery


Re: Get matching fields from a BooleanQuery

Frederik Van Hoyweghen
Thanks for the tip, Adrien!

Ranganath,
you can call isMatch() on your Explanation. That tells you whether your query
(built for one specific field) matched a given document.

// Build the query for one specific field as you normally would;
// the field name and term here are placeholders.
Query query = new TermQuery(new Term("Name", "lucene"));
Explanation ex = searcher.explain(query, docID);
if (ex.isMatch()) {
    // The query matched this document on that field.
}

Note that docID in the example is Lucene's internal document ID, so you have
to run the search first and take the ID from the search results.
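
To apply this to the original question (one query per field, checked only for the top hits), the pattern could be sketched as below. This is a plain-Java illustration under stated assumptions, not Lucene code: each field's check is modelled as an IntPredicate standing in for a call such as searcher.explain(fieldQuery, docID).isMatch(), and the class name, method name, document IDs and match results are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntPredicate;

class MatchingFieldsSketch {
    /** Returns the names of all fields whose per-field check accepts docID. */
    static List<String> matchingFields(Map<String, IntPredicate> perFieldMatchers, int docID) {
        List<String> matched = new ArrayList<>();
        for (Map.Entry<String, IntPredicate> entry : perFieldMatchers.entrySet()) {
            if (entry.getValue().test(docID)) {
                matched.add(entry.getKey());
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        // Stand-ins for the per-field checks. With Lucene, each predicate would
        // wrap an explain(...).isMatch() call and handle its IOException.
        Map<String, IntPredicate> matchers = new LinkedHashMap<>();
        matchers.put("Name", doc -> doc == 3 || doc == 7);
        matchers.put("Description", doc -> doc == 7);

        // Pretend these are the docIDs of the top hits of the combined query.
        for (int docID : new int[] {3, 7}) {
            System.out.println(docID + " -> " + matchingFields(matchers, docID));
        }
        // prints "3 -> [Name]" then "7 -> [Name, Description]"
    }
}
```

Because the check runs once per field per top hit, its cost grows with the number of hits displayed rather than with the size of the index.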

Regards,
Frederik

On Thu, Jun 22, 2017 at 2:37 PM, Ranganath B N <[hidden email]>
wrote:
