What is the "docs" number in Solr explain query results for fieldnorm?

Tom Burton-West-2
Hello all,

I am trying to understand the output of Solr explain for a one-word query.
I am querying on the "ocr" field with no stemming, synonyms, or stopwords,
and no query-time or index-time boosting.

The query is "ocr:the"

The document (result below), which contains only the two words "The Aeroplane",
scores higher than documents with 50 or more occurrences of the word "the".
Since the idf is the same, I am assuming this is a result of length norms.

The explain (debugQuery) shows the following for fieldnorm:
 0.625 = fieldNorm(field=ocr, doc=16624)
What does the "doc=16624" mean?  It certainly cannot represent the
length of the field (as an integer), since there are only two terms in the
field.
It also can't represent the number of docs with the query term (the idf output
shows the word "the" occurs in 16,219 docs).

I have appended below the explain scoring for a couple of documents with
term frequencies of 50 and 67.


<float name="score">0.6798219</float>
    <str name="ID">DF9199B7049F8DFE-220</str>
    <str name="doc_ID">DF9199B7049F8DFE</str>
    <str name="ocr">The Aeroplane
</str>
<str name="DF9199B7049F8DFE-220">
0.6798219 = (MATCH) fieldWeight(ocr:the in 16624), product of:
  1.0 = tf(termFreq(ocr:the)=1)
  1.087715 = idf(docFreq=16219, maxDocs=17707)
  0.625 = fieldNorm(field=ocr, doc=16624)
</str>

Tom Burton-West

-----

    <str name="78562575E066497D-518">
0.42061833 = (MATCH) fieldWeight(ocr:the in 8396), product of:
  7.071068 = tf(termFreq(ocr:the)=50)
  1.087715 = idf(docFreq=16219, maxDocs=17707)
  0.0546875 = fieldNorm(field=ocr, doc=8396)
</str>



 <str name="18881D8AE8B1576E-120">

0.41734362 = (MATCH) fieldWeight(ocr:the in 2782), product of:
  8.185352 = tf(termFreq(ocr:the)=67)
  1.087715 = idf(docFreq=16219, maxDocs=17707)
  0.046875 = fieldNorm(field=ocr, doc=2782)
</str>
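
For readers following the numbers, here is a minimal sketch that recomputes the three scores above. It assumes Lucene's classic TF-IDF similarity (DefaultSimilarity), where tf = sqrt(termFreq) and idf = 1 + ln(maxDocs / (docFreq + 1)). The function names are hypothetical, and the fieldNorm values are copied from the explain output rather than derived. Under the same assumption, the raw length norm for a two-term field would be 1/sqrt(2), about 0.707, stored in a single lossy byte, which is consistent with the 0.625 reported.

```python
import math

# Assumed classic Lucene TF-IDF formulas; function names are illustrative only.
def tf(term_freq):
    # Term-frequency factor: square root of the raw count.
    return math.sqrt(term_freq)

def idf(doc_freq, max_docs):
    # Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1)).
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def field_weight(term_freq, doc_freq, max_docs, field_norm):
    # fieldWeight is the product of the three factors shown by explain.
    return tf(term_freq) * idf(doc_freq, max_docs) * field_norm

# (internal doc number, termFreq, fieldNorm) copied from the explain output.
rows = [
    (16624, 1, 0.625),
    (8396, 50, 0.0546875),
    (2782, 67, 0.046875),
]

scores = {
    doc: field_weight(freq, doc_freq=16219, max_docs=17707, field_norm=norm)
    for doc, freq, norm in rows
}

for doc, score in scores.items():
    print(f"doc={doc} score={score:.7f}")
```

The short document wins because its fieldNorm is more than ten times larger than those of the long documents, which outweighs their larger tf factor.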

Re: What is the "docs" number in Solr explain query results for fieldnorm?

Andrzej Białecki-2
On 25/05/2012 20:13, Tom Burton-West wrote:

> Hello all,
>
> I am trying to understand the output of Solr explain for a one word query.
> I am querying on the "ocr" field with no stemming/synonyms or stopwords.
> And no query or index time boosting.
>
> The query is "ocr:the"
>
> The document (result below), which contains only the two words "The Aeroplane",
> scores higher than documents with 50 or more occurrences of the word "the".
> Since the idf is the same, I am assuming this is a result of length norms.
>
> The explain (debugQuery) shows the following for fieldnorm:
>   0.625 = fieldNorm(field=ocr, doc=16624)
> What does the "doc=16624" mean?  It certainly cannot represent the
> length of the field (as an integer), since there are only two terms in the
> field.
> It also can't represent the number of docs with the query term (the idf output
> shows the word "the" occurs in 16,219 docs).

Hi Tom,

This is an internal document number within a Lucene index. This number
is of no use at the level of the Solr APIs because you can't use it to
actually do anything. At the Lucene level (e.g. in Luke) you could
navigate to this number and, for example, retrieve the stored fields of
this document.

As shown in the Explanations, it can only be used to correlate the
parts of the query that matched the same document number.

--
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


Re: What is the "docs" number in Solr explain query results for fieldnorm?

Yonik Seeley-2-2
In reply to this post by Tom Burton-West-2
On Fri, May 25, 2012 at 2:13 PM, Tom Burton-West <[hidden email]> wrote:
> The explain (debugQuery) shows the following for fieldnorm:
>  0.625 = fieldNorm(field=ocr, doc=16624)
> What does the "doc=16624" mean?

It's the internal document id (i.e. it's debugging info and doesn't
affect scoring)
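
As a quick check of this, the sketch below parses the explain text from the original post, multiplies the three listed factors, and compares the product with the reported score. The helper name `parse_explain` and the parsing approach are hypothetical and only assume the explain layout shown in this thread. The doc number is extracted separately and never enters the arithmetic.

```python
import math
import re

# Explain text copied from the original post in this thread.
EXPLAIN = """0.6798219 = (MATCH) fieldWeight(ocr:the in 16624), product of:
  1.0 = tf(termFreq(ocr:the)=1)
  1.087715 = idf(docFreq=16219, maxDocs=17707)
  0.625 = fieldNorm(field=ocr, doc=16624)"""

def parse_explain(text):
    """Hypothetical helper: return (reported score, factor list, doc number)."""
    lines = text.strip().splitlines()
    # Each line starts with "<value> = ..."; take the number before the first "=".
    reported = float(lines[0].split("=", 1)[0])
    factors = [float(line.split("=", 1)[0]) for line in lines[1:]]
    # The internal doc number appears only as a label, e.g. "doc=16624".
    match = re.search(r"doc=(\d+)", text)
    doc_id = int(match.group(1)) if match else None
    return reported, factors, doc_id

reported, factors, doc_id = parse_explain(EXPLAIN)
product = math.prod(factors)

print(f"doc={doc_id} reported={reported} product={product:.7f}")
```

The product of tf, idf, and fieldNorm matches the reported score to within float rounding, with no contribution from the doc number.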

-Yonik
http://lucidimagination.com