Ranking

Ranking

Steven White
Hi everyone,

I have 2 files like so:

FA has the letter "i" only 2 times, and the file size is 54,246 bytes.
FB has the letter "i" 362 times, and the file size is 9,953 bytes.

When I search on the letter "i", FB is ranked lower, which confuses me. I
was under the impression that the number of occurrences of a term in a
document and the document's size are both factors, so I was expecting FB to
rank higher. Did I get this right? If not, what's causing FB to rank
lower?

I'm on Solr 8.1
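By default, Solr 8.x scores text matches with BM25, unless the schema configures a different similarity. Here is a minimal sketch of the per-term BM25 score in its Lucene 8 form, assuming the default parameters k1 = 1.2 and b = 0.75. It ignores query boosts and Lucene's lossy one-byte encoding of field length, so the numbers will not match Solr exactly. The field lengths in the usage lines are hypothetical token counts, not measurements of FA or FB.

```python
import math

# Minimal sketch of Lucene 8's BM25 per-term score (BM25Similarity),
# assuming the default parameters k1=1.2 and b=0.75. Query boosts and the
# lossy one-byte field-length encoding are deliberately ignored.


def bm25_idf(doc_count: int, doc_freq: int) -> float:
    """Inverse document frequency: terms found in fewer documents weigh more."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_tf_norm(freq: float, field_len: float, avg_field_len: float,
                 k1: float = 1.2, b: float = 0.75) -> float:
    """Term-frequency part: saturates as freq grows and shrinks
    for fields that are longer than average."""
    length_norm = 1 - b + b * field_len / avg_field_len
    return freq / (freq + k1 * length_norm)


def bm25_score(freq: float, field_len: float, avg_field_len: float,
               doc_count: int, doc_freq: int) -> float:
    """Score contribution of one query term in one document's field."""
    return bm25_idf(doc_count, doc_freq) * bm25_tf_norm(freq, field_len, avg_field_len)


# Hypothetical token counts for the two documents (not measured from FA/FB).
fa_len, fb_len = 9000, 1700
avg_len = (fa_len + fb_len) / 2
fa = bm25_score(freq=2, field_len=fa_len, avg_field_len=avg_len, doc_count=2, doc_freq=2)
fb = bm25_score(freq=362, field_len=fb_len, avg_field_len=avg_len, doc_count=2, doc_freq=2)
print(f"FA: {fa:.4f}  FB: {fb:.4f}")
```

With these made-up lengths, a term that occurs more often in a shorter field scores higher. That matches the intuition in the question. If the real ranking disagrees, check the term frequency (freq) and field length (dl) that Lucene actually recorded for each document.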

Thanks

Steven

Re: Ranking

dhastings
I can’t imagine this is actually true unless you have a default copy field and “i” is in one of them. Also, the letter “i” is a bizarre test case.
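The letter "i" makes an odd test because Solr matches indexed terms (tokens), not characters. Counting the letter "i" inside words therefore says little about how often the term "i" occurs in a field. The sketch below illustrates this. Its tokenizer splits on word characters and lowercases, which is a rough, hypothetical stand-in for a Solr text field's analysis chain. It is not StandardTokenizer.

```python
import re

# Hypothetical illustration: counting the *character* "i" is not the same
# as counting the *term* "i". The tokenizer here (split on word characters,
# then lowercase) only approximates a typical text field's analysis chain.


def char_count(text: str, letter: str = "i") -> int:
    """Occurrences of the letter anywhere in the text, inside words included."""
    return text.lower().count(letter)


def term_count(text: str, term: str = "i") -> int:
    """Occurrences of the letter as a standalone token."""
    tokens = re.findall(r"\w+", text.lower())
    return sum(1 for tok in tokens if tok == term)


sample = "I think indexing is interesting"
print(char_count(sample))  # 7: every "i", inside words included
print(term_count(sample))  # 1: only the standalone word "I"
```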

> On Jul 27, 2019, at 3:40 PM, Steven White <[hidden email]> wrote:
>
> Hi everyone,
>
> I have 2 files like so:
>
> FA has the letter "i" only 2 times, and the file size is 54,246 bytes.
> FB has the letter "i" 362 times, and the file size is 9,953 bytes.
>
> When I search on the letter "i", FB is ranked lower, which confuses me. I
> was under the impression that the number of occurrences of a term in a
> document and the document's size are both factors, so I was expecting FB to
> rank higher. Did I get this right? If not, what's causing FB to rank
> lower?
>
> I'm on Solr 8.1
>
> Thanks
>
> Steven

Re: Ranking

Erik Hatcher
In reply to this post by Steven White
The details of the scoring can be seen by setting &debug=true.
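Below is a minimal sketch of such a request in Python, using only the standard library. The base URL, the collection name `mycollection`, and the field name `text` are hypothetical placeholders. `debug=true` and `fl=id,score` are standard Solr request parameters. The per-document explanations appear in the `explain` map inside the `debug` section of the JSON response.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical Solr location; adjust host, port and collection name.
SOLR_BASE = "http://localhost:8983/solr/mycollection"


def build_debug_url(query: str, base: str = SOLR_BASE) -> str:
    """Build a /select URL that asks Solr to explain each hit's score."""
    params = {
        "q": query,
        "fl": "id,score",  # return the computed score next to each id
        "debug": "true",   # adds a "debug" section with per-document "explain"
        "wt": "json",
    }
    return f"{base}/select?{urlencode(params)}"


def explanations(response: dict) -> dict:
    """Extract the id -> explanation map from a parsed JSON response."""
    return response.get("debug", {}).get("explain", {})


def fetch_explanations(query: str, base: str = SOLR_BASE) -> dict:
    """Run the query against a live Solr instance and return its explanations."""
    with urlopen(build_debug_url(query, base)) as resp:
        return explanations(json.load(resp))


if __name__ == "__main__":
    # Only builds and prints the URL; no request is sent.
    print(build_debug_url("text:i"))
```

Comparing the explanations for FA and FB side by side shows which factor (term frequency, field length, or a match in a different field) is driving the order.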

    Erik


Re: Ranking

Charlie Hull
There are also various tools, including a Chrome plugin and (my own
employer's) www.splainer.io, that make the debug info a little easier to
read and understand.
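The explain output is a nested tree, which is why such tools help. Adding `debug.explain.structured=true` to the request makes Solr return each explanation as JSON nodes with `value`, `description`, and `details` fields instead of one preformatted string. Below is a minimal sketch that walks such a tree and indents each node by depth, assuming that node shape. It does not reflect how splainer.io or any particular plugin is implemented, and the example values are placeholders.

```python
from typing import List


def render_explain(node: dict, depth: int = 0) -> List[str]:
    """Flatten one structured explanation node (and its children) into
    indented lines of "score  description"."""
    lines = [f"{'  ' * depth}{node['value']:.4f}  {node['description']}"]
    for child in node.get("details", []):
        lines.extend(render_explain(child, depth + 1))
    return lines


# Placeholder tree showing the assumed node shape (not real Solr output).
example = {
    "value": 1.5,
    "description": "parent node",
    "details": [
        {"value": 1.0, "description": "child node"},
        {"value": 0.5, "description": "another child"},
    ],
}
print("\n".join(render_explain(example)))
```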

Cheers

Charlie

On 27/07/2019 21:55, Erik Hatcher wrote:

> The details of the scoring can be seen by setting &debug=true.
>
>      Erik


--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk