Solr returning same object in different page

Solr returning same object in different page

ruby
I'm running into an issue where an object appears twice when we are
paging. My query boosts documents based on field values. The first query
returns 50 objects. The second query is exactly the same as the first, except
that it fetches the next 50 objects. We are noticing that a few objects that
were returned on the first page are returned again on the second page. Is this
a known issue with Solr?



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Solr returning same object in different page

Shawn Heisey-2
On 9/12/2017 12:42 PM, ruby wrote:
> I'm running into an issue where an object appears twice when we are
> paging. My query boosts documents based on field values. The first query
> returns 50 objects. The second query is exactly the same as the first, except
> that it fetches the next 50 objects. We are noticing that a few objects that
> were returned on the first page are returned again on the second page. Is this
> a known issue with Solr?

Very likely what's happening is that the index is changing between the
query for the first page and the query for the second page.  This change
puts some new items on that first page, pushing some of the items that
used to be on page 1 to page 2.

The only absolutely certain way to prevent this would be to never make
any changes to the index.
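
A minimal sketch of that effect, not tied to any real index: the document ids and the `page` helper are made up for illustration, and the ranked list simply stands in for Solr's sorted result set.

```python
def page(ranked_ids, start, rows):
    """Return one page of a ranked list, mimicking Solr's start/rows."""
    return ranked_ids[start:start + rows]


# Hypothetical ranking at the time of the first request.
before = ["doc%d" % i for i in range(1, 8)]
page1 = page(before, start=0, rows=3)

# A new high-ranking document is indexed before the second request,
# shifting every existing document down by one position.
after = ["new_doc"] + before
page2 = page(after, start=3, rows=3)

# Documents that show up on both pages.
repeated = sorted(set(page1) & set(page2))
print(page1)
print(page2)
print(repeated)
```

The last document of the old page 1 slides onto page 2, which is the kind of repeat described in the original question.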

Thanks,
Shawn


Re: Solr returning same object in different page

ruby
Hi Shawn,
No index change is happening in this case.
I have the following function, which boosts documents based on two date fields.

{!boost+b=recip(ms(NOW,field1),3.16e-11,1,1)}{!boost+b=recip(ms(NOW,field2),3.16e-11,1,1)}

If I remove this, I no longer see the same objects returned on two pages.




Re: Solr returning same object in different page

Jason Gerlowski
Is it possible that your indexed data contains duplicated or
nearly-duplicated documents?  (See:
https://cwiki.apache.org/confluence/display/solr/De-Duplication)

Also, I'm curious whether you see the same duplicates when making a single,
larger query.  Can you run a single query that returns the number of
results normally found in two "pages" of results, and check whether you see
duplicates with that single query?
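
One way to run that check, sketched in Python. It assumes the documents have already been fetched as a list of dicts (for example the `docs` array of a JSON response) and that the uniqueKey field is called `id`; both the field name and the helper name are assumptions.

```python
from collections import Counter


def find_repeated_ids(docs, key="id"):
    """Return the key values that occur more than once in docs."""
    counts = Counter(doc[key] for doc in docs)
    return sorted(value for value, n in counts.items() if n > 1)


# Made-up data: one 4-row query versus two 2-row pages glued together.
single_query_docs = [{"id": "a"}, {"id": "b"}, {"id": "c"}, {"id": "d"}]
paged_docs = [{"id": "a"}, {"id": "b"}] + [{"id": "b"}, {"id": "c"}]

print(find_repeated_ids(single_query_docs))
print(find_repeated_ids(paged_docs))
```

If the single large query comes back clean while the concatenated pages do not, the repeats come from the ordering changing between requests rather than from duplicated data.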

On Tue, Sep 12, 2017 at 3:35 PM, ruby <[hidden email]> wrote:

> Hi Shawn,
> No index change is happening in this case.

Re: Solr returning same object in different page

Shawn Heisey-2
In reply to this post by ruby
On 9/12/2017 1:35 PM, ruby wrote:
> No index change is happening in this case.

The duplicate document theory that Jason mentioned is one possibility.
If you have a uniqueKey defined in your schema, then duplicates would
need to have different uniqueKey values.  If you index a document where
the uniqueKey field has the same value as an existing document, the
existing document is deleted.

If the index is not changing, and you don't have duplicate documents,
then the only way I can imagine that happening is if there are multiple
replicas of your collection with different numbers of deleted documents.

Deleted documents are still part of the index, so their contents can
affect the scores in your query results.  When different replicas have
different numbers of deleted documents, running the same query more than
once behind a load balancer can produce a different score-based
ordering each time.

Multiple replicas with different numbers of deletes are common with
SolrCloud, but it is also possible to have them with a non-SolrCloud
install, though it is probably less likely.
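
A rough illustration of why deleted documents matter, assuming a BM25-style inverse document frequency. The similarity actually in use depends on the Solr version and schema, and the replica statistics below are invented.

```python
import math


def bm25_idf(doc_count, doc_freq):
    """BM25-style IDF: rarer terms get a larger weight."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical index statistics for the same term on two replicas.
# Replica B still carries 200 deleted documents, 50 of which contain
# the term, so its counts differ from replica A's.
idf_replica_a = bm25_idf(doc_count=1000, doc_freq=100)
idf_replica_b = bm25_idf(doc_count=1200, doc_freq=150)

print(idf_replica_a)
print(idf_replica_b)
```

The same term gets a different weight on each replica, so two documents with close scores can swap places depending on which replica answers the request.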

Thanks,
Shawn


Re: Solr returning same object in different page

Tom Evans
In reply to this post by ruby
On Tue, Sep 12, 2017 at 7:42 PM, ruby <[hidden email]> wrote:
> I'm running into an issue where an object appears twice when we are
> paging. My query boosts documents based on field values. The first query
> returns 50 objects. The second query is exactly the same as the first, except
> that it fetches the next 50 objects. We are noticing that a few objects that
> were returned on the first page are returned again on the second page. Is this
> a known issue with Solr?

Are you using regular paging (start=N with rows=M) or deep paging
(cursorMark=*)? Do you have a deterministic sort order (i.e., not simply
by score)?
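
A sketch of what a deterministic sort could look like, assuming the uniqueKey field is named `id`. The helper names and sample documents are hypothetical; `start`, `rows`, `sort` and `cursorMark` are the standard Solr request parameters, and a cursorMark request needs a sort that includes the uniqueKey field.

```python
from urllib.parse import urlencode


def page_params(query, start, rows, unique_key="id"):
    """Regular paging with the uniqueKey as a tie-breaker after score."""
    return {"q": query, "start": start, "rows": rows,
            "sort": "score desc, %s asc" % unique_key}


def cursor_params(query, rows, cursor_mark="*", unique_key="id"):
    """Deep paging: pass the previous nextCursorMark on each request."""
    return {"q": query, "rows": rows, "cursorMark": cursor_mark,
            "sort": "score desc, %s asc" % unique_key}


def rank(docs):
    """Local stand-in for 'score desc, id asc'."""
    return [d["id"] for d in sorted(docs, key=lambda d: (-d["score"], d["id"]))]


# Two documents tie on score; the internal order differs between runs.
run_1 = [{"id": "b", "score": 1.0}, {"id": "a", "score": 1.0},
         {"id": "c", "score": 2.0}]
run_2 = list(reversed(run_1))

print(urlencode(page_params("*:*", start=50, rows=50)))
print(urlencode(cursor_params("*:*", rows=50)))
print(rank(run_1), rank(run_2))
```

With the tie-breaker, documents with equal scores come out in the same order no matter how they are laid out internally, so page boundaries stay stable as long as the scores themselves do.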

Cheers

Tom

Re: Solr returning same object in different page

alessandro.benedetti
Which version of Solr are you on?
Are you using SolrCloud or any distributed search?
If so, I think (as Shawn already mentioned) this could be related to
[1].

If it is just plain Solr, my shot in the dark is your boost function:

{!boost+b=recip(ms(NOW,field1),3.16e-11,1,1)}{!boost+b=recip(ms(NOW,field2),3.16e-11,1,1)}

I see you use NOW (which changes continuously).
It is normally suggested to round it (for example NOW/HOUR or NOW/DAY).
The rounding granularity depends on the use case.
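
To make the effect of rounding concrete, here is a small sketch that evaluates recip(x,m,a,b) = a/(m*x+b) in Python. The timestamps are invented, and Python's 64-bit floats will not match Solr's internal arithmetic exactly, so treat the numbers as illustrative only.

```python
HOUR_MS = 3_600_000


def recip(x, m=3.16e-11, a=1.0, b=1.0):
    """Solr's recip function: a / (m*x + b)."""
    return a / (m * x + b)


def boost(now_ms, field_ms):
    """Equivalent of recip(ms(NOW,field),3.16e-11,1,1)."""
    return recip(now_ms - field_ms)


def round_down_to_hour(now_ms):
    """Equivalent of NOW/HOUR: truncate to the start of the hour."""
    return now_ms - now_ms % HOUR_MS


# Hypothetical request times: page 2 is requested 5 seconds after page 1.
now_page1 = 418_122 * HOUR_MS + 600_000
now_page2 = now_page1 + 5_000
field_ms = now_page1 - 30 * 24 * HOUR_MS  # document dated 30 days earlier

raw = (boost(now_page1, field_ms), boost(now_page2, field_ms))
rounded = (boost(round_down_to_hour(now_page1), field_ms),
           boost(round_down_to_hour(now_page2), field_ms))
print(raw)
print(rounded)
```

With the raw NOW the boost differs between the two requests; with the rounded value both requests in the same hour compute exactly the same boost.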

The passage of time should not change the ranking (although it does change
the scores).
However, if rounding of the score for any reason leaves different documents
with the same score, the internal document ordinal is used to rank them,
which can produce slightly different rankings.
This is very unlikely, but with a single Solr instance it's the first
thing that comes to mind.
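
A sketch of how such ties can arise, assuming the boost ends up as a 32-bit float (Lucene scores are floats). The document ages are invented, and the real score also includes the main query's contribution, which this ignores.

```python
import struct


def to_float32(value):
    """Round a Python float to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", value))[0]


def recip(x, m=3.16e-11, a=1.0, b=1.0):
    """Solr's recip function: a / (m*x + b)."""
    return a / (m * x + b)


# 1000 hypothetical documents, about one year old, dated 1 ms apart.
YEAR_MS = 365 * 24 * 3600 * 1000
ages_ms = [YEAR_MS + i for i in range(1000)]

exact = [recip(age) for age in ages_ms]
as_float32 = [to_float32(value) for value in exact]

# Count how many distinct boost values survive at each precision.
print(len(set(exact)), len(set(as_float32)))
```

At 64-bit precision every document gets its own boost, but at 32-bit precision almost all of them collapse onto the same value, leaving the order among them to whatever tie-breaking applies.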

[1] https://issues.apache.org/jira/browse/SOLR-5821
[2] https://github.com/fguery/lucene-solr/tree/replicaChoice




-----
---------------
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io