Solr still gives old data while faceting (from the deleted/updated documents)

girish.vignesh
Solr returns stale values when faceting: facet values from deleted or
updated documents still show up.

For example, we facet on name, and names change frequently in our
application. After we reindex a document with a changed name, we get both
the old name and the new name in the facet results. Digging further, I
learned that a Solr index is composed of write-once segments, each
containing a set of documents. Once a hard commit closes a segment, a
document deleted afterwards is only marked as deleted; it stays in the
segment and is not removed immediately. It no longer appears in search
results, but faceting can somehow still see its values.
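
To illustrate the behaviour described above, here is a toy Python model. It
is only a sketch, not Lucene's actual data structures; the Segment class and
the facet_counts, update and merge helpers are made-up names. It assumes
faceting walks every term in each segment's term dictionary and counts only
live documents, which is why the old name shows up with a count of 0 until
the segments are merged.

```python
class Segment:
    """Toy write-once segment: terms are fixed, deletes only set a flag."""

    def __init__(self, docs):
        # docs maps a document id to the value of its "name" field
        self.docs = dict(docs)
        self.deleted = set()

    def delete(self, doc_id):
        # The document stays in the segment; it is only marked as deleted.
        if doc_id in self.docs:
            self.deleted.add(doc_id)

    def terms(self):
        # Terms of deleted documents remain until the segment is rewritten.
        return set(self.docs.values())

    def live_docs(self):
        return {d: n for d, n in self.docs.items() if d not in self.deleted}


def update(segments, doc_id, new_name):
    """An update is a delete of the old version plus an add in a new segment."""
    for seg in segments:
        seg.delete(doc_id)
    segments.append(Segment({doc_id: new_name}))


def facet_counts(segments, mincount=0):
    """Count live documents per term, starting from every term still indexed."""
    counts = {}
    for seg in segments:
        for term in seg.terms():
            counts.setdefault(term, 0)
        for name in seg.live_docs().values():
            counts[name] += 1
    return {t: c for t, c in counts.items() if c >= mincount}


def merge(segments):
    """Rewrite all segments into one, dropping deleted documents (like optimize)."""
    live = {}
    for seg in segments:
        live.update(seg.live_docs())
    return [Segment(live)]


segments = [Segment({1: "alice", 2: "bob"})]
update(segments, 1, "alicia")
print(sorted(facet_counts(segments).items()))         # old name, count 0
print(sorted(facet_counts(merge(segments)).items()))  # old name gone
```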

Optimizing fixed the issue, but we cannot do that every time a customer
changes data in production. I tried the options below.

1) *expungeDeletes*.

I added the following to solrconfig.xml:

<autoCommit>
  <maxTime>30000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>10000</maxTime>
</autoSoftCommit>

<commit waitSearcher="false" expungeDeletes="true"/>  // This does not work.

I do not think expungeDeletes can be configured like this. When I send an
expungeDeletes commit to the /update URL with curl, it merges the segments
and that fixes the issue.

2) *TieredMergePolicyFactory* might not help, because the merge threshold
will not always be reached, and users will see old data until it is.

3) Another option is to call the *optimize*() method exposed in SolrJ once
a day, but I am not sure what impact this will have on performance.

4) I tried adjusting filterCache, documentCache and queryResultCache. I do
not think these caches are the cause of the issue.
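
For reference, the explicit commit from option 1 and the optimize from
option 3 can both be expressed as XML update messages posted to /update. A
minimal Python sketch follows; the host, port and core name "mycore" are
assumptions, and xml_update_request is a hypothetical helper. It only builds
the requests and does not send them.

```python
from urllib.request import Request

# Assumed location of the core; adjust host, port and core name as needed.
SOLR_CORE_URL = "http://localhost:8983/solr/mycore"


def xml_update_request(core_url, xml_message):
    """Build (but do not send) a POST of an XML update message to /update."""
    return Request(
        url=f"{core_url}/update",
        data=xml_message.encode("utf-8"),
        headers={"Content-Type": "text/xml"},
        method="POST",
    )


# Commit that asks Solr to merge away deleted documents.
expunge = xml_update_request(
    SOLR_CORE_URL, '<commit waitSearcher="false" expungeDeletes="true"/>'
)

# Full optimize (force merge); costly on a large index.
optimize = xml_update_request(SOLR_CORE_URL, '<optimize waitSearcher="false"/>')

# To actually send one against a running Solr:
#   from urllib.request import urlopen
#   with urlopen(expunge) as resp:
#       print(resp.status)
```

Either request could be fired from a scheduler during a low-traffic window.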

We index at most 2M-3M documents per server.

Please suggest a solution if there is one.

Let me know if more data is needed.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Solr still gives old data while faceting from the deleted documents

Shawn Heisey
On 4/12/2018 5:53 AM, girish.vignesh wrote:

> Solr gives old data while faceting from old deleted or updated documents.
>
> For example we are doing faceting on name. name changes frequently for our
> application. When we index the document after changing the name we get both
> old name and new name in the search results. After digging more on this I
> got to know that Solr indexes are composed of segments (write once) and each
> segment contains set of documents. Whenever hard commit happens these
> segments will be closed and even if a document is deleted after that it will
> still have those documents (which will be marked as deleted). These
> documents will not be cleared immediately. It will not be displayed in the
> search result though, but somehow faceting is still able to access those
> data.

If all documents with that term are deleted, then this will be fixed by
adding a facet.mincount=1 parameter to your facet URL.  If you are using
the JSON facet API, then there is a mincount parameter that you can
place into your JSON request. I've never actually used the JSON facet
API, but there is documentation:

https://lucene.apache.org/solr/guide/7_2/json-facet-api.html#TermsFacet

The mincount parameter might make it unnecessary to optimize.  But if
you are updating a LOT of your documents on a regular basis, you might
find that optimizing gives you better performance, so doing it once a
day during a time when traffic is low might be useful.
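
As a rough sketch of both forms of the mincount setting, the Python below
builds a classic facet URL and a JSON facet request body. The core URL, the
field name "name" and the facet label "names" are assumptions for
illustration, and the two helper functions are hypothetical.

```python
import json
from urllib.parse import urlencode

# Assumed location of the core; adjust host, port and core name as needed.
SOLR_CORE_URL = "http://localhost:8983/solr/mycore"


def classic_facet_url(core_url, field, mincount=1):
    """Build a /select URL that facets on one field with facet.mincount."""
    params = [
        ("q", "*:*"),
        ("rows", 0),
        ("facet", "true"),
        ("facet.field", field),
        ("facet.mincount", mincount),
    ]
    return f"{core_url}/select?{urlencode(params)}"


def json_facet_body(field, mincount=1):
    """Build a JSON request body with a terms facet that sets mincount."""
    return json.dumps(
        {
            "query": "*:*",
            "limit": 0,
            "facet": {
                # "names" is an arbitrary label for this facet in the response
                "names": {"type": "terms", "field": field, "mincount": mincount}
            },
        }
    )


print(classic_facet_url(SOLR_CORE_URL, "name"))
print(json_facet_body("name"))
```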

Thanks,
Shawn


Re: Solr still gives old data while faceting from the deleted documents

girish.vignesh
mincount will certainly fix this issue. I have tried it, but the requirement
is to show facets with a count of 0 as disabled.

I think I am left with only two options: either call expungeDeletes through
the update URL, or run optimize from a scheduler.

Regards,
Vignesh




Re: Solr still gives old data while faceting from the deleted documents

Erick Erickson
expungeDeletes won't do the trick for you. It only purges deleted
documents from segments with > 10% deleted docs, so you'll still have
deleted documents in the other segments.
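
A small Python sketch of that threshold, using made-up segment statistics
and hypothetical helper names (this models only the 10% rule described
above, not the actual merge policy code):

```python
def segments_rewritten_by_expunge(segments, threshold_pct=10.0):
    """Return names of segments whose deleted percentage exceeds the threshold.

    segments is a list of (name, max_doc, deleted_docs) tuples.
    """
    return [
        name
        for name, max_doc, deleted in segments
        if max_doc and 100.0 * deleted / max_doc > threshold_pct
    ]


def deleted_docs_remaining(segments, threshold_pct=10.0):
    """Count deleted documents left behind in segments that are not rewritten."""
    rewritten = set(segments_rewritten_by_expunge(segments, threshold_pct))
    return sum(deleted for name, _, deleted in segments if name not in rewritten)


# Made-up example: 30%, 5% and exactly 10% deleted documents.
example = [("_a", 1000, 300), ("_b", 1000, 50), ("_c", 500, 50)]
print(segments_rewritten_by_expunge(example))
print(deleted_docs_remaining(example))
```

In this example only the first segment crosses the threshold, so the deleted
documents in the other two segments can still contribute zero-count facet
values.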

I'd push back on "the requirement is to show facets with 0 count as
disabled." Why? What use-case is satisfied here? Effectively this is
saying "For my query, show me possible values that have no hits for
that query". Optimize is a very costly operation and to really get
this behavior you'll need to run it _every_ time the index changes.
You really can't afford to run it for every update, so there'll be a
period of time when you will still get these facets.

Eventually you won't be displaying zero-count facets anyway, assuming
that you have room for, say, only 10 facets and sort by frequency.

If your index changes only periodically (say once a day) that may be
fine. But if it changes more often than that, you won't be able to
satisfy the requirement anyway.

My point is that requirements like this are often created without
understanding the consequences and cause a lot of effort to be
expended without a good purpose. See:
https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/

Best,
Erick

On Thu, Apr 12, 2018 at 10:32 PM, girish.vignesh
<[hidden email]> wrote:

> mincount will fix this issue for sure. I have tried that but the requirement
> is to show facets with 0 count as disabled.
>
> I think I left with only 2 options. Either go with expungeDelets with update
> URL or use optimize in a scheduler.
>
> Regards,
> Vignesh