forceMerge and unused metadata

Shawn Heisey-2
A question came across the #solr IRC channel, where the user was seeing
entries in their /admin/luke endpoint for a bunch of fields they used
to use, but which are no longer in any current documents.  That URL endpoint
provides information about the fields in the index, getting most of that
info directly from Lucene.
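
For anyone who wants to check their own index, here is a minimal Python sketch of what the user was looking at. It assumes the Luke handler's JSON response has a top-level "fields" object whose entries may carry a "docs" count; the helper names, the host, the core name "mycore", and the sample data are all made up for illustration. A missing or zero count is only a hint: fields that are stored but not indexed may also lack it.

```python
from urllib.parse import urlencode


def luke_url(base_url, core):
    """Build a Luke request URL; numTerms=0 skips the per-field top-terms listing."""
    query = urlencode({"wt": "json", "numTerms": 0})
    return f"{base_url}/{core}/admin/luke?{query}"


def fields_without_docs(luke_response):
    """Return the names of reported fields that have no positive 'docs' count."""
    fields = luke_response.get("fields", {})
    return sorted(name for name, info in fields.items() if not info.get("docs"))


# Hand-written sample shaped like the assumed response, not real Solr output.
sample = {
    "fields": {
        "id": {"type": "string", "docs": 1200},
        "title": {"type": "text_general", "docs": 1187},
        "old_price": {"type": "float"},
        "legacy_tag": {"type": "string", "docs": 0},
    }
}

print(luke_url("http://localhost:8983/solr", "mycore"))
print(fields_without_docs(sample))
```

Fetching the URL and passing the parsed JSON to fields_without_docs() would give a list of candidate stale fields to inspect by hand.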

I asked them to run an optimize (forceMerge in Lucene) and see what that
did.  It did not remove those fields.

Discussing it with other Solr committers on the lucene-solr Slack
channel, this is apparently known -- a forceMerge does not eliminate any
field metadata, even if the field is not referenced by any non-deleted
document.

What I'm wondering is whether it would be possible to adjust merging so
that it can determine what pieces of metadata (like field information)
are unused in the index and remove them.  It would be fine if this were
only an option on forceMerge, but nice if it were something that could
happen on any merge.  That discussion on Slack indicated that it might
be prohibitively expensive to do this.  Can one of our experts on Lucene
merging respond?
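
To make the request concrete, here is a toy Python model of the behavior being asked for. It is not Lucene's merge code and uses no Lucene API: the segment layout (a set of field names plus a list of documents with a deleted flag) and the function name are hypothetical. It only illustrates the idea of keeping field metadata for fields that some surviving document still uses, and that doing so means looking at every live document's fields during the merge.

```python
def merge_dropping_unused_fields(segments):
    """Merge toy segments, keeping only field names used by a live document.

    Each segment is a dict with:
      "field_infos": set of field names the segment has metadata for
      "docs": list of (is_deleted, {field_name: value}) tuples
    Returns the merged segment and the set of field names that were dropped.
    """
    live_docs = []
    used_fields = set()
    all_fields = set()
    for segment in segments:
        all_fields |= segment["field_infos"]
        for is_deleted, doc in segment["docs"]:
            if is_deleted:
                continue  # deleted documents do not survive the merge
            live_docs.append((False, doc))
            used_fields.update(doc)  # extra per-document bookkeeping
    merged = {"field_infos": used_fields, "docs": live_docs}
    return merged, all_fields - used_fields


# Made-up segments: "old_price" is only used by a deleted document.
segments = [
    {
        "field_infos": {"id", "title", "old_price"},
        "docs": [
            (True, {"id": "1", "old_price": 9.5}),
            (False, {"id": "2", "title": "kept"}),
        ],
    },
    {
        "field_infos": {"id", "title"},
        "docs": [(False, {"id": "3", "title": "also kept"})],
    },
]

merged, dropped = merge_dropping_unused_fields(segments)
print(sorted(merged["field_infos"]))
print(sorted(dropped))
```

In a real index a field can be indexed without being stored, so deciding whether a field is still in use would presumably involve more than the stored documents modeled here.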

This particular user has no option that I am aware of other than to
rebuild their index.  They're running version 4.2.1.

Thanks,
Shawn

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: forceMerge and unused metadata

david.w.smiley@gmail.com
To follow up in a more official channel than Slack: I suggested that the JIRA issue for this request is LUCENE-8551: https://issues.apache.org/jira/browse/LUCENE-8551

~ David Smiley
Apache Lucene/Solr Search Developer


In reply to Shawn Heisey's message of Tue, Oct 29, 2019, 6:19 PM.