Solr v4.2.1: fields without associated documents

Bridger Dyson-Smith
Hi all -

I'm working with an application that uses Solr v4.2.1 and I'm seeing a
strange issue with our index: there are many fields (tens of thousands)
that don't seem to have any associated documents. I was wondering if
there is any way of getting them out of our index (or, more specifically,
out of our admin/luke endpoint).

A very helpful person on IRC suggested that the only way to get rid of
these might be a clean rebuild of the index, and that's not out of the
question for us; I hoped to get a bit more information here.

The fields appear in /solr/admin/luke:
<lst name="fedora_datastream_latest_hesler_200_0006_MIMETYPE_ms">
  <str name="type">string</str>
  <str name="schema">I-S-M---OF-----l</str>
  <str name="dynamicBase">*_ms</str>
</lst>

but querying for them, using something like
`fq=fedora_datastream_latest_hesler_200_0006_MIMETYPE_ms:[* TO *]` doesn't
return any documents, and when using the admin UI's Schema Browser there
isn't any corresponding 'Index' section (only 'Schema').
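
For anyone wanting to list such fields in bulk, here is a minimal Python
sketch of the two checks described above. It assumes the Luke handler's
JSON output (`wt=json`) mirrors the XML shown, i.e. a `fields` map in which
populated fields carry a `docs` count and the orphaned ones do not. The core
URL and the helper names are hypothetical; adjust them to your install.

```python
import json
import urllib.parse
import urllib.request


def fields_without_index_info(luke_response):
    """Return the names of fields listed by Luke with no positive 'docs' count.

    Assumes 'fields' is a mapping of field name -> info dict, as in the
    XML output above (type, schema, dynamicBase, and optionally docs).
    """
    fields = luke_response.get("fields", {})
    return sorted(name for name, info in fields.items() if not info.get("docs"))


def exists_query_url(core_url, field):
    """Build a select URL that counts documents holding any value in `field`."""
    params = urllib.parse.urlencode({
        "q": "*:*",
        "fq": f"{field}:[* TO *]",  # same range query as in the message
        "rows": 0,                   # only numFound is needed, not the docs
        "wt": "json",
    })
    return f"{core_url}/select?{params}"


def fetch_json(url):
    """GET a URL and parse the body as JSON."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def main(core_url="http://localhost:8983/solr/collection1"):  # hypothetical URL
    """Print each suspect field with the number of documents that use it."""
    luke = fetch_json(f"{core_url}/admin/luke?numTerms=0&wt=json")
    for name in fields_without_index_info(luke):
        result = fetch_json(exists_query_url(core_url, name))
        print(name, result["response"]["numFound"])
```

Calling `main()` against a live instance should print one line per suspect
field; a count of 0 would match what I see for the field above.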

We don't have these fields statically assigned in our schema.

Other than a clean reindexing of our data, is there anything we can do to
clean these up?
Thanks in advance for your help!

Best,
Bridger

Re: Solr v4.2.1: fields without associated documents

Shawn Heisey-2
On 10/29/2019 2:25 PM, Bridger Dyson-Smith wrote:
> A very helpful person on IRC suggested that the only way to get rid of
> these might be a clean rebuild of the index, and that's not out of the
> question for us; I hoped to get a bit more information here.

I'm the one who you talked to on IRC.

> Other than a clean reindexing of our data, is there anything we can do to
> clean these up?
> Thanks in advance for your help!

You should wait for confirmation, but I am not aware of any other way to
fix this.  The optimize operation (that I was hopeful would take care of
it) is a purely Lucene operation that knows nothing at all about Solr.
I learned that the optimize operation preserves all field metadata built
into the index, even if the field was only referenced by deleted
documents.  Discussing the issue with other committers in our Slack
channel has revealed that it might be extremely difficult or impossible
to improve the optimize operation so it purges unused metadata.  I can
ask on our dev list to see what I can learn.

I personally feel that Solr users should always be prepared to
completely rebuild indexes from scratch.  As painful as that prospect
might be, it is the only solution to a number of problems, and is also
frequently required by many configuration changes.

Thanks,
Shawn

Re: Solr v4.2.1: fields without associated documents

Shawn Heisey-2
On 10/29/2019 4:05 PM, Shawn Heisey wrote:
> I can
> ask on our dev list to see what I can learn.

I should add something important to this.  Even if we can implement an
enhancement, it would only be added to an 8.x version at the earliest.
It is not possible to take an index from 4.2.1 and use it in Solr 8.x,
so you'd have to rebuild your index anyway even if you upgraded to get
the new feature.

Thanks,
Shawn

Re: Solr v4.2.1: fields without associated documents

Bridger Dyson-Smith
Hi Shawn -

Thanks again for your help on IRC -- I took your suggestions and info,
talked it over with my colleagues, and we decided that we'll rebuild our
index -- all before I had finished composing my original email to the list.

On Tue, Oct 29, 2019 at 6:17 PM Shawn Heisey <[hidden email]> wrote:

> On 10/29/2019 4:05 PM, Shawn Heisey wrote:
> > I can
> > ask on our dev list to see what I can learn.
>
> I should add something important to this.  Even if we can implement an
> enhancement, it would only be added to an 8.x version at the earliest.
> It is not possible to take an index from 4.2.1 and use it in Solr 8.x,
> so you'd have to rebuild your index anyway even if you upgraded to get
> the new feature.
>

That makes complete sense. We're hopeful that we'll be able to move to a
new version of Solr in the next year, but we don't have any expectations
for moving the index over.

> Thanks,
> Shawn
>

Thank you!
Best,
Bridger