Documents with same unique id indexed multiple times

JTytler
I am seeing multiple entries for the exact same URL, and each time I re-index
the content the crawler adds the same entry one more time.

I have <uniqueKey>id</uniqueKey> defined in the schema.xml file. It seems
Solr is not reading the schema file; I cannot think of any other
explanation for how the unique key constraint could be violated.

Please see the examples below:

"_language":"en",
"_description":"",
"_last_modified":"2018-11-22T00:00:00Z",
"_platform":"infosite",
"_keywords":"",
"id":"http://infosite.gc.ca/org/org/psa-af_e.aspx",
"_title":"Office of Public Service Accessibility",
"_type":"Administrative page",
"Date":["Tue, 30 Jul 2019 19:56:28 GMT"],
"url":"http://infosite.gc.ca/org/org/psa-af_e.aspx",
"content":["Engage expert internal and external stakeholders to provide
advice, support and motivation to facilitate progress. Date Modified:
2018-11-22"],
"version":1640514868282916864,
"root":["http://infosite.gc.ca/org/org/psa-af_e.aspx"],
"timestamp":"2019-07-30T20:01:35.564Z"},
{
"_language":"en",
"_description":"",
"_last_modified":"2018-11-22T00:00:00Z",
"_platform":"infosite",
"_keywords":"",
"id":"http://infosite.gc.ca/org/org/psa-af_e.aspx",
"_title":"Office of Public Service Accessibility",
"_type":"Administrative page",
"Date":["Tue, 30 Jul 2019 22:12:19 GMT"],
"url":"http://infosite.gc.ca/org/org/psa-af_e.aspx",
"content":["Engage expert internal and external stakeholders to provide
advice, support and motivation to facilitate progress. Date Modified:
2018-11-22"],
"version":1640523480504991744,
"root":["http://infosite.gc.ca/org/org/psa-af_e.aspx"],
"timestamp":"2019-07-30T22:18:28.819Z"}]
}}
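
To show how widespread the duplication is, here is a minimal Python sketch, not a definitive diagnosis. It builds two request URLs (one for the Schema API's uniquekey endpoint, which reports the unique key Solr actually loaded, and one for a facet query that lists every id held by two or more documents) and counts repeated ids in a list of returned docs. The base URL http://localhost:8983/solr and the core name "mycore" are assumptions for illustration, not values from my setup.

```python
from collections import Counter
from urllib.parse import urlencode


def unique_key_url(base_url, core):
    # Schema API endpoint that reports the uniqueKey Solr has loaded.
    return f"{base_url}/{core}/schema/uniquekey"


def duplicate_facet_url(base_url, core, field="id"):
    # Facet on the key field; facet.mincount=2 keeps only values that
    # appear in two or more documents, i.e. the duplicated ids.
    params = {
        "q": "*:*",
        "rows": 0,
        "facet": "true",
        "facet.field": field,
        "facet.mincount": 2,
        "facet.limit": -1,
        "wt": "json",
    }
    return f"{base_url}/{core}/select?{urlencode(params)}"


def duplicate_ids(docs, field="id"):
    # Count how often each key value occurs in a list of result documents.
    counts = Counter(doc[field] for doc in docs if field in doc)
    return {value: n for value, n in counts.items() if n > 1}


if __name__ == "__main__":
    base, core = "http://localhost:8983/solr", "mycore"  # assumed values
    print(unique_key_url(base, core))
    print(duplicate_facet_url(base, core))
    docs = [
        {"id": "http://infosite.gc.ca/org/org/psa-af_e.aspx"},
        {"id": "http://infosite.gc.ca/org/org/psa-af_e.aspx"},
    ]
    print(duplicate_ids(docs))
```

If the uniquekey endpoint does not report "id", that would support the idea that Solr is not using the schema.xml I edited.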



I am also occasionally seeing the write.lock error message, and while
indexing, Norconex Committer is sometimes unable to write to the /data/index
folder, giving an error about not being able to write to the files in the
folder. This all started happening in the past few weeks, and I wonder
whether these issues are all related and whether a change to the permissions
of the Solr service account caused them to appear all of a sudden.
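
One way to test the permissions theory is a small Python sketch like the one below, run as the Solr service account (running it as another user would not tell you much). The helper name inspect_index_dir is hypothetical, and the path is the /data/index folder mentioned above. It only reports whether the folder exists, whether the current user can write to it, and whether a write.lock file is present. As far as I understand, the presence of write.lock alone does not prove the lock is currently held.

```python
import os


def inspect_index_dir(index_dir):
    # Report basic facts about the index folder for the current user.
    lock_path = os.path.join(index_dir, "write.lock")
    return {
        "exists": os.path.isdir(index_dir),
        "writable": os.access(index_dir, os.W_OK),
        "write_lock_present": os.path.exists(lock_path),
    }


if __name__ == "__main__":
    # Path taken from the error described above; adjust to the real location.
    print(inspect_index_dir("/data/index"))
```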

Has anyone seen this bizarre behaviour?



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html