Update documents cause multivalue fields unexpected behaviour

Jie Luo
Dear Solr users,

I have several processes. The first builds the SolrDocuments and indexes them; the others later update other fields of those documents. I noticed that after the updates, searching on previously indexed multivalued fields (not stored) returns wrong results (fewer documents). In a test with five documents, a (field:*) search returned only one document. Before the other processes run, the same search correctly returns all five. Single-valued fields, however, seem to work fine.

Best Regards

Jie

Re: Update documents cause multivalue fields unexpected behaviour

Jörn Franke
Hi,

Are you using atomic updates for your documents? If not, changing a single value overwrites the whole document.
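As a rough illustration of the difference, the sketch below builds both kinds of payload as plain Java maps. The class name UpdatePayloads, its methods, and the field names title and note are hypothetical and only stand in for what a client would send; the one Solr-specific fact assumed here is that an atomic update wraps the new value in a modifier map such as {"set": value}.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: plain maps standing in for the documents a client would send.
final class UpdatePayloads {

    // Full document: every field carries its literal value, so sending it
    // replaces the previously indexed document with the same id.
    static Map<String, Object> fullDocument(String id, String title, String note) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", id);
        doc.put("title", title);
        doc.put("note", note);
        return doc;
    }

    // Atomic update: only the id plus the changed field, whose value is a
    // {modifier: value} map instead of a literal.
    static Map<String, Object> atomicSet(String id, String field, Object value) {
        Map<String, Object> modifier = new LinkedHashMap<>();
        modifier.put("set", value);
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", id);
        doc.put(field, modifier);
        return doc;
    }

    // True when at least one field value is a modifier map.
    static boolean looksAtomic(Map<String, Object> doc) {
        for (Object value : doc.values()) {
            if (value instanceof Map) {
                return true;
            }
        }
        return false;
    }
}
```

A payload built with fullDocument has to contain every field you want to keep, while one built with atomicSet names only the field being changed.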

Best regards

> On 04.05.2019 at 12:57, Jie Luo <[hidden email]> wrote:
>
> Dear Solr users,
>
> I have several processes. The first builds the SolrDocuments and indexes them; the others later update other fields of those documents. I noticed that after the updates, searching on previously indexed multivalued fields (not stored) returns wrong results (fewer documents). In a test with five documents, a (field:*) search returned only one document. Before the other processes run, the same search correctly returns all five. Single-valued fields, however, seem to work fine.
>
> Best Regards
>
> Jie

solr4 indexation taking too long time

Theodore Ngogang
 
Dear Solr users,

We are migrating from Alfresco 4.2 with Apache Lucene to Alfresco 5.0 with Solr 4. We are facing an issue: we have 700 GB of data, and the indexing process has still not finished after more than 14 days. Please help us identify the problem and solve it.
Thanks!

Re: solr4 indexation taking too long time

Erick Erickson
I suggest you contact Alfresco, as few in the Solr community know enough about what Alfresco has done to be much help.

Best,
Erick

> On May 4, 2019, at 6:38 AM, Theodore Ngogang <[hidden email]> wrote:
>
>
> Dear Solr users,
>
> We are migrating from Alfresco 4.2 with Apache Lucene to Alfresco 5.0 with Solr 4. We are facing an issue: we have 700 GB of data, and the indexing process has still not finished after more than 14 days. Please help us identify the problem and solve it.
> Thanks!


Re: Update documents cause multivalue fields unexpected behaviour

Walter Underwood
In reply to this post by Jie Luo
We gather all the data for a document, then send it as one update to Solr.

Actually, we create a JSON object for each document, then make a JSONL (one JSON object per line) feed of everything we want to send. That gets compressed and saved in Amazon S3. Then we break it into batches and send it to Solr.

Putting the entire feed in S3 allows us to analyze that feed, load it into a test cluster, load yesterday’s feed, load it into a different prod cluster for disaster recovery, etc.
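The batching step might look roughly like the sketch below, which splits a JSONL feed into JSON-array payloads. JsonlBatcher and toBatches are hypothetical names, and the batch size is arbitrary; compression, the S3 storage, and the actual HTTP call to Solr are left out. It assumes each non-empty line is already a complete JSON object and that each batch is posted to Solr's update handler as a JSON array of documents.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: turns a JSONL feed (one JSON object per line) into batches.
final class JsonlBatcher {

    // Returns JSON-array payloads holding at most batchSize documents each.
    static List<String> toBatches(String jsonl, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        List<String> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String line : jsonl.split("\n")) {
            String doc = line.trim();
            if (doc.isEmpty()) {
                continue; // skip blank lines in the feed
            }
            current.add(doc);
            if (current.size() == batchSize) {
                batches.add("[" + String.join(",", current) + "]");
                current.clear();
            }
        }
        if (!current.isEmpty()) {
            // final, possibly smaller, batch
            batches.add("[" + String.join(",", current) + "]");
        }
        return batches;
    }
}
```

Because the feed is just a file of lines, the same function can replay yesterday's feed or load a test cluster without touching the code that produced the documents.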

wunder
Walter Underwood
[hidden email]
http://observer.wunderwood.org/  (my blog)

> On May 4, 2019, at 3:57 AM, Jie Luo <[hidden email]> wrote:
>
> Dear Solr users,
>
> I have several processes. The first builds the SolrDocuments and indexes them; the others later update other fields of those documents. I noticed that after the updates, searching on previously indexed multivalued fields (not stored) returns wrong results (fewer documents). In a test with five documents, a (field:*) search returned only one document. Before the other processes run, the same search correctly returns all five. Single-valued fields, however, seem to work fine.
>
> Best Regards
>
> Jie


Re: Update documents cause multivalue fields unexpected behaviour

Jie Luo
In reply to this post by Jörn Franke
Hi all,

For fields defined with stored=true, queries work fine, but for fields defined with stored=false, queries no longer work after the documents are updated.

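import java.util.HashMap;
import java.util.Map;
import org.apache.solr.common.SolrInputDocument;

// Identify the existing document by its unique key.
SolrInputDocument solrInputDocument = new SolrInputDocument();
solrInputDocument.addField("id", "somevalidId");

// A {"set": value} map marks this field as an atomic update.
Map<String, Object> fieldModifier = new HashMap<>(1);
fieldModifier.put("set", "some value");
solrInputDocument.addField("aNewField", fieldModifier);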

Regards

Jie




> On 4 May 2019, at 14:25, Jörn Franke <[hidden email]> wrote:
>
> Hi,
>
> Are you using atomic updates for your documents? If not, changing a single value overwrites the whole document.
>
> Best regards
>
>> On 04.05.2019 at 12:57, Jie Luo <[hidden email]> wrote:
>>
>> Dear Solr users,
>>
>> I have several processes. The first builds the SolrDocuments and indexes them; the others later update other fields of those documents. I noticed that after the updates, searching on previously indexed multivalued fields (not stored) returns wrong results (fewer documents). In a test with five documents, a (field:*) search returned only one document. Before the other processes run, the same search correctly returns all five. Single-valued fields, however, seem to work fine.
>>
>> Best Regards
>>
>> Jie


Re: Update documents cause multivalue fields unexpected behaviour

Shawn Heisey-2
On 5/7/2019 5:45 AM, Jie Luo wrote:
> For fields defined with stored=true, queries work fine, but for fields defined with stored=false, queries no longer work after the documents are updated.
>
> SolrInputDocument solrInputDocument = new SolrInputDocument();
> solrInputDocument.addField("id", "somevalidId");
>
> Map<String, Object> fieldModifier = new HashMap<>(1);
> fieldModifier.put("set", "some value");
> solrInputDocument.addField("aNewField", fieldModifier);

What you are doing there (using the "set" keyword in a Map object) is
known as an Atomic Update.  That feature has some very strict
requirements, and by setting "stored" to false on some of your fields,
you are violating those requirements.

In your schema, only copyField destinations can be stored=false.  In
fact, those HAVE to be stored=false.  Every other field needs to have
its data retrievable in search results, either through stored=true or
through docValues=true.
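To see why fields that are not retrievable lose their values, here is a toy in-memory model of the idea. It is not Solr's implementation; AtomicUpdateModel, applySet, and the field names used with it are hypothetical. It only assumes what the requirements above imply: an atomic update rebuilds the document from whatever can be read back from the index, applies the changes, and reindexes the result as a whole.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Toy model only: shows which fields survive a "set" style atomic update.
final class AtomicUpdateModel {

    // original:          the document as it was first indexed
    // retrievableFields: names of fields whose values can be read back
    // changes:           field name -> new value for each "set" operation
    static Map<String, Object> applySet(Map<String, Object> original,
                                        Set<String> retrievableFields,
                                        Map<String, Object> changes) {
        Map<String, Object> rebuilt = new HashMap<>();
        for (Map.Entry<String, Object> entry : original.entrySet()) {
            // Values that cannot be read back never make it into the new document.
            if (retrievableFields.contains(entry.getKey())) {
                rebuilt.put(entry.getKey(), entry.getValue());
            }
        }
        // Apply the requested changes on top of the recovered fields.
        rebuilt.putAll(changes);
        return rebuilt;
    }
}
```

In this model, a field that was indexed but is missing from retrievableFields is simply absent from the rebuilt document, so a later query on that field no longer matches it.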

Here is a fuller description of what Atomic Updates requires:

https://lucene.apache.org/solr/guide/7_7/updating-parts-of-documents.html#field-storage

Side note, whose relevance will make sense once you've read that entire
section of the ref guide: some field classes (TextField in particular)
do not support docValues.

Thanks,
Shawn