Atomic Updates in SOLR

Anupam Bhattacharya
I am working on an offline tagging capability that tags records with a
thesaurus dictionary of key concepts. I am able to use the update="add"
option in XML and JSON update calls to update a specific field of a
document. However, if I run the same atomic update query twice, the
multivalued string field starts showing duplicate values.
For example, a field named tag initially holds copper, iron, steel.
After running the atomic update query with <field name="tag"
update="add">steel</field>, the tag field values become:
copper, iron, steel, steel (so steel gets added twice).
I looked at RemoveDuplicatesTokenFilterFactory, but it removes duplicate
tokens, not duplicate values in a multivalued field. Is there any
updateProcessor that stops an incoming duplicate value from being indexed?
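
For readers who want to reproduce the situation, here is a minimal sketch of the JSON form of the atomic update described above. The document id "doc1" is hypothetical (the post does not name one), and the sketch only builds the request body; sending it to a Solr update handler is left out. The list at the end is a plain Python model of the append behaviour reported in the post, not Solr code.

```python
import json


def atomic_add(doc_id, field, value):
    """Build the JSON body for an atomic "add" on one field of one document."""
    return [{"id": doc_id, field: {"add": value}}]


# "doc1" is a made-up id; the field name and values come from the post.
body = json.dumps(atomic_add("doc1", "tag", "steel"))

# Model of the reported behaviour: "add" appends to the stored values,
# so adding a value that is already present produces a duplicate.
stored = ["copper", "iron", "steel"]
stored.append("steel")
```

The JSON body is the counterpart of the XML field element quoted in the post, with the "add" modifier expressed as a nested object.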

Thanks in advance for any help.

Regards
Anupam

Re: Atomic Updates in SOLR

Shalin Shekhar Mangar
Perhaps you are running the update request more than once accidentally?

Can you try optimistic concurrency, sending _version_ along with the
update? That way, if some part of your code makes a duplicate request,
Solr will reject it with a version conflict error.

See
https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents
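
The idea can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not Solr code: TinyStore and VersionConflict are hypothetical names, and the version counter simply increments by one, whereas real Solr versions are opaque numbers. It shows why a replayed request that carries the old version is rejected.

```python
class VersionConflict(Exception):
    """Raised when the supplied version does not match the stored one."""


class TinyStore:
    """In-memory stand-in for a single document (hypothetical, not Solr)."""

    def __init__(self, tags):
        self.tags = list(tags)
        self.version = 1001  # arbitrary starting value for the model

    def atomic_add(self, value, expected_version):
        # Reject the update unless the caller saw the current version.
        if expected_version != self.version:
            raise VersionConflict(
                f"expected {expected_version}, found {self.version}"
            )
        self.tags.append(value)
        self.version += 1
        return self.version


store = TinyStore(["copper", "iron", "steel"])
seen = store.version
store.atomic_add("zinc", seen)  # first request succeeds

try:
    store.atomic_add("zinc", seen)  # replay carries a stale version
    replay_rejected = False
except VersionConflict:
    replay_rejected = True
```

Note that this only catches an accidental replay of the same request. It does not stop a fresh request from adding a value that is already stored.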


On Wed, Oct 30, 2013 at 3:35 PM, Anupam Bhattacharya <[hidden email]> wrote:

> I looked at RemoveDuplicatesTokenFilterFactory but it helps to remove token
> duplicate not multivalued field duplicates. Is there any updateProcessor to
> stop the incoming duplicate value from indexing ?

--
Regards,
Shalin Shekhar Mangar.

Re: Atomic Updates in SOLR

Anshum Gupta
I am not sure optimistic concurrency would help with de-duplication, but
as Shalin points out, it will let you spot issues in your client code.




On Wed, Oct 30, 2013 at 4:18 PM, Shalin Shekhar Mangar <[hidden email]> wrote:

> Perhaps you are running the update request more than once accidentally?
>
> Can you try using optimistic update with _version_ while sending the
> update? This way, if some part of your code is making a duplicate request
> then Solr would throw an error.

--

Anshum Gupta
http://www.anshumgupta.net

Re: Atomic Updates in SOLR

Shalin Shekhar Mangar
In reply to this post by Shalin Shekhar Mangar
Ah, I misread your email. You are actually sending the update twice and
asking how to de-dup the multivalued field values.

No, I don't think we have an update processor that can do that.


--
Regards,
Shalin Shekhar Mangar.

Re: Atomic Updates in SOLR

Anshum Gupta
I think it would be a good thing to have.
I just created a JIRA issue for that:
https://issues.apache.org/jira/browse/SOLR-5403

Will try and get to it soon.


On Wed, Oct 30, 2013 at 4:28 PM, Shalin Shekhar Mangar <[hidden email]> wrote:

> Ah I misread your email. You are actually sending the update twice and
> asking about how to dedup the multi-valued field values.
>
> No I don't think we have an update processor which can do that.

--

Anshum Gupta
http://www.anshumgupta.net

Re: Atomic Updates in SOLR

Jack Krupansky-2
In reply to this post by Anupam Bhattacharya
Unfortunately, atomic "add" appends to a "list" rather than adding to a
"set" (unique values only). However, you can use the unique fields update
processor (solr.UniqFieldsUpdateProcessorFactory) to de-dupe specified
multivalued fields.

See:
http://lucene.apache.org/solr/4_5_1/solr-core/org/apache/solr/update/processor/UniqFieldsUpdateProcessorFactory.html
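
The list-versus-set distinction can be sketched in a few lines. This is a model of the effect only, not the processor's implementation: uniq_fields is a hypothetical helper, the document id is made up, and keeping the first occurrence of each value is a choice made for this sketch. How the real processor interacts with atomic updates may also depend on where it sits in the update chain, which the sketch does not cover.

```python
def uniq_fields(doc, field_names):
    """Return a copy of doc with duplicate values removed from the named
    multivalued fields, keeping the first occurrence of each value."""
    out = dict(doc)
    for name in field_names:
        values = out.get(name)
        if isinstance(values, list):
            # dict.fromkeys keeps insertion order and drops repeats.
            out[name] = list(dict.fromkeys(values))
    return out


# Hypothetical document holding the duplicated value from the question.
doc = {"id": "doc1", "tag": ["copper", "iron", "steel", "steel"]}
clean = uniq_fields(doc, ["tag"])
```

Only the fields named in the second argument are touched, which mirrors the "specified multivalued fields" wording above.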

My e-book has more examples as well.

-- Jack Krupansky


Re: Atomic Updates in SOLR

Jack Krupansky-2
Oops... I need to note that the parameters changed after Solr 4.4. I
gave the link for 4.5.1, but for 4.4 and earlier, use:

http://lucene.eu.apache.org/solr/4_4_0/solr-core/org/apache/solr/update/processor/UniqFieldsUpdateProcessorFactory.html

(My book covers 4.4 and hasn't been updated for 4.5 yet, but the gist of
the examples is the same.)

-- Jack Krupansky

-----Original Message-----
From: Jack Krupansky
Sent: Wednesday, October 30, 2013 9:03 AM
To: [hidden email]
Subject: Re: Atomic Updates in SOLR

Unfortunately, atomic "add" is add to a "list" (append) rather than add to a
"set" (only unique values). But, you can use the unique fields update
processor (solr.UniqFieldsUpdateProcessorFactory) to de-dupe specified
multivalued fields.
