Updating documents

Updating documents

Vinicius Carvalho-3
Hi there.

I was checking the FAQ and found that Solr does not support field-level
updates. So I assume that in order to update a document, one should first
retrieve it by its ID, change the required field, and then send the doc
again. But then I wonder about fields that are indexed but not stored:
since the new document sent to the index does not carry those values,
does this mean we will lose them?

BTW, any chance we will see field-level updates in 4.0, like Elasticsearch has?

Regards

--
The intuitive mind is a sacred gift and the
rational mind is a faithful servant. We have
created a society that honors the servant and
has forgotten the gift.

Re: Updating documents

Jonatan Fournier
On Wed, Jul 11, 2012 at 10:57 AM, Vinicius Carvalho
<[hidden email]> wrote:

> Hi there.
>
> I was checking the FAQ and found that Solr does not support field-level
> updates. So I assume that in order to update a document, one should first
> retrieve it by its ID, change the required field, and then send the doc
> again. But then I wonder about fields that are indexed but not stored:
> since the new document sent to the index does not carry those values,
> does this mean we will lose them?
>
> BTW, any chance we will see field-level updates in 4.0, like Elasticsearch has?

I'm actually also looking at this new feature in 4.0-ALPHA:

http://solr.pl/en/2012/07/09/solr-4-0-partial-documents-update/

I was wondering where the new XML tags that support these "set",
"add to multi-value", etc. operations are documented.

--
jonatan


Re: Updating documents

Erick Erickson
Vinicius:

No, fetching the document from the index, changing selected values and
re-indexing probably won't work at all. The problem is that you only get
_stored_ values back from Solr. So unless you've specified stored="true"
for all your fields, you can't use the doc fetched from Solr to update a
field.

The partial documents update that Jonatan references also requires
that all the fields be stored.

Your best bet is to go back to your system-of-record for the data and
re-index the whole document.

Best
Erick
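
The data loss described above can be simulated without a Solr server. This is
only a sketch: the field names (`title`, `body`) and the `fetch_stored` helper
are hypothetical, and plain dictionaries stand in for the index.

```python
def fetch_stored(doc, stored_fields):
    """Simulate a query result: only fields marked as stored come back."""
    return {name: value for name, value in doc.items() if name in stored_fields}


# What was originally sent to the index (hypothetical document).
original = {"id": "foo", "title": "Old title", "body": "indexed but not stored"}
# Assumption for this sketch: "body" is indexed with stored="false".
stored_fields = {"id", "title"}

# Read-modify-write: fetch, change one field, send the result back.
fetched = fetch_stored(original, stored_fields)
fetched["title"] = "New title"
reindexed = fetched

# Fields present in the original document but absent from the new one.
missing = sorted(set(original) - set(reindexed))
print(missing)
```

Any field listed in `missing` would no longer be searchable after the
re-index, which is why going back to the system-of-record is the safer route.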


Re: Updating documents

Jonatan Fournier
Erick,

On Thu, Jul 12, 2012 at 11:05 AM, Erick Erickson
<[hidden email]> wrote:

> Vinicius:
>
> No, fetching the document from the index, changing selected values and
> re-indexing probably won't work at all. The problem is that you only get
> _stored_ values back from Solr. So unless you've specified stored="true"
> for all your fields, you can't use the doc fetched from Solr to update a
> field.
>
> The partial documents update that Jonatan references also requires
> that all the fields be stored.

If my only fields with stored="false" are copyField targets (i.e. I don't
need their content to rebuild the document), are they gonna be re-copied
with the partial document update?

--
jonatan


Re: Updating documents

Yonik Seeley-2-2
On Thu, Jul 12, 2012 at 12:38 PM, Jonatan Fournier
<[hidden email]> wrote:
> On Thu, Jul 12, 2012 at 11:05 AM, Erick Erickson
>> The partial documents update that Jonatan references also requires
>> that all the fields be stored.
>
> If my only fields with stored="false" are copyField (e.g. I don't need
> their content to rebuild the document), are they gonna be re-copied
> with the partial document update?

Correct - your setup should be fine.  Only the original source fields
(those that are not copyField targets) need to have stored=true.

-Yonik
http://lucidimagination.com
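
One way to check a schema against this rule is sketched below. The schema
fragment and the function name are made up for illustration, and the sketch
assumes every `<field>` states `stored` explicitly (a real schema can also
inherit it from the field type, which this does not handle).

```python
import xml.etree.ElementTree as ET

# Hypothetical schema fragment, for illustration only.
SCHEMA_XML = """
<schema name="example" version="1.5">
  <fields>
    <field name="id" type="string" indexed="true" stored="true"/>
    <field name="title" type="text" indexed="true" stored="true"/>
    <field name="body" type="text" indexed="true" stored="false"/>
    <field name="text_all" type="text" indexed="true" stored="false" multiValued="true"/>
  </fields>
  <copyField source="title" dest="text_all"/>
  <copyField source="body" dest="text_all"/>
</schema>
"""


def fields_lost_on_partial_update(schema_xml):
    """Return non-stored fields that are not copyField targets."""
    root = ET.fromstring(schema_xml)
    copy_targets = {cf.get("dest") for cf in root.iter("copyField")}
    lost = []
    for field in root.iter("field"):
        # Conservative: a field without an explicit stored="true" is flagged.
        is_stored = field.get("stored") == "true"
        if not is_stored and field.get("name") not in copy_targets:
            lost.append(field.get("name"))
    return sorted(lost)


print(fields_lost_on_partial_update(SCHEMA_XML))
```

Here `text_all` is not reported because it is a copyField target, while `body`
is a source field with stored="false" and so would not survive a partial update.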

Re: Updating documents

Jonatan Fournier
Yonik,

On Thu, Jul 12, 2012 at 12:52 PM, Yonik Seeley
<[hidden email]> wrote:

> On Thu, Jul 12, 2012 at 12:38 PM, Jonatan Fournier
> <[hidden email]> wrote:
>> On Thu, Jul 12, 2012 at 11:05 AM, Erick Erickson
>>> The partial documents update that Jonatan references also requires
>>> that all the fields be stored.
>>
>> If my only fields with stored="false" are copyField (e.g. I don't need
>> their content to rebuild the document), are they gonna be re-copied
>> with the partial document update?
>
> Correct - your setup should be fine.  Only original source fields (non
> copyField targets) should have stored=true

Another question I had related to partial update...

$ ./post.sh foo.json
{"responseHeader":{"status":409,"QTime":0},"error":{"msg":"Document
not found for update.  id=foo","code":409}}

Is there a flag for: if the document does not exist, create it for me? The
thing is that I don't know in advance whether the document already exists
(of course I could query first, but I have millions of entries to
process; each one might be new or might be an update, I don't know...)

My naive approach was to send two documents in the same request: one
with only the "set" operations, keyed by the unique ID, and a second one
with all the "add" operations (for the multivalue fields).

So it would do the following:

1. Whether or not the document (with this id) exists, use the following
"set" commands to update or create it.
2. Second pass: the document with that id now exists, so add all these
values to the multivalue fields (none of those fields appear in the
initial update).

My rationale: if the document exists, reset some fields and then append
to the multivalue fields (those multivalue fields record historical
updates).

The reason I created 2 documents is that Solr doesn't seem happy if I
mix set and add in the same document :)

--
jonatan


Re: Updating documents

Yonik Seeley-2-2
On Thu, Jul 12, 2012 at 3:20 PM, Jonatan Fournier
<[hidden email]> wrote:
> Is there a flag for: if document does not exist, create it for me?

Not currently, but it certainly makes sense.
The implementation should be easy. The most difficult part is figuring
out the best syntax to specify this.

Another idea: we could possibly switch to create-if-not-exist by
default, and use the existing optimistic concurrency mechanism to
specify that the document should exist.

So specify _version_=1 if the document should exist and _version_=0
(the default) if you don't care.

-Yonik
http://lucidimagination.com
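
A request body following this proposal might be built as in the sketch below.
It assumes the semantics exactly as stated in this message (`_version_` of 1
means the document must already exist; leaving it out, or 0, means "don't
care"). The helper name and the `status` field are hypothetical.

```python
import json


def atomic_update(doc_id, sets, must_exist=False):
    """Build one update document that applies "set" to the given fields."""
    doc = {"id": doc_id}
    for field, value in sets.items():
        doc[field] = {"set": value}
    if must_exist:
        # Per the proposal above: 1 means the document has to exist already.
        doc["_version_"] = 1
    return doc


# Create-or-update: no _version_ constraint.
upsert = atomic_update("foo", {"status": "active"})
# Update only: expected to be rejected if "foo" is not in the index.
strict = atomic_update("foo", {"status": "active"}, must_exist=True)

print(json.dumps([upsert]))
print(json.dumps([strict]))
```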

Re: Updating documents

Jonatan Fournier
On Fri, Jul 13, 2012 at 12:57 AM, Yonik Seeley
<[hidden email]> wrote:

> On Thu, Jul 12, 2012 at 3:20 PM, Jonatan Fournier
> <[hidden email]> wrote:
>> Is there a flag for: if document does not exist, create it for me?
>
> Not currently, but it certainly makes sense.
> The implementation should be easy. The most difficult part is figuring
> out the best syntax to specify this.
>
> Another idea: we could possibly switch to create-if-not-exist by
> default, and use the existing optimistic concurrency mechanism to
> specify that the document should exist.
>
> So specify _version_=1 if the document should exist and _version_=0
> (the default) if you don't care.

Yes that would be neat!

One more question related to partial document update. So far I'm able
to append to multivalue fields and set new values on regular/multivalue
fields. One thing I didn't find is the "remove" command; what is its
JSON syntax?

Thanks,

--
jonatan


Re: Updating documents

Yonik Seeley-2-2
On Fri, Jul 13, 2012 at 1:41 PM, Jonatan Fournier
<[hidden email]> wrote:

> On Fri, Jul 13, 2012 at 12:57 AM, Yonik Seeley
> <[hidden email]> wrote:
>> On Thu, Jul 12, 2012 at 3:20 PM, Jonatan Fournier
>> <[hidden email]> wrote:
>>> Is there a flag for: if document does not exist, create it for me?
>>
>> Not currently, but it certainly makes sense.
>> The implementation should be easy. The most difficult part is figuring
>> out the best syntax to specify this.
>>
>> Another idea: we could possibly switch to create-if-not-exist by
>> default, and use the existing optimistic concurrency mechanism to
>> specify that the document should exist.
>>
>> So specify _version_=1 if the document should exist and _version_=0
>> (the default) if you don't care.
>
> Yes that would be neat!

I've just committed this change.

> One more question related to partial document update. So far I'm able
> to append to multivalue fields, set new value to regular/multivalue
> fields. One thing I didn't find is the "remove" command, what is its
> JSON syntax?

Set it to the JSON value of null.

-Yonik
http://lucidimagination.com
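
Read as "use the set operation with a null value", the removal could look like
the following sketch. The helper name is hypothetical; `mv_f` is the field
name used elsewhere in this thread.

```python
import json


def remove_field(doc_id, field):
    """Build an update document that clears a field by setting it to null."""
    # Python's None is serialized as JSON null by json.dumps.
    return {"id": doc_id, field: {"set": None}}


body = json.dumps([remove_field("foo", "mv_f")])
print(body)
```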

Re: Updating documents

Jonatan Fournier
On Fri, Jul 13, 2012 at 1:43 PM, Yonik Seeley
<[hidden email]> wrote:

> On Fri, Jul 13, 2012 at 1:41 PM, Jonatan Fournier
> <[hidden email]> wrote:
>> On Fri, Jul 13, 2012 at 12:57 AM, Yonik Seeley
>> <[hidden email]> wrote:
>>> On Thu, Jul 12, 2012 at 3:20 PM, Jonatan Fournier
>>> <[hidden email]> wrote:
>>>> Is there a flag for: if document does not exist, create it for me?
>>>
>>> Not currently, but it certainly makes sense.
>>> The implementation should be easy. The most difficult part is figuring
>>> out the best syntax to specify this.
>>>
>>> Another idea: we could possibly switch to create-if-not-exist by
>>> default, and use the existing optimistic concurrency mechanism to
>>> specify that the document should exist.
>>>
>>> So specify _version_=1 if the document should exist and _version_=0
>>> (the default) if you don't care.
>>
>> Yes that would be neat!
>
> I've just committed this change.

Super thanks! I assume it will end up in the 4.0 release?


Re: Updating documents

Yonik Seeley-2-2
>> I've just committed this change.
>
> Super thanks! I assume it will end up in the 4.0 release?

Yep!

-Yonik
http://lucidimagination.com

Re: Updating documents

Jonatan Fournier

Probably a silly mistake on my side, but I can't seem to get the
"append/add" JSON syntax right for multiValue fields...

On my document initial creation it works great with

...
"mv_f":"cat1",
"mv_f":"cat2",
...

But later on when I want to "append" cat3 to the field by doing this:

"mv_f":{"add":"cat3"},
...

I end up with something like this in the index:

"mv_f":["{add=cat3}"],

Obviously something is wrong with my syntax ;)

--
jonatan


Re: Updating documents

Yonik Seeley-2-2
On Fri, Jul 13, 2012 at 3:50 PM, Jonatan Fournier
<[hidden email]> wrote:

> On Thu, Jul 12, 2012 at 3:20 PM, Jonatan Fournier
> <[hidden email]> wrote:
> But later on when I want to "append" cat3 to the field by doing this:
>
> "mv_f":{"add":"cat3"},
> ...
>
> I end up with something like this in the index:
>
> "mv_f":["{add=cat3}"],
>
> Obviously something is wrong with my syntax ;)

Are you using a custom update processor chain?  The
DistributedUpdateProcessor currently contains the logic for optimistic
concurrency and updates.
If you're not already, try some test commands with the stock server.

If you are already using the stock server, then perhaps you're not
sending what you think you are to Solr?

-Yonik
http://lucidimagination.com
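
To verify what is actually being sent, it can help to build the request in a
script and print it before posting. The sketch below makes several
assumptions: the URL is that of a stock example server on localhost (the JSON
update path may differ between versions), the helper name is hypothetical, and
nothing is sent unless the last line is uncommented.

```python
import json
import urllib.request

# Assumed endpoint of a local stock Solr; adjust to your installation.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update?commit=true"


def build_add_request(doc_id, field, value, url=SOLR_UPDATE_URL):
    """Build a POST that appends one value to a multivalue field."""
    body = json.dumps([{"id": doc_id, field: {"add": value}}]).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_add_request("foo", "mv_f", "cat3")
# Inspect the exact method, URL, and body before anything goes over the wire.
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To send it (requires a running Solr): urllib.request.urlopen(req)
```

If the printed body matches what is expected and the index still ends up with
"{add=cat3}", the update processor chain is the next thing to look at.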