Update a field without reindexing the entire document?

Update a field without reindexing the entire document?

Galen Pahlke
Hi, I'm wondering if there's a way to change a single field of a document without re-indexing every field.  I'd like to do something like this:

<add><doc><field name="id">1</field><field name="field1">val1</field></doc></add>

Then later:

<add><doc><field name="id">1</field><field name="field2">val2</field></doc></add>

After the second statement, the document is overwritten, so the value of field1 is lost.  Is there a way I can do something like this so that documents are only updated, as opposed to overwritten? I've looked through the docs but couldn't find anything.

Thanks,
- Galen Pahlke
Re: Update a field without reindexing the entire document?

Otis Gospodnetic-2
Hi Galen,

See the SOLR-139 issue in JIRA (this is from memory).  It's doable, but I believe it is not in the Solr nightlies yet (also from memory), and it requires all your fields to be stored.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
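
As a rough illustration of why every field would need to be stored: without an in-place update, the client has to read the stored values back, apply the change, and send the whole document again. The sketch below only builds the resulting <add> message; the helper names merge_fields and build_add_xml and the sample data are assumptions made for this example, not part of Solr or of the SOLR-139 patches.

```python
import xml.etree.ElementTree as ET


def merge_fields(stored, changes):
    """Return a new field map: the stored values with `changes` applied on top."""
    merged = dict(stored)
    merged.update(changes)
    return merged


def build_add_xml(fields):
    """Serialize a field map as an <add><doc>...</doc></add> update message."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in fields.items():
        field = ET.SubElement(doc, "field", {"name": name})
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")


# Hypothetical stored fields, as they might be read back for the document with id=1.
stored = {"id": "1", "field1": "val1"}

# Apply the single-field change, then re-send the complete document.
message = build_add_xml(merge_fields(stored, {"field2": "val2"}))
print(message)
```

The printed message carries id, field1, and field2 together, so re-adding it would not lose field1. Any field that was indexed but not stored could not be read back this way, which is presumably why storing all fields is a requirement.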

----- Original Message ----
From: Galen Pahlke <[hidden email]>
To: [hidden email]
Sent: Tuesday, March 25, 2008 4:21:45 PM
Subject: Update a field without reindexing the entire document?


RE: Update a field without reindexing the entire document?

Ard
Hello Otis,

I have been looking for something similar for Jackrabbit's Lucene index,
but I am still not sure whether I understand correctly what the patches
in SOLR-139 provide:

Do they just retrieve the previously stored fields of a Lucene Document,
change some field, and then analyze and tokenize the fetched fields
again? I am mainly interested in avoiding the analysis and tokenization
of the entire Document when, for example, a single Field changes (think
of 100 MB PDFs in Jackrabbit whose content I do not want to extract
again when just a single small property changes). I got some pointers
earlier from Karl Wettin (see [1]): using term vectors, I can
reassemble the token stream without repeating the expensive analysis.

Anyway, is this what is meant by modifying an existing Lucene document,
or is it done by retrieving the stored fields and analyzing them again?
Thanks for any clarification.

[1]
http://www.nabble.com/Reusing-indexed-and-analyzed-documents-tt15000023.html#a15000023
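
The term-vector idea can be sketched roughly as follows, assuming positions were stored with the term vector. The plain dict used here is a simplified stand-in for a positional term vector, not the Lucene API, and the function name tokens_from_term_vector is made up for this example.

```python
def tokens_from_term_vector(term_positions):
    """Rebuild the position-ordered token list from a positional term vector.

    `term_positions` maps each term to the list of positions at which it
    occurred. The result is a list of (position, term) pairs in position
    order, so position gaps left by the original analysis are preserved.
    """
    by_position = []
    for term, positions in term_positions.items():
        for position in positions:
            by_position.append((position, term))
    by_position.sort()
    return by_position


# Hypothetical term vector for the already-analyzed text "the quick fox the end".
vector = {"end": [4], "fox": [2], "quick": [1], "the": [0, 3]}

for position, term in tokens_from_term_vector(vector):
    print(position, term)
```

The point of the sketch is that the token sequence is recovered by inverting the term-to-positions map, so the original content never has to be extracted or analyzed again.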

[hidden email] - [hidden email] - www.onehippo.com
-------------------------------------------------------------
Amsterdam - Hippo B.V. Oosteinde 11 1017 WT Amsterdam +31(0)20-5224466
San Francisco - Hippo USA Inc. 101 H Street, suite Q Petaluma CA
94952-3329 +1 (707) 773-4646
-------------------------------------------------------------


Re: Update a field without reindexing the entire document?

Vinci
In reply to this post by Otis Gospodnetic-2
Hi Otis,

One question: if the target field is a multi-valued field, what does an update through SOLR-139 do: override the existing values or append to them?

Thank you,
Vinci

Re: Update a field without reindexing the entire document?

Erik Hatcher

On Mar 26, 2008, at 4:28 AM, Vinci wrote:
> One question: If the target field is a multi-value field, what will be the
> consequence of the update for SOLR-139: overriding or appending?

You can specify when you update a field how that works.

SOLR-139, though, seems a long way from being included in Solr; it
needs lots of work. (It is, however, being used on Collex, a project I
worked on that allows documents in Solr to be tagged/annotated.)

        Erik
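
A minimal sketch of what a per-field choice between overriding and appending could look like for a multi-valued field. The mode names "overwrite" and "append", the function apply_update, and the sample document are assumptions made for illustration; they are not taken from the SOLR-139 patches.

```python
def apply_update(doc, field, new_values, mode):
    """Return a copy of `doc` with `field` updated according to `mode`.

    `doc` maps field names to lists of values (multi-valued fields).
    mode "overwrite" replaces the existing values; "append" adds to them.
    """
    updated = {name: list(values) for name, values in doc.items()}
    if mode == "overwrite":
        updated[field] = list(new_values)
    elif mode == "append":
        updated.setdefault(field, []).extend(new_values)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return updated


# Hypothetical document with a multi-valued "tag" field.
doc = {"id": ["1"], "tag": ["red", "blue"]}

print(apply_update(doc, "tag", ["green"], "append")["tag"])
print(apply_update(doc, "tag", ["green"], "overwrite")["tag"])
```

With "append" the tag field ends up holding all three values; with "overwrite" only the new value remains. The input document is copied rather than modified, so both calls start from the same state.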