does solr handle updates quickly?

does solr handle updates quickly?

taitlarson
Hi, I'm new to Solr.  I've just started playing around with it and learning
what it can do.

I'd like to include a vote field on all of my indexed documents.  Users vote
on the content they like.  A vote tally is displayed along with each
document returned in the results of a search.

Let's say I create a vote field of type SortableIntField.  Users vote
relatively frequently. Assume I send update commands to Solr that change
only the vote field approximately once for every 50 searches a user
performs.  What effect will this have on my index? Will search performance
degrade?

Thanks,

Tait

Re: does solr handle updates quickly?

Matthew Runo
This might also be a cool way to increase relevancy. Does Lucene/Solr
do, or can it do, any sort of relevancy increase depending on
which search result a user picks?

Would it be feasible for me to update an index_id with a click count  
each time a user clicks a result, and give this field a boost in the  
results?

+--------------------------------------------------------+
| Matthew Runo
| Zappos Development
| [hidden email]
| 702-943-7833
+--------------------------------------------------------+


On Apr 22, 2007, at 7:17 PM, Tait Larson wrote:

> Let's say I create a vote field of type SortableIntField.  Users vote
> relatively frequently. [...] What effect will this have on my index?
> Will search performance degrade?
>
> Thanks,
>
> Tait


Re: does solr handle updates quickly?

Brian Whitman
On Apr 23, 2007, at 3:47 PM, Matthew Runo wrote:

> Does Lucene/Solr do, or can it do, any sort of increase on  
> relevancy depending on which search result a user picks?

I don't think it's Lucene's or Solr's job to know anything about what
users do with the results it generates. You may want to look at the
Nutch (web search) project, but even there I do not believe there is
any support for relevance feedback.

As a matter of fact, updating only a single field in a document is not
supported in Solr or Lucene, last I checked -- you have to post the
whole document (with the change) back in.
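Since the whole document must be re-sent, an update client ends up rebuilding the complete payload even when only one field changed. A minimal sketch of building Solr's XML `<add>` payload in Python -- the field names (`id`, `title`, `votes`) are hypothetical, and nothing is actually sent to a server here:

```python
# Build a full <add> document for Solr's XML update format.
# Every stored field must be included, even if only one changed.
from xml.sax.saxutils import escape

def build_add_xml(doc):
    """Render a dict of stored fields as a Solr <add><doc>...</doc></add> payload."""
    fields = "".join(
        '<field name="%s">%s</field>' % (escape(str(name)), escape(str(value)))
        for name, value in doc.items()
    )
    return "<add><doc>%s</doc></add>" % fields

# Only the vote count changed, but the entire document is re-sent:
doc = {"id": "doc-42", "title": "Example product", "votes": 17}
payload = build_add_xml(doc)
print(payload)
```

A client would then POST this payload to the Solr update URL and issue a commit.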

In one web application I work on, a rock-star Solr dev made a special
SQL request handler (a fork of the Solr RequestHandler) to set votes
and hit counts per Solr ID. The SQL table sits alongside the Solr
index. SQL is, at the moment, better suited to this type of task.






Re: does solr handle updates quickly?

Chris Hostetter-3
: > Does Lucene/Solr do, or can it do, any sort of increase on
: > relevancy depending on which search result a user picks?

If you have a numeric field in your index, and you want its value to
influence the score, this is easy to achieve using FunctionQuery.
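As a sketch of how that looks in practice: the DisMax request handler's bf (boost function) parameter accepts a function query, so the value of a numeric field can be added into each document's score. The field name "votes" and the Solr URL below are assumptions for illustration:

```python
# Sketch of a DisMax request where a numeric "votes" field feeds the
# score through a FunctionQuery via the bf (boost function) parameter.
from urllib.parse import urlencode

params = {
    "q": "camera",       # the user's query
    "qt": "dismax",      # use the DisMax request handler
    "bf": "votes",       # add the votes field's value into each score
    "fl": "id,score",    # return ids plus the computed score
}
query_string = urlencode(params)
url = "http://localhost:8983/solr/select?" + query_string
print(url)
```

Documents with higher vote counts then rank higher for the same textual match, without any reindexing at query time.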

: As a matter of fact, updating only single field in a document is not
: supported in Solr or Lucene last I checked -- you have to post the
: whole document (with change) back in.

Correct, Ryan has been working on supporting an "update" command to allow
individual fields to be modified, assuming all other fields are stored ...
but it's still in the patch stage at the moment.

To address the original question: Lucene at its core is an inverted
index, the nature of which means that to update a document you have to
completely replace and re-add the old document, and reopen the index for
searching -- even with something like Ryan's patch to allow updating
documents, it really only takes the burden of sending all the data off
your hands; Solr still needs to do all the same work internally to index the
whole document ... so there will always be some non-trivial cost
associated with a document update, no matter how minor the update may be.

There is also the cost associated with opening a new Searcher on the index
to expose the changes you've made ... the more frequently this is done,
the more it impacts your ability to cache things, which impacts
performance.

Trade-offs have to be made ... typically in situations like this I let the
"votes" accumulate and update all documents that have received new
votes in batches periodically (as infrequently as necessary to improve
caching yet still meet the needs of my users).

-Hoss
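The batch-and-flush approach Hoss describes can be sketched as below. The class name and threshold are illustrative only, and the flush here just returns the accumulated batch; a real flush would re-add each touched document to Solr with its new tally and issue a single commit per batch:

```python
# Accumulate votes in memory and flush them to the index in batches,
# so the index is reopened infrequently and caches stay warm.
from collections import defaultdict

class VoteBatcher:
    def __init__(self, flush_every=1000):
        self.pending = defaultdict(int)   # doc id -> votes since last flush
        self.flush_every = flush_every

    def record_vote(self, doc_id):
        """Record one vote; flush and return the batch once the threshold is hit."""
        self.pending[doc_id] += 1
        if sum(self.pending.values()) >= self.flush_every:
            return self.flush()
        return None

    def flush(self):
        """Return the accumulated (doc_id -> new votes) batch and reset.
        A real implementation would re-index each touched document and
        commit once for the whole batch."""
        batch = dict(self.pending)
        self.pending.clear()
        return batch

batcher = VoteBatcher(flush_every=3)
batcher.record_vote("doc-1")
batcher.record_vote("doc-2")
result = batcher.record_vote("doc-1")  # third vote triggers the flush
print(result)
```

Tuning `flush_every` (or flushing on a timer instead) trades vote freshness against cache effectiveness, exactly the trade-off described above.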


Solr index updating pattern

maustin
Could someone give advice on a better way to do this?

I have an index of many merchants, and each day I delete merchant products
and re-update my database. After doing this I then re-create the entire
index and move it to production, replacing the current index.

I was thinking about updating the index in real time with only the products
that need updating. My concern is that I might be updating 2 million products,
deleting 1 million, and inserting another 1-2 million all in one process. I
guess I could send batches of files to be sucked in and processed, but it's
just not as clean as creating a new index. Do you see an issue with
these massive updates, deletes, and inserts in Solr? The problem now is that
I might be updating only 1/2 or 1/4 of the index, and I don't need to
re-create the entire index again.

How do some of you keep your index updated?  I'm running it off of a Windows
server, so I haven't even looked into the snappuller etc. stuff.

Thanks,
Mike


Re: Solr index updating pattern

Mike Klaas
On 4/25/07, Mike Austin <[hidden email]> wrote:

> Could someone give advice on a better way to do this?
> [...]
> Do you see an issue with
> these massive updates, deletes, and inserts in Solr?

There isn't necessarily an issue, but there is definitely some
overhead in updating/deleting docs compared with simply writing new
docs.  I've found that re-writing an entire index (i.e. updating every
document) is about twice as slow as wiping the index first.

-Mike

Re: Solr index updating pattern

Yonik Seeley-2
On 4/26/07, Mike Klaas <[hidden email]> wrote:

> On 4/25/07, Mike Austin <[hidden email]> wrote:
> > Could someone give advice on a better way to do this? [...]
>
> There isn't necessarily an issue, but there is definitely some
> overhead in updating/deleting docs compared with simply writing new
> docs.  I've found that re-writing an entire index (i.e. updating every
> document) is about twice as slow as wiping the index first.

I wish there were a programmatic way to wipe out the index, but due to
platforms like Windows, it's not really possible.  Perhaps Lucene
needs this feature...
Due to the new index format, it would be relatively easy to write a
new segments file that simply dropped all of the existing segments.
Cleanup of the old segments could happen as it does now.

-Yonik