negation

negation

alexander lind
Hi all

Say that I have a Solr index with 5000 documents, each representing a
campaign that users of my site can join. The user can search and find
these campaigns in various ways, which is not a problem, but once a
user has found a campaign and joined it, I don't want that campaign to
ever show up again for that particular user.

After a while, a user can have built up a list of, say, 200 campaigns
that he has joined, and hence should never see in any search results
again.

I know this functionality could be achieved by simply building a
longer and longer negation query negating all the campaigns that a
user has already joined. I would assume that this would become slow
and inefficient eventually.
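
For concreteness, the growing negation query described above might be built along these lines. This is only a sketch: the `id` field name, the example query `title:charity`, and the helper `build_negation_query` are assumptions made for illustration, not taken from the actual schema.

```python
def build_negation_query(user_query, joined_ids):
    """Return a Lucene-syntax query that excludes every joined campaign id.

    `id` is an assumed name for the unique key field of a campaign document.
    """
    if not joined_ids:
        # Nothing joined yet, so the user's query can be sent unchanged.
        return user_query
    excluded = " OR ".join(str(campaign_id) for campaign_id in joined_ids)
    # The prohibited clause grows by one term for every campaign joined.
    return f"({user_query}) -id:({excluded})"


print(build_negation_query("title:charity", [3, 17, 42]))
```

With 200 joined campaigns the prohibited clause carries 200 terms, and Solr caps the number of boolean clauses through `maxBooleanClauses` in solrconfig.xml (1024 in the stock config, as far as I know), so the list cannot grow without limit.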

My question is: is there a better way to do this?

Thanks
Alec

Re: negation

Rachel McConnell-3
We do something similar in a different context.  I don't know if our
way is necessarily better, but it would work like this:

1. Add a field to campaign called something like enteredUsers.
2. Once a user joins a campaign, update the campaign, adding a value
unique to that user to enteredUsers.
3. The negation can now be done with a single clause that excludes
campaigns whose enteredUsers field contains the user's unique id,
instead of excluding all the user's campaigns one by one.

The downside is it will increase the number of your commits, which may
or may not be OK.
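
The three steps above could be sketched on the query side roughly as follows. It assumes `enteredUsers` is an indexed, multi-valued field and that user ids are simple tokens needing no escaping; the helper name and the example query are hypothetical.

```python
from urllib.parse import urlencode


def build_search_params(user_query, user_id):
    """Build Solr request parameters that hide campaigns the user has joined.

    Assumes `enteredUsers` is an indexed, multi-valued field holding the ids
    of the users who joined each campaign.
    """
    return {
        "q": user_query,
        # One prohibited clause, whatever the number of campaigns joined.
        "fq": f"-enteredUsers:{user_id}",
    }


params = build_search_params("title:charity", 42)
print(urlencode(params))
```

Putting the exclusion in `fq` rather than in `q` keeps it out of relevance scoring and lets Solr cache the filter per user, though whether that pays off depends on how many distinct users search.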

Rachel

On 2/13/08, alexander lind <[hidden email]> wrote:

> My question is: is there a better way to do this?

Re: negation

alexander lind
Have you done any stress tests on this setup? Is it working well for  
you?
It sounds like something that could work quite well for me too, but I  
would be a little worried that a commit could time out, and a unique  
value could be lost for that user.

Thank you
Alec

On Feb 13, 2008, at 1:10 PM, Rachel McConnell wrote:

> The downside is it will increase the number of your commits, which may
> or may not be OK.


Re: negation

Rachel McConnell-3
We've been using this in production for at least six months.  I have
never stress-tested this particular feature, but we usually do over
100k unique hits a day.  Of those, most hit Solr for one thing or
another, but a much smaller percentage use this specific bit.  It
isn't the fastest query, but the way we use it involves some
additional complexities, so YMMV.

We aren't at risk for data loss from Solr, as we maintain all data in
our database backend; Solr is essentially a slave to that.  So we have
a db field, enteredUsers, which has the usual JDBC failure checking,
and any error is handled gracefully.  The Solr index is then
updated from the db periodically (we optimize for fast search
results over up-to-dateness).
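
A rough sketch of that periodic rebuild step, with the database as the source of truth. Everything here is assumed for illustration: the `campaign_users` table, its columns, and the in-memory SQLite database standing in for the real backend (which is accessed over JDBC in our case).

```python
import sqlite3
from collections import defaultdict


def build_solr_docs(conn):
    """Group join rows by campaign into documents with a multi-valued field.

    The table and column names are hypothetical stand-ins for the real schema.
    """
    entered = defaultdict(list)
    rows = conn.execute(
        "SELECT campaign_id, user_id FROM campaign_users "
        "ORDER BY campaign_id, user_id"
    )
    for campaign_id, user_id in rows:
        entered[campaign_id].append(str(user_id))
    # One document per campaign; a real document would carry its other fields too.
    return [{"id": str(cid), "enteredUsers": users} for cid, users in entered.items()]


conn = sqlite3.connect(":memory:")  # stand-in for the real database
conn.execute("CREATE TABLE campaign_users (campaign_id INTEGER, user_id INTEGER)")
conn.executemany(
    "INSERT INTO campaign_users VALUES (?, ?)", [(1, 42), (1, 7), (2, 42)]
)
docs = build_solr_docs(conn)
print(docs)
```

Because adding a document with an existing unique key replaces the whole document, the job has to send every field of the campaign, not just enteredUsers. If a commit fails, nothing is lost: the next run rebuilds the same documents from the db.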

R

On 2/13/08, alexander lind <[hidden email]> wrote:

> It sounds like something that could work quite well for me too, but I
> would be a little worried that a commit could time out, and a unique
> value could be lost for that user.

Re: negation

alexander lind
I think I will try a hybrid version: one that uses my simple negation
for newly joined campaigns and your method to filter out campaigns
joined longer ago. A cron script will run every night and add all new
user ids to the appropriate campaigns. That way I don't have to
re-index on the fly during the day, when the server is busiest, and
there should be fewer commits to the Solr instance too: at most one per
campaign, instead of one per join.
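
As a sketch of what the hybrid query could look like: one filter on enteredUsers for everything the nightly job has already indexed, plus a short id negation for campaigns joined since the last run. The field names `id` and `enteredUsers`, the helper name, and the example values are assumptions for illustration.

```python
from urllib.parse import urlencode


def build_hybrid_params(user_query, user_id, recent_ids):
    """Combine the indexed exclusion with a short negation for recent joins.

    `recent_ids` holds campaigns joined since the last nightly re-index, so
    it stays small and is emptied once the cron job has run.
    """
    filters = [f"-enteredUsers:{user_id}"]
    if recent_ids:
        recent = " OR ".join(str(campaign_id) for campaign_id in recent_ids)
        filters.append(f"-id:({recent})")
    return {"q": user_query, "fq": filters}


params = build_hybrid_params("title:charity", 42, [101, 102])
# doseq=True emits one fq parameter per filter; Solr intersects them.
query_string = urlencode(params, doseq=True)
print(query_string)
```

The list of recent joins has to come from somewhere other than Solr, such as the database or the user's session, since the index does not know about them until the nightly run.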

Thanks for your input on this Rachel!

Alec

On Feb 13, 2008, at 2:01 PM, Rachel McConnell wrote:

> We aren't at risk for data loss from Solr, as we maintain all data in
> our database backend; Solr is essentially a slave to that.