documents with known relevancy

documents with known relevancy

fiedzia
I want to know whether what I am trying to achieve is doable using Solr.

I have some objects that have tags assigned. A tag is a string with a weight attached,
so a whole document that I want to index can look like this:
{
  id: 123,
  tags: {
          tag1: 0.01,
          tag2: 0.3,
          ...
          tagN: some_weight
          }
}
Now I want to store the list of tags and sort the returned results by tag weight.
The list of tags can be large (up to thousands per document, though usually far fewer).
So when I query Solr for documents containing tag1, it should return all documents containing it,
sorted by the weight of this tag. Is there any way to do that?

Re: documents with known relevancy

Peter
Hi,

Why do you need the weight for the tags?

You could index it this way:

{
 id:     123
 tag:    'tag1'
 weight:  0.01
 uniqueKey: combine(id, tag)
}

{
 id:     123
 tag:    'tag2'
 weight:  0.3
 uniqueKey: combine(id, tag)
}

and specify the query-time boost with the help of the weight.
Then retrieve the document content in a second request, either to another Solr index or to a database.
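
A minimal Python sketch of the flattening step, for illustration only. The function name `flatten_tagged_document` is made up, and `combine(id, tag)` is assumed here to mean plain string concatenation with an underscore; any scheme that yields a unique key per (id, tag) pair would do.

```python
def flatten_tagged_document(doc):
    """Split one tagged object into one record per (id, tag) pair."""
    records = []
    for tag, weight in doc["tags"].items():
        records.append({
            # Assumed key format "<id>_<tag>"; stands in for combine(id, tag).
            "uniqueKey": f"{doc['id']}_{tag}",
            "id": doc["id"],
            "tag": tag,
            "weight": weight,
        })
    return records


doc = {"id": 123, "tags": {"tag1": 0.01, "tag2": 0.3}}
for record in flatten_tagged_document(doc):
    print(record)
```

Each record can then be indexed as its own small document, which is why a second lookup is needed to get the full object back.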

There could be a different solution using dynamic fields and index-time boosts, but I am not sure at the moment.

Regards,
Peter.



Re: documents with known relevancy

fiedzia
Peter Karich wrote
> Hi,
>
> Why do you need the weight for the tags?

The only reason to include weights is to sort results by them.
If multiple documents contain a given tag,
I want them sorted by that tag's weight. I would also like to be able
to search by multiple tags at once (so if there were a field "tags" with all tags,
the documents with the highest sum of their weights should come first. Sum is just an example here;
if Solr can offer something similar or more advanced, that's fine).


Peter Karich wrote
> you could index it this way:
>
> {
>  id:     123
>  tag:    'tag1'
>  weight:  0.01
>  uniqueKey: combine(id, tag)
> }
>
> {
>  id:     123
>  tag:    'tag2'
>  weight:  0.3
>  uniqueKey: combine(id, tag)
> }
>
> and specify the query-time boost with the help of the weight.
> Retrieving the document content in a second request to another solrindex or using a db.

Well, that would work when querying for a single tag. Do you know of a solution
to the problem of querying for multiple tags?

Perhaps I can explain the problem better by presenting the obvious solution:
create a multivalued field "tags" with all tags. This makes it easy to ask Solr for documents matching a query
(which may look like this: tags:tag1 AND tags:tag2). Then get the list of all results, retrieve the tag weights from the database, and sort the results by weight. This is obviously inefficient, as it requires fetching all matching documents from Solr (possibly a large list), fetching them again from the database, calculating the weights, and then sorting. So I am trying to involve Solr in this processing.
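
To make the intended ordering concrete, here is a small in-memory Python sketch of that "obvious solution". It is not Solr code; `rank_by_tag_weights` and the sample documents are made up for illustration, and the sum of weights is just the example scoring rule mentioned above.

```python
def rank_by_tag_weights(documents, query_tags):
    """Return ids of documents containing every query tag, best sum of weights first."""
    matches = []
    for doc in documents:
        tags = doc["tags"]
        # Equivalent of "tags:tag1 AND tags:tag2": every query tag must be present.
        if all(tag in tags for tag in query_tags):
            score = sum(tags[tag] for tag in query_tags)
            matches.append((score, doc["id"]))
    # Highest combined weight first.
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in matches]


documents = [
    {"id": 1, "tags": {"tag1": 0.1, "tag2": 0.2}},
    {"id": 2, "tags": {"tag1": 0.8, "tag2": 0.1}},
    {"id": 3, "tags": {"tag1": 0.5}},
]
print(rank_by_tag_weights(documents, ["tag1", "tag2"]))  # [2, 1]
```

This is the behaviour I would like Solr to produce directly, without pulling every match back to the client.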

Another solution I think could work (though I haven't examined it fully yet) would be to create a single text field for tags in which the number of occurrences of each tag matches its weight (so if tag2's weight is twice that of tag1,
the text contains tag1 once and tag2 twice ("tag1 tag2 tag2")), and then calculate the document score
based on the number of occurrences of the given tag in the text. From what I know about Solr this could be done,
but maybe there is a better solution.

Peter Karich wrote
> there could be a different solution using dynamic fields and index-time boosts but I am not sure at the moment.

Can you write more about it? Any idea is welcome.

Thanks for your help anyway.

Re: documents with known relevancy

fiedzia
I came up with another idea, which seems to do what I want. Any comments about better solutions
or improving efficiency are welcome:

For each document, create a multivalued text field "tags" with all tags,
and one dynamic field per tag containing its weight, so we have:
{
  id: 123
  tags: tag1, tag2, ..., tagN
  tag1_float: 0.1,
  tag2_float: 0.2,
  ...
  tagN_float: 0.3,
}

Then a query for tag1 and tag2 could look like this:
tags:tag1 AND tags:tag2
with the results sorted by the sum of tag1_float and tag2_float.
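
A rough Python sketch of how the request parameters for this could be assembled. It assumes a Solr version that can sort by a function query such as `sum(...)` (older releases may not support this), that tag names are simple tokens needing no escaping, and that the `*_float` dynamic field naming above is used. `build_tag_query` is a made-up helper, not part of any Solr client.

```python
from urllib.parse import urlencode


def build_tag_query(tags, rows=10):
    """Build Solr request parameters: match all tags, sort by summed per-tag weights."""
    q = " AND ".join(f"tags:{tag}" for tag in tags)
    weight_fields = [f"{tag}_float" for tag in tags]
    if len(weight_fields) == 1:
        # A single tag can be sorted on its weight field directly.
        sort_expr = weight_fields[0]
    else:
        # Assumes sorting by the sum() function query is available.
        sort_expr = "sum(" + ",".join(weight_fields) + ")"
    return {"q": q, "sort": f"{sort_expr} desc", "rows": rows}


params = build_tag_query(["tag1", "tag2"])
print(params["q"])
print(params["sort"])
print("/select?" + urlencode(params))
```

One thing to watch with this layout is the number of distinct dynamic fields, since every tag in the collection becomes its own field.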

Re: documents with known relevancy

gearond
In reply to this post by fiedzia
Seems to me that you are doing externally to Solr what you could be doing internally. If you had ONE field as <tags> and weighted those in your SOLR query, that is how I am guessing it is usually done.
Dennis Gearon

Signature Warning
----------------
EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Fri, 7/16/10, fiedzia <[hidden email]> wrote:

> From: fiedzia <[hidden email]>
> Subject: documents with known relevancy
> To: [hidden email]
> Date: Friday, July 16, 2010, 5:59 AM

Re: documents with known relevancy

gearond
In reply to this post by fiedzia
So does this mean that each document has a different weight for the same tag?
Dennis Gearon



--- On Fri, 7/16/10, fiedzia <[hidden email]> wrote:

> From: fiedzia <[hidden email]>
> Subject: Re: documents with known relevancy
> To: [hidden email]
> Date: Friday, July 16, 2010, 8:06 AM

Re: documents with known relevancy

fiedzia
Dennis Gearon wrote
> So does this mean that each document has a different weight for the same tag?

Exactly. The weight is the weight of a given tag for a specific document, not the weight of a field as in weighted search. So one document may have tag1 with a weight of 0.1, and another may have the same tag1 with a weight of 0.8.

Re: documents with known relevancy

fiedzia
In reply to this post by gearond
Dennis Gearon wrote
> Seems to me that you are doing externally to Solr what you could be doing internally. If you had ONE field as <tags> and weighted those in your SOLR query, that is how I am guessing it is usually done.

I guess "weight" was a confusing term to use. The weight (the value assigned to a given tag) is document-specific and may differ for each document; it is not the weight of a field as in weighted search.

RE: documents with known relevancy

Jonathan Rochkind-2
In reply to this post by fiedzia
> Exactly. The weight is a weight of a given tag for specific document, not
> weight of the field as in weighted search. So one document may have tag1
> with weight of 0.1, and another may have the same tag1 with weight=0.8.

I've never used it, but I think this is the use case that the Solr feature to use Lucene 'payloads' is meant for?  
http://www.lucidimagination.com/blog/2009/08/05/getting-started-with-payloads/
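
For illustration, a small Python sketch of how per-document tag weights might be serialized for a payload-enabled field. It assumes the field type splits on whitespace and uses a delimited payload filter (for example Solr's DelimitedPayloadTokenFilterFactory with a float encoder and "|" as delimiter). `to_payload_field` is a made-up helper, and the query side (a payload-aware query and similarity) is not shown.

```python
def to_payload_field(tags, delimiter="|"):
    """Serialize {tag: weight} as 'tag|weight' tokens separated by spaces."""
    tokens = []
    for tag, weight in tags.items():
        # A tag containing whitespace or the delimiter would be split incorrectly.
        if delimiter in tag or any(ch.isspace() for ch in tag):
            raise ValueError(f"tag not representable as a single token: {tag!r}")
        tokens.append(f"{tag}{delimiter}{weight}")
    return " ".join(tokens)


print(to_payload_field({"tag1": 0.01, "tag2": 0.3}))  # tag1|0.01 tag2|0.3
```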

Re: documents with known relevancy

Peter
In reply to this post by fiedzia
I haven't looked at payloads as mentioned by Jonathan, but another
solution could be (similar to Dennis'):

Create a field 'tags' and then add tag1 to it several times,
depending on the weight.
E.g. add it 10 times if the weight is 1.0,
but only 2 times if the weight is 0.2, etc.

Of course this limits the weight to 11 values (0, 0.1, 0.2, ... and 1),
but it should work :-)
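
A quick Python sketch of that quantization, assuming weights lie between 0 and 1. The function name and the `levels` parameter are made up for illustration.

```python
def weight_to_repeated_tags(tags, levels=10):
    """Repeat each tag round(weight * levels) times and join into one field value."""
    tokens = []
    for tag, weight in tags.items():
        # Weights that round to 0 repeats drop the tag from the field entirely.
        repeats = round(weight * levels)
        tokens.extend([tag] * repeats)
    return " ".join(tokens)


print(weight_to_repeated_tags({"tag1": 1.0, "tag2": 0.2}))
```

Note that the resulting score depends on how the similarity treats term frequency, so it may preserve the ordering without being linear in the weight.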

Regards,
Peter.



--
http://karussell.wordpress.com/


Re: documents with known relevancy

gearond
Looks to me like a sort of way to get to 'categories', if one were interested in doing that, shudder.


Dennis Gearon



--- On Fri, 7/16/10, Peter Karich <[hidden email]> wrote:

> From: Peter Karich <[hidden email]>
> Subject: Re: documents with known relevancy
> To: [hidden email]
> Date: Friday, July 16, 2010, 12:25 PM

RE: documents with known relevancy

fiedzia
In reply to this post by Jonathan Rochkind-2
Jonathan Rochkind wrote
> I've never used it, but I think this is the use case that the Solr feature to use Lucene 'payloads' is meant for?
> http://www.lucidimagination.com/blog/2009/08/05/getting-started-with-payloads/

This is it, thanks for the link.