Boost document based on field length

Tomasz Kępski
Hi,

I would like to boost documents with longer descriptions, so that documents
with a zero-length description move down the results.
Is it possible to boost a document based on field length at search time,
or is the only way to store the field length as an int in a separate field
at index time?

Tom

Re: Boost document based on field length

Grant Ingersoll

On Nov 23, 2009, at 8:01 AM, Tomasz Kępski wrote:

> Hi,
>
> I would like to boost documents with longer descriptions, so that documents with a zero-length description move down the results.
> Is it possible to boost a document based on field length at search time, or is the only way to store the field length as an int in a separate field at index time?

Override the default Similarity (see the end of the schema.xml file) with your own Similarity implementation and then in that class override the lengthNorm() method.
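A minimal, self-contained sketch of this suggestion. It assumes the Lucene 2.x `DefaultSimilarity`, whose `lengthNorm(String fieldName, int numTerms)` returns `1/sqrt(numTerms)`. To keep the example runnable without Lucene on the classpath, the class below only mirrors that method instead of extending `DefaultSimilarity`; `LengthNormSketch` and `flatLengthNorm` are hypothetical names. In a real setup you would subclass `DefaultSimilarity`, override `lengthNorm`, and point the `<similarity class="..."/>` element at the end of schema.xml to your class.

```java
// Standalone sketch (no Lucene dependency): compares the default length
// normalization with a hypothetical override that removes the length
// penalty for the "description" field only. Names are illustrative.
public class LengthNormSketch {

    // Same formula as Lucene 2.x DefaultSimilarity.lengthNorm:
    // fewer terms -> larger norm -> higher score for the same matches.
    static float defaultLengthNorm(String fieldName, int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Hypothetical override: ignore length for "description" so long
    // descriptions are no longer penalized; other fields keep the default.
    static float flatLengthNorm(String fieldName, int numTerms) {
        if ("description".equals(fieldName)) {
            return 1.0f;
        }
        return defaultLengthNorm(fieldName, numTerms);
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 4, 100}) {
            System.out.printf("numTerms=%d default=%.3f flat=%.3f%n",
                    n,
                    defaultLengthNorm("description", n),
                    flatLengthNorm("description", n));
        }
    }
}
```

Two caveats: the length norm is computed when documents are indexed and stored with the field, so a changed Similarity only takes effect after reindexing; and, as Hoss points out below, it only changes scores when the query actually matches the description field.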

Re: Boost document based on field length

hossman

: > I would like to boost documents with longer descriptions, so that documents with a zero-length description move down the results.
: > Is it possible to boost a document based on field length at search time, or is the only way to store the field length as an int in a separate field at index time?
:
: Override the default Similarity (see the end of the schema.xml file)
: with your own Similarity implementation and then in that class override
: the lengthNorm() method.


I think I'm reading the question differently than Grant. His suggestion
applies when you are searching in the description field and don't want
documents with shorter descriptions to score higher when the same terms
match the same number of times (the default behavior of lengthNorm).

My understanding is that you want documents that don't have a description
to score lower than documents that do, and you might be querying against
completely different fields (description might not even be indexed).

In that case there is no easy way to achieve this with just the
description field. The easy thing to do is to index a boolean
"has_description" field and then incorporate that into your query (or use
it as the input to a function query).


-Hoss
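A sketch of the query side of the "has_description" suggestion, assuming the documents carry a boolean `has_description` field and the request goes through Solr's dismax query parser. `bq` is dismax's boost-query parameter: its clause is optional, so documents without a description still match and simply score lower. The `qf` value, the boost of 2.0, and the class name are hypothetical and would need tuning against real data.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Builds a hypothetical dismax request that favors documents with a
// description without excluding the ones that lack it.
public class HasDescriptionQuery {

    // The user query is matched against qf as usual; bq adds an optional
    // clause that only raises the score of has_description:true documents.
    static Map<String, String> params(String userQuery, float boost) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("q", userQuery);
        p.put("defType", "dismax");
        p.put("qf", "title");                          // hypothetical field list
        p.put("bq", "has_description:true^" + boost);  // boost to be tuned
        return p;
    }

    // URL-encodes the parameters into a query string, keeping insertion order.
    static String toQueryString(Map<String, String> p) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : p.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            try {
                sb.append(e.getKey()).append('=')
                  .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            } catch (UnsupportedEncodingException ex) {
                throw new IllegalStateException("UTF-8 is always supported", ex);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints /select?q=red+shoes&defType=dismax&qf=title&bq=has_description%3Atrue%5E2.0
        System.out.println("/select?" + toQueryString(params("red shoes", 2.0f)));
    }
}
```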

Re: Boost document based on field length

Tomasz Kępski
Hi,

> I think I'm reading the question differently than Grant. His suggestion
> applies when you are searching in the description field and don't want
> documents with shorter descriptions to score higher when the same terms
> match the same number of times (the default behavior of lengthNorm).

> My understanding is that you want documents that don't have a description
> to score lower than documents that do, and you might be querying against
> completely different fields (description might not even be indexed).
>
> In that case there is no easy way to achieve this with just the
> description field. The easy thing to do is to index a boolean
> "has_description" field and then incorporate that into your query (or use
> it as the input to a function query).

You got my point, Hoss. In my case, a long description means a more
valuable document. And your intuition is amazing ;-) I also have a field
that is not used in search at all (the image URL), but documents with an
image are more valuable to me than documents without one.

So I will add two fields (a boolean for the photo and an int for the
description length), fill them in at index time, and experiment with them
at search time.

Thanks,
Tom
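A sketch of the index-time half of this plan, in plain Java so it runs standalone. The field names `description_length` and `has_image` and the class name are hypothetical, and the length is a whitespace token count, which only approximates the term count the field's analyzer would produce.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Computes hypothetical extra fields to send to Solr with each document.
public class DerivedFields {

    // A null or blank value is treated as missing.
    static Map<String, Object> derive(String description, String imageUrl) {
        Map<String, Object> fields = new LinkedHashMap<>();
        String desc = description == null ? "" : description.trim();
        // Whitespace token count: an approximation of the analyzed term count.
        int length = desc.isEmpty() ? 0 : desc.split("\\s+").length;
        fields.put("description_length", length);
        fields.put("has_image", imageUrl != null && !imageUrl.trim().isEmpty());
        return fields;
    }

    public static void main(String[] args) {
        // prints {description_length=4, has_image=true}
        System.out.println(derive("Leather boots, size 42", "http://example.com/a.jpg"));
        // prints {description_length=0, has_image=false}
        System.out.println(derive("", null));
    }
}
```

At search time, `has_image` could be used the same way as `has_description` in a dismax boost query, and `description_length` as the input to a function query (for example through dismax's `bf` parameter).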

Re: Boost document based on field length

Lance Norskog
With the default Similarity, the Lucene norms (if enabled) are
1/sqrt(number of terms in the field), multiplied by any index-time boosts.

I cannot find a function query that makes norms available. Yo gurus: is
this impossible, a bad idea, or just an oversight?
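For comparison with the formula above: in Lucene 2.x, the `Similarity` javadoc defines a field's norm as the document boost times the field boost times `lengthNorm`, and `DefaultSimilarity.lengthNorm` is `1/sqrt(numTerms)` rather than `1/numTerms`. The sketch below (hypothetical class name) just evaluates that product in plain Java; it ignores the lossy one-byte encoding Lucene applies when storing norms, so real stored values are coarser.

```java
// Evaluates norm = docBoost * fieldBoost * lengthNorm(numTerms),
// using the DefaultSimilarity length formula 1/sqrt(numTerms).
public class NormSketch {

    static float norm(float docBoost, float fieldBoost, int numTerms) {
        float lengthNorm = (float) (1.0 / Math.sqrt(numTerms));
        return docBoost * fieldBoost * lengthNorm;
    }

    public static void main(String[] args) {
        // Same boosts, different lengths: the longer field gets the smaller norm.
        System.out.println(norm(1.0f, 1.0f, 4));    // 0.5
        System.out.println(norm(1.0f, 1.0f, 100));  // 0.1
        // An index-time document boost of 2 doubles the norm.
        System.out.println(norm(2.0f, 1.0f, 4));    // 1.0
    }
}
```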

On Tue, Nov 24, 2009 at 6:06 AM, Tomasz Kępski wrote:

> So I will add two fields (a boolean for the photo and an int for the
> description length), fill them in at index time, and experiment with them
> at search time.
>
> Thanks,
> Tom

--
Lance Norskog