Index-time vs. search-time boosting performance

12 messages

Index-time vs. search-time boosting performance

Asif Rahman
Hi,

What are the performance ramifications of using a function-based boost at
search time (through bf in the dismax parser) versus an index-time boost?
Currently I'm using boost functions on a 15GB index of ~14 million documents.  Our
queries generally match many thousands of documents.  I'm wondering if I
would see a performance improvement by switching over to index-time
boosting.

Thanks,

Asif

--
Asif Rahman
Lead Engineer - NewsCred
[hidden email]
http://platform.newscred.com

Re: Index-time vs. search-time boosting performance

Erick Erickson
Index time boosting is different than search time boosting, so
asking about performance is irrelevant.

Paraphrasing Hossman from years ago on the Lucene list (from
memory).

...index time boosting is a way of saying this document's
title is more important than other documents' titles. Search
time boosting is a way of saying "I care about documents
whose titles contain this term more than other documents
whose titles may match other parts of this query"....

HTH
Erick

In reply to Asif Rahman's post of Fri, Jun 4, 2010 at 5:10 PM

Re: Index-time vs. search-time boosting performance

Asif Rahman
Perhaps I should have been more specific in my initial post.  I'm doing
date-based boosting on the documents in my index, so as to assign a higher
score to more recent documents.  Currently I'm using a boost function to
achieve this.  I'm wondering if there would be a performance improvement if
instead of using the boost function at search time, I indexed the documents
with a date-based boost.

In reply to Erick Erickson's post of Fri, Jun 4, 2010 at 7:30 PM

Re: Index-time vs. search-time boosting performance

Jay Hill
I've done a lot of recency boosting to documents, and I'm wondering why you
would want to do that at index time. If you are continuously indexing new
documents, what was "recent" when it was indexed becomes, over time, "less
recent". Are you unsatisfied with your current performance with the boost
function? Query-time recency boosting is a fairly common thing to do, and,
if done correctly, shouldn't be a performance concern.

-Jay
http://lucidimagination.com
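
To make the staleness point above concrete, here is a minimal Python sketch. It assumes a reciprocal decay of the form a / (m*x + b), which is the shape of Solr's recip() function; the constants and the helper names (recency_boost, MS_PER_YEAR) are illustrative assumptions, not anything taken from this thread or from Solr's source.

```python
from datetime import datetime, timedelta

# Milliseconds in an average year (illustrative constant).
MS_PER_YEAR = 365.25 * 24 * 3600 * 1000


def recip(x, m, a, b):
    """Reciprocal decay a / (m*x + b), the same shape as Solr's recip()."""
    return a / (m * x + b)


def recency_boost(published, now):
    """Hypothetical recency boost: 1.0 for a brand-new document, 0.5 after a year."""
    age_ms = (now - published).total_seconds() * 1000.0
    return recip(age_ms, 1.0 / MS_PER_YEAR, 1.0, 1.0)


published = datetime(2010, 6, 4)

# Index-time: the boost is computed once, when the document is written,
# and stays at that value until the document is reindexed.
frozen = recency_boost(published, now=published)

# Query-time: the boost is recomputed against the current clock on every search.
one_year_later = published + timedelta(days=365.25)
live = recency_boost(published, now=one_year_later)

print(f"index-time boost, still stored a year later: {frozen:.2f}")
print(f"query-time boost a year later:               {live:.2f}")
```

Under these assumptions the stored value keeps saying "brand new" while the query-time value has halved, so an index-time recency boost only stays meaningful if old documents are periodically reindexed.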


In reply to Asif Rahman's post of Fri, Jun 4, 2010 at 4:50 PM

Re: Index-time vs. search-time boosting performance

Asif Rahman
It seems like it would be far more efficient to calculate the boost factor
once and store it rather than calculating it for each request in real-time.
Some of our queries match tens of thousands if not hundreds of thousands of
documents in a 15GB index.  However, I'm not well-versed in Lucene internals
so I may be misunderstanding what is going on here.


In reply to Jay Hill's post of Fri, Jun 4, 2010 at 8:31 PM

RE: Index-time vs. search-time boosting performance

Jonathan Rochkind
The SolrRelevancyFAQ does suggest that both index-time and search-time boosting can be used to boost the score of newer documents, but doesn't suggest what reasons/contexts one might choose one vs the other.  It only provides an example of search-time boost though, so it doesn't answer the question of how to do an index time boost, if that was a question.

http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents

Sorry, this doesn't answer your question, but does contribute the fact that some author of the FAQ at some point considered index-time boost not necessarily unreasonable.
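
For readers who have not seen a search-time recency boost, the following Python sketch assembles the request parameters for one. The bf, defType, q, and qf parameters and the recip() and ms() functions are real Solr features; the field names (pubdate, title, body), the query text, and the constants are assumptions chosen for illustration and should be checked against your own schema and the FAQ entry linked above.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical field names and query text; adjust to your schema.
params = {
    "defType": "dismax",
    "q": "election results",
    "qf": "title body",
    # Reciprocal decay on document age in milliseconds:
    # roughly 1.0 for a new document, 0.5 for a one-year-old one.
    "bf": "recip(ms(NOW,pubdate),3.16e-11,1,1)",
}

query_string = urlencode(params)
print(query_string)

# Round-trip check: the boost function survives URL encoding unchanged.
decoded = parse_qs(query_string)
print(decoded["bf"][0])
```

Because the function lives in the request rather than in the index, it can be tuned without reindexing, which is the flexibility trade-off discussed in this thread.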

In reply to Asif Rahman's post of Friday, June 04, 2010 11:31 PM

Re: Index-time vs. search-time boosting performance

Asif Rahman
I know how to index a document with a boost but am still not sure whether
I'll see a search performance improvement with it.  The initial decision to
use a boost function at search-time was made to preserve the flexibility to
tweak the function without having to do a full reindex.  I no longer need that
flexibility so was wondering if I would get better performance by implementing
the boost at index-time.


In reply to Jonathan Rochkind's post of Fri, Jun 4, 2010 at 11:48 PM

Re: Index-time vs. search-time boosting performance

Robert Muir
In reply to this post by Asif Rahman
On Fri, Jun 4, 2010 at 7:50 PM, Asif Rahman <[hidden email]> wrote:

> Perhaps I should have been more specific in my initial post.  I'm doing
> date-based boosting on the documents in my index, so as to assign a higher
> score to more recent documents.  Currently I'm using a boost function to
> achieve this.  I'm wondering if there would be a performance improvement if
> instead of using the boost function at search time, I indexed the documents
> with a date-based boost.
>
>
Asif, without knowing more details, before you look at performance you might
want to consider the relevance impacts of switching to index-time boosting
for your use case too.

You can read more about the differences here:
http://lucene.apache.org/java/3_0_1/scoring.html

But I think the most important point for this date-influenced use case is:

"Indexing time boosts are preprocessed for storage efficiency and written to
the directory (when writing the document) in a single byte (!)"

If you do this as an index-time boost, your boosts will lose lots of
precision for this reason.

--
Robert Muir
[hidden email]
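
The precision loss described above can be illustrated with a small Python sketch. It is a port, written from memory, of the one-byte float encoding used for norms in Lucene 3.x (SmallFloat.floatToByte315 and byte315ToFloat); treat the exact bit layout as an assumption and verify it against the Lucene source before relying on it. The Python function names are my own.

```python
import struct

# Offset of the smallest representable exponent in the one-byte format
# (assumed to match Lucene 3.x SmallFloat's "315" encoding).
_FZERO = (63 - 15) << 3


def float_to_byte315(f):
    """Encode a float into one byte (0..255), discarding most mantissa bits."""
    bits = struct.unpack(">i", struct.pack(">f", f))[0]  # raw IEEE-754 bits, signed
    small = bits >> 21
    if small <= _FZERO:
        return 0 if bits <= 0 else 1   # zero/negative -> 0, underflow -> smallest
    if small >= _FZERO + 0x100:
        return 255                     # overflow -> largest representable value
    return small - _FZERO


def byte315_to_float(b):
    """Decode a byte produced by float_to_byte315 back into a float."""
    if b == 0:
        return 0.0
    bits = ((b & 0xFF) << 21) + ((63 - 15) << 24)
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def stored_boost(boost):
    """The value a boost effectively has after a round trip through one byte."""
    return byte315_to_float(float_to_byte315(boost))


# One hundred finely graded boosts between 1.00 and 1.99 ...
fine_grained = [1.0 + i * 0.01 for i in range(100)]
# ... collapse to a handful of distinct stored values.
distinct = sorted({stored_boost(b) for b in fine_grained})
print(distinct)
```

Under this encoding every boost between 1.0 and 2.0 lands on one of four stored values, so an hourly date-based boost written at index time would be indistinguishable across large stretches of time. In practice the boost is also combined with length normalization before encoding, which this sketch ignores.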

Re: Index-time vs. search-time boosting performance

Asif Rahman
Thanks everyone for your help so far.  I'm still trying to get to the bottom
of whether switching over to index-time boosts will give me a performance
improvement, and if so if it will be noticeable.  This is all under the
assumption that I can achieve the scoring functionality that I need with
either index-time or search-time boosting (given the loss of precision).  I
can always dust off the old profiler to see what's going on with the
search-time boosts, but testing the index-time boosts will require a full
reindex, which could take days with our dataset.

In reply to Robert Muir's post of Sat, Jun 5, 2010 at 9:17 AM

Re: Index-time vs. search-time boosting performance

Lance Norskog-2
If you are unhappy with the performance overhead of a function boost,
you can push it into a field query by boosting date ranges.

You would group in date ranges: documents in September would be
boosted 1.0, October 2.0, November 3.0 etc.
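
As a rough sketch of the date-range idea above, the Python below generates one boosted range clause per month. The field name pubdate, the year 2009, and the helper names are hypothetical; the `field:[start TO end]^boost` range syntax is standard Lucene/Solr query syntax, and the resulting string could, for example, be supplied as a dismax boost query (bq).

```python
from datetime import datetime, timedelta

SOLR_DATE = "%Y-%m-%dT%H:%M:%SZ"


def next_month(d):
    """First instant of the month following d."""
    return datetime(d.year + (d.month == 12), d.month % 12 + 1, 1)


def boosted_month_clauses(field, first_month, count, base=1.0, step=1.0):
    """Build `count` consecutive monthly range clauses with increasing boosts."""
    clauses = []
    start = first_month
    for i in range(count):
        # End one second before the next month so adjacent ranges do not overlap.
        end = next_month(start) - timedelta(seconds=1)
        boost = base + i * step
        clauses.append(
            f"{field}:[{start.strftime(SOLR_DATE)} TO {end.strftime(SOLR_DATE)}]^{boost:.1f}"
        )
        start = next_month(start)
    return " ".join(clauses)


# September 1.0, October 2.0, November 3.0, as in the example above.
print(boosted_month_clauses("pubdate", datetime(2009, 9, 1), 3))
```

The number of clauses grows with the resolution required, which is why this approach suits coarse buckets such as months better than fine ones such as hours.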


In reply to Asif Rahman's post of 6/5/10


--
Lance Norskog
[hidden email]

Re: Index-time vs. search-time boosting performance

Asif Rahman
I still need a relatively precise boost.  No less precise than hourly.  I
think that would make for a pretty messy field query.


In reply to Lance Norskog's post of Mon, Jun 7, 2010 at 2:15 AM

Re: Index-time vs. search-time boosting performance

Lance Norskog-2
Is it necessary that a document 1 year old be more relevant than one
that's 1 year and 1 hour old? In other words, can the boosting be
logarithmic wrt time instead of linear?

A schema design tip: you can store a separate date field which is
rounded down to the hour. This will make for a much smaller term
dictionary and therefore faster searching & range queries.
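
Both suggestions above can be sketched in a few lines of Python. The decay formula 1 / (1 + ln(1 + age_hours)) is only one hypothetical choice of a logarithmic boost, and the function names are my own; the point is to show how little an extra hour matters for old documents and how rounding to the hour reduces the number of distinct date values.

```python
import math
from datetime import datetime


def round_down_to_hour(ts):
    """Truncate a timestamp to the start of its hour, for a coarser date field."""
    return ts.replace(minute=0, second=0, microsecond=0)


def log_boost(age_hours):
    """Hypothetical logarithmic recency boost: 1.0 at age zero, decaying slowly."""
    return 1.0 / (1.0 + math.log1p(age_hours))


# Three timestamps within the same hour map to a single indexed value.
stamps = [
    datetime(2010, 6, 7, 4, 8, 15),
    datetime(2010, 6, 7, 4, 31, 2),
    datetime(2010, 6, 7, 4, 59, 59),
]
rounded = {round_down_to_hour(ts) for ts in stamps}
print(len(rounded), "distinct value(s) after rounding")

# One hour matters a lot for fresh documents and hardly at all for old ones.
fresh_gap = log_boost(0) - log_boost(1)
year_hours = 365 * 24
old_gap = log_boost(year_hours) - log_boost(year_hours + 1)
print(f"gap between 0h and 1h old:       {fresh_gap:.4f}")
print(f"gap between 1y and 1y + 1h old:  {old_gap:.8f}")
```

With a decay of this shape, hourly precision only influences ranking among recent documents, so the date field used for boosting can be stored at a much coarser resolution than the original timestamp.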

In reply to Asif Rahman's post of Mon, Jun 7, 2010 at 4:08 AM