about flexing ranking module in lucene

about flexing ranking module in lucene

Li Li
Hi all,
    In current Lucene versions (2.x/3.x), we can hardly modify the scoring of documents, because Lucene originally adopted the VSM (vector space model) and the "matching phase" and "ranking phase" are integrated.
    But in many situations, we use a complicated boolean query to "filter out" unrelated documents and then score the remaining ones with complicated business logic.
    http://wiki.apache.org/lucene-java/SummerOfCode2011ProjectRanking seems interesting. What's the status of this branch? Will it be included in the Lucene 4 release?
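
To make the "filter first, then score with business logic" pattern above concrete, here is a minimal sketch in plain Java. It does not use any Lucene API: `TwoPhaseRanking`, `Doc`, `search`, and the `category`/`popularity` fields are all hypothetical names invented for illustration. The only point is that the matcher and the scorer are passed in separately, so either can change without touching the other.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.ToDoubleFunction;

// Hypothetical illustration only; none of these types are Lucene classes.
class TwoPhaseRanking {

    // A toy document with two made-up fields.
    static final class Doc {
        final int id;
        final String category;
        final double popularity;

        Doc(int id, String category, double popularity) {
            this.id = id;
            this.category = category;
            this.popularity = popularity;
        }
    }

    // Phase 1 keeps only the documents accepted by the matcher.
    // Phase 2 orders the survivors by the scorer, highest score first.
    static List<Doc> search(List<Doc> docs,
                            Predicate<Doc> matcher,
                            ToDoubleFunction<Doc> scorer,
                            int topN) {
        List<Doc> hits = new ArrayList<>();
        for (Doc d : docs) {
            if (matcher.test(d)) {
                hits.add(d);
            }
        }
        Comparator<Doc> byScore = Comparator.comparingDouble(scorer);
        hits.sort(byScore.reversed());
        return new ArrayList<>(hits.subList(0, Math.min(topN, hits.size())));
    }

    public static void main(String[] args) {
        List<Doc> docs = new ArrayList<>();
        docs.add(new Doc(1, "book", 0.2));
        docs.add(new Doc(2, "book", 0.9));
        docs.add(new Doc(3, "toy", 1.0));

        // The boolean "filter" and the business-logic score are independent.
        List<Doc> top = search(docs,
                d -> d.category.equals("book"),
                d -> d.popularity,
                10);
        for (Doc d : top) {
            System.out.println(d.id + " " + d.popularity);
        }
    }
}
```

In this toy version the scorer sees only the document, not the query, which is a simplification; a real ranking function would normally need term statistics as well.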

Re: about flexing ranking module in lucene

Robert Muir
On Thu, Sep 1, 2011 at 4:17 AM, Li Li <[hidden email]> wrote:

> hi all,
>     In current lucene versions(2.x/3.x) , we can hardly modify the scoring
> of documents because originally lucene adopt the VSM model and "matching
> phase" and "ranking phase" are integrated.
>     But In many situation, we usually use complicated boolean query to
> "filter out" unrelated documents and score them by complicated business
> logic.
>     http://wiki.apache.org/lucene-java/SummerOfCode2011ProjectRanking seems
> interesting. what's the status of this branch? will it be included in
> lucene4 release?

Hi, it's very close. There are still some nocommits in the branch right
now; once these are fixed we will look at merging to trunk.


--
lucidimagination.com

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: about flexing ranking module in lucene

David Nemeskey
Hi,

> >   http://wiki.apache.org/lucene-java/SummerOfCode2011ProjectRanking seems
> > interesting. what's the status of this branch? will it be included in
> > lucene4 release?
>
> Hi, its very close. there are some nocommits still in the branch right
> now, once these are fixed we will look at merging to trunk.
I've checked the nocommits in the similarities package, and it seems to me
that there is only one that is really no-worky (the phrase df). The rest are
about modifications to a few DFR models that are suboptimal, but they work
nevertheless.

Robert: I figured I'd take a week out for a much needed rest (not), what about
getting back on this on Monday?

David


Re: about flexing ranking module in lucene

Robert Muir
On Fri, Sep 2, 2011 at 5:37 AM, David Nemeskey <[hidden email]> wrote:

> Hi,
>
>> >   http://wiki.apache.org/lucene-java/SummerOfCode2011ProjectRanking seems
>> > interesting. what's the status of this branch? will it be included in
>> > lucene4 release?
>>
>> Hi, its very close. there are some nocommits still in the branch right
>> now, once these are fixed we will look at merging to trunk.
> I've checked the nocommits in the similarities package, and it seems to me
> that there is only one that is really no-worky (the phrase df). The rest are
> about modifications to a few DFR models that are suboptimal, but they work
> nevertheless.
>

That's true, but they also cause other unexpected things when the
"bounds" are exceeded: e.g. boosting a document up might lower its
score, keeping stopwords in your index is a disaster, etc.

This is because these stopwords then violate the assumption that F << N
(F being the term's total frequency in the collection, N the number of
documents).
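
As a rough illustration of why F << N matters, here is a standalone sketch of the textbook Poisson approximation to the binomial DFR model (the "P" model). It assumes the usual Stirling-based form with lambda = F / N and leaves out the normalization and after-effect steps entirely, so it is not the code in the branch. `PoissonModelSketch` and `informationContent` are hypothetical names.

```java
// Sketch only: a standalone Poisson approximation to the binomial DFR model.
// Class and method names are hypothetical, not taken from the Lucene branch.
class PoissonModelSketch {

    private static final double LOG2_E = 1.0 / Math.log(2.0);

    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    // Approximates -log2 P(tf | lambda) for a Poisson with mean
    // lambda = F / N, using Stirling's formula for tf!.
    // tf must be > 0.
    static double informationContent(double tf, double totalTermFreq, double numDocs) {
        double lambda = totalTermFreq / numDocs;
        return tf * log2(tf / lambda)
                + (lambda + 1.0 / (12.0 * tf) - tf) * LOG2_E
                + 0.5 * log2(2.0 * Math.PI * tf);
    }

    public static void main(String[] args) {
        double numDocs = 1_000_000;

        // Rare term, F << N: a higher tf gives a higher value.
        double rareF = 10_000;
        System.out.println("rare, tf=1: " + informationContent(1, rareF, numDocs));
        System.out.println("rare, tf=2: " + informationContent(2, rareF, numDocs));

        // Stopword-like term, F > N: a higher tf gives a LOWER value
        // as long as tf stays below lambda.
        double stopF = 10_000_000;
        System.out.println("stop, tf=1: " + informationContent(1, stopF, numDocs));
        System.out.println("stop, tf=2: " + informationContent(2, stopF, numDocs));
    }
}
```

With lambda well below 1 the value grows with tf, as expected. Once lambda exceeds tf, the value shrinks as tf approaches lambda, so anything that raises the effective term frequency can lower the result. That is one plausible reading of the "boosting lowers the score" symptom, not a confirmed diagnosis of the branch.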

This is pretty annoying for practical reasons! It also means some
of Lucene's tests will actually fail if this sim is used. Sure, we
can disable that particular model from being used in all tests, but
that's not great. I like the idea of rotating all the similarities in
all of Lucene's tests; swapping the sims into the tests this way has
found a lot of little issues so far!

> Robert: I figured I'd take a week out for a much needed rest (not), what about
> getting back on this on Monday?
>

Enjoy your rest... very well deserved! I'll keep testing and looking
for things and see if I can't find a better solution to the binomial
model (P/D); it's the only DFR one left with issues.

I might not be able to help you on Monday. It's a holiday here and I
will be returning from the river... not sure what time I will make it
back to a computer that day. But please don't let that stop you from
taking a crack at it!

--
lucidimagination.com
