Scoring Documentation

Grant Ingersoll
Steve Rowe and I have added scoring.xml (with some contributions from  
Karl Wettin, Chris Hostetter and others) to the xdocs directory (and  
scoring.html to the docs directory).  Our goals in writing this  
document were:

1. To better understand scoring
2. To document how scoring works for the Lucene community, as well as  
document how to make changes to scoring for a specific application.
3. To kick-start more documentation on scoring

I think we have achieved #1, although that doesn't really benefit many  
others yet.  Whether we have achieved #2 remains to be seen.  #3 is up  
to us to do.

To help achieve #2, I would appreciate it if other developers  
could take a look at
and provide us feedback on any and all parts of this document.

Note that the above link is not yet hooked into the main menu on  
the left-hand side of the Lucene site.  In a week or two, once we  
have some feedback and updates, I plan to hook it into the  
projects.xml menu under the menu title "Scoring".

Specifically, we are interested in:

1. Errata, clarifications, improvements, and useful additions.  Where  
did we get the algorithms/descriptions wrong, and where could they be  
made clearer?  Areas of particular interest are those highlighted in  
yellow.  Additionally:
        a. Filling in the "Big Picture" section with lower-level details on  
BooleanScorer2.  Is this necessary?
        b. Other examples of changing Similarity
        c. Examples of adding your own Query.  It would be great to have a  
write-up on the motivation behind SpanQuery or some of the other  
Query classes (other than TermQuery).  It would also be great to have  
more on the semantics of implementing the various methods on Weight  
and Scorer.
        d. Should there be more discussion of how Hits/Searchers/Filters  
work?  I purposely left these out because I wanted to focus on  
scoring, but these pieces do play a role in enabling scoring.
2. Organizational suggestions, i.e., how could this document be  
better organized?
3. Grammar, spelling
4. If anyone knows how to get the Greek Sigma character to pass  
through Anakia (Velocity), I would be most appreciative; the section  
on the scoring formula needs it.  The usual hex entity references  
don't seem to pass through correctly.  I suspect a change is needed  
in site.vsl, but I don't know how to make it.  (There is also an  
entity reference in systemproperties.xml that is not working correctly.)

As for goal #3, please feel free to add more insight into the scoring  
process, particularly if you can add value on the "why" question  
(i.e., why is scoring done this way?).  This document is most likely  
just a start on documenting how scoring works.

As for changes, the best way to contribute is to submit a patch in  
JIRA (or simply commit the changes, if you can).  If you can't use  
JIRA, please at least reply to this message.


Grant Ingersoll
Sr. Software Engineer
Center for Natural Language Processing
Syracuse University
335 Hinds Hall
Syracuse, NY 13244

Voice: 315-443-5484
Skype: grant_ingersoll
Fax: 315-443-6886
