0.7-dev, the search scoring

Fredrik Andersson
Hey guys!

I just ported a lot of old 0.6 code to 0.7-dev/mapred. Lots of stuff
has changed, I see! One thing I can't quite grasp, though, is why
Hit.getScore() has been removed in favour of the TopDocs thingie. I
wrote a quick add-on to support getting the score straight from the
hit, which worked fine, but it would be nice to hear why the method
was removed in the first place!

Also, are there any "secret" wikis, mailing lists, forums or similar
for the 0.7 development? It would be very interesting to see what's
cooking!

Greetings,
Fredrik

Re: 0.7-dev, the search scoring

Erik Hatcher

On Jul 28, 2005, at 8:28 AM, Fredrik Andersson wrote:
> Also, are there any "secret" wikis, mailing lists, forums or similar
> for the 0.7 development?

No. All discussions are in the open, right here on nutch-dev and
nutch-user.

Erik

Re: 0.7-dev, the search scoring

Doug Cutting
In reply to this post by Fredrik Andersson
Fredrik Andersson wrote:
> I just ported a lot of old 0.6 code to 0.7-dev/mapred. Lots of stuff
> has changed I see! One thing I can't quite grasp though, is why the
> Hit.getScore() has been removed in favour of the TopDocs thingie
> instead?

Hit.getScore() was generalized to Hit.getSortValue() in order to support
sorting results by things other than score.  If you sort by score, as is
the default, then ((FloatWritable)Hit.getSortValue()).get() is the
score.  But if you sort by, e.g., a date string, then
((UTF8)Hit.getSortValue()).toString() is the date string sorted on, and
the score is unavailable.  Perhaps the score should be made available
regardless?

Doug
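The cast-based dispatch Doug describes can be sketched roughly as follows. This is a minimal, self-contained sketch, not Nutch code: `Hit` is a hypothetical stand-in for Nutch's class, Java's `Float` and `String` stand in for `FloatWritable` and `UTF8` so it runs without Nutch on the classpath, and `scoreOf` is a made-up helper name.

```java
import java.util.Optional;

public class SortValueDemo {
    // Hypothetical stand-in for Nutch's Hit: it only carries the value results were sorted on.
    static final class Hit {
        private final Object sortValue;

        Hit(Object sortValue) {
            this.sortValue = sortValue;
        }

        Object getSortValue() {
            return sortValue;
        }
    }

    // Returns the score if the hit was sorted by score (modeled here as a Float),
    // or an empty Optional if it was sorted on something else, such as a date string.
    static Optional<Float> scoreOf(Hit hit) {
        Object value = hit.getSortValue();
        if (value instanceof Float) {
            return Optional.of((Float) value);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Hit sortedByScore = new Hit(0.75f);
        Hit sortedByDate = new Hit("20050728");
        System.out.println(scoreOf(sortedByScore)); // Optional[0.75]
        System.out.println(scoreOf(sortedByDate));  // Optional.empty
    }
}
```

Returning an empty `Optional` rather than a dummy score reflects the point that, when results are sorted by something other than score, the score simply is not available from the hit.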

Re: 0.7-dev, the search scoring

Fredrik Andersson
Ah God, I am stupid... thanks for that, Doug! I must be having a bad
coding day today :)

Fredrik


recursion: see recursion

em
What should I do when I encounter sites where Nutch falls into recursion mode?

Currently I'm handling this by excluding those sites with the regex filter,
but is anything currently under development?

By recursion I mean Nutch fetching
<sfdsdf>.com/<sth>/<sth>/<sth>/<sth>/<sth>/<sth> and on and on...

Are there any tricks to limit the folder depth in fetch mode?

E.
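One way to limit folder depth can be sketched as a simple check on the URL path. This is an illustration under assumptions, not a Nutch feature: it counts non-empty path segments with `java.net.URI` and rejects anything deeper than a hypothetical `MAX_DEPTH` of 5, and the names `DepthLimitDemo`, `pathDepth` and `tooDeep` are made up for this example.

```java
import java.net.URI;

public class DepthLimitDemo {
    // Hypothetical limit: URLs whose path has more segments than this are rejected.
    static final int MAX_DEPTH = 5;

    // Counts the non-empty, slash-separated segments of the URL's path.
    static int pathDepth(String url) {
        String path = URI.create(url).getPath();
        if (path == null || path.isEmpty()) {
            return 0;
        }
        int depth = 0;
        for (String segment : path.split("/")) {
            if (!segment.isEmpty()) {
                depth++;
            }
        }
        return depth;
    }

    static boolean tooDeep(String url) {
        return pathDepth(url) > MAX_DEPTH;
    }

    public static void main(String[] args) {
        System.out.println(pathDepth("http://example.com/a/b/c/"));       // 3
        System.out.println(tooDeep("http://example.com/s/s/s/s/s/s/s"));  // true
    }
}
```

A depth cap like this is blunter than detecting repetition: it also drops legitimately deep pages, so the limit would need tuning per crawl.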


Re: recursion: see recursion

luti
Insert this at the top of regex-urlfilter.txt:
-http://.*(/.+?)/.*?\1/.*?\1.*?/
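A quick way to see what this rule catches is to run the same pattern through `java.util.regex`. This sketch assumes a rule fires when the pattern is found anywhere in the URL (`Matcher.find()`); the leading `-` is dropped because it only marks the rule as an exclusion. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class RecursionFilterDemo {
    // The rule from the reply, minus the leading '-' that marks it as an exclude rule.
    // (/.+?) captures a path segment; \1 requires that same segment to appear twice more.
    static final Pattern LOOP =
        Pattern.compile("http://.*(/.+?)/.*?\\1/.*?\\1.*?/");

    // Assumption: the URL is rejected if the pattern is found anywhere in it.
    static boolean rejected(String url) {
        return LOOP.matcher(url).find();
    }

    public static void main(String[] args) {
        System.out.println(rejected("http://example.com/a/b/a/b/a/b/"));          // true
        System.out.println(rejected("http://example.com/docs/guide/index.html")); // false
        System.out.println(rejected("http://example.com/a/b/a/"));                // false
    }
}
```

In this sketch a path with a single repeated segment, such as `/a/b/a/`, is not rejected, while a repeating `/a/b/a/b/a/b/` is.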
