boosting custom field values in scoring algorithm

boosting custom field values in scoring algorithm

Scott Owens
Has anyone been able to include a custom Lucene field in the primary
ranking algorithm?  I have successfully set up a query plugin to search
field:value, but I can't see how to include (and hopefully boost) the value
in the main query.

I saw where Title and Host were added last April:
http://mail-archives.apache.org/mod_mbox/lucene-nutch-dev/200504.mbox/%3cab7d6cde20c5fa8917168c1eeba68c4e@...%3e

However, I can't seem to figure out how to add a custom field.  Is it
possible?  Any help would be great!

Thanks.
Scott

RE: boosting custom field values in scoring algorithm

Vanderdray, Jacob
Scott,

        You probably just need to edit your plugin.xml file to set
fields="DEFAULT".  Take a look at
http://wiki.apache.org/nutch/WritingPluginExample and see if that helps.
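
For reference, a minimal sketch of what that plugin.xml entry might look like. It assumes the 0.7-style layout, where fields is an attribute on the implementation element of a QueryFilter extension. The extension id, implementation id, and class name are hypothetical placeholders; the rest of the plugin.xml (runtime, requires) is omitted.

```xml
<!-- Hypothetical names throughout; only fields="DEFAULT" comes from the message above. -->
<extension id="com.example.nutch.myfield"
           name="My Field Query Filter"
           point="org.apache.nutch.searcher.QueryFilter">
  <implementation id="MyFieldQueryFilter"
                  class="com.example.nutch.MyFieldQueryFilter"
                  fields="DEFAULT"/>
</extension>
```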

Jake.

-----Original Message-----
From: Scott Owens [mailto:[hidden email]]
Sent: Wednesday, February 08, 2006 10:08 AM
To: [hidden email]
Subject: boosting custom field values in scoring algorithm


Re: [Nutch-general] RE: boosting custom field values in scoring algorithm

Scott Owens
That's perfect Jake.  Thanks!  Should have caught that.

Quick (maybe stupid) question.  Will I still be able to run a query
for field:value once I set fields="DEFAULT"?

Or do I need to create a separate plugin for that - basically exactly
what I had before with fields="fieldname" in the plugin.xml?

Thanks a ton.
Scott



> _______________________________________________
> Nutch-general mailing list
> [hidden email]
> https://lists.sourceforge.net/lists/listinfo/nutch-general

RE: [Nutch-general] RE: boosting custom field values in scoring algorithm

Vanderdray, Jacob
In reply to this post by Scott Owens
        To be honest I've been wondering that myself, but haven't tested
it. It seems like you could just add more lines to the plugin.xml file
so that the same code gets used as both fields="DEFAULT" and
fields="[fieldname]".  

        It's also possible that you could set fields to a comma-separated
list.  I haven't dug into the code enough to know if that would work.
If you figure it out, let me know. :)
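
An untested sketch of the first idea, assuming the plugin system accepts two implementation entries that point at the same class. The ids, class name, and the field name myfield are hypothetical, and nothing here has been checked against the Nutch source.

```xml
<!-- Untested: the same (hypothetical) filter class registered twice. -->
<extension id="com.example.nutch.myfield"
           name="My Field Query Filter"
           point="org.apache.nutch.searcher.QueryFilter">
  <!-- Applies to plain queries with no field prefix. -->
  <implementation id="MyFieldDefaultQueryFilter"
                  class="com.example.nutch.MyFieldQueryFilter"
                  fields="DEFAULT"/>
  <!-- Keeps explicit myfield:value queries working. -->
  <implementation id="MyFieldQueryFilter"
                  class="com.example.nutch.MyFieldQueryFilter"
                  fields="myfield"/>
</extension>
```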

Thanks,
Jake.

-----Original Message-----
From: Scott Owens [mailto:[hidden email]]
Sent: Wednesday, February 08, 2006 10:59 AM
To: [hidden email]
Subject: Re: [Nutch-general] RE: boosting custom field values in scoring
algorithm


Re: [Nutch-general] RE: boosting custom field values in scoring algorithm

Daniel Iversen-2
In reply to this post by Scott Owens
So, how did you manage to search on custom fields? ;)

I am really interested to know how... I've read lots of API docs but no luck
(query.addRequiredTerm("value","field") does NOT seem to do the trick).

I am sure people do this all the time.

Thanks in advance for the help.

Daniel



RE: [Nutch-general] RE: boosting custom field values in scoring algorithm

Vanderdray, Jacob
In reply to this post by Scott Owens
Daniel,

        I understand these issues better than I used to.  With the 0.7
branch it's pretty easy to add a raw query filter that forces
search results to include a value in an additional field (field:value).

        It gets very tricky if you want to actually add a new field to
the basic/default query.  So for instance if you want to have all
queries go against your custom field in the same way they do against the
default fields (content, url, etc.), you'll need to modify or replace
query-basic.

        That makes sense when you realize that the basic query requires
that each of the search terms exist in at least one of the default
fields.  All other filters get added on after that requirement.  So if
the terms searched for don't all exist in the default fields, it doesn't
matter if they do exist in your custom field.

        When trying to change the behavior of queries, I
found it extremely helpful to add logging to query-basic so that the
actual query gets written to the log file.  There's a toString method on
the query object that lets you do that.
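
To make the last two paragraphs concrete, here is a toy model in plain Java. It is not Nutch or Lucene code: the class DefaultQuerySketch, the two-entry DEFAULT_FIELDS list, the category field, and the whitespace tokenizing are all assumptions made for illustration. It renders the required clauses as a string (the sort of thing you would write to the log) and shows why a term that occurs only in a custom field fails the match.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.logging.Logger;

// Toy model only; none of these names come from Nutch or Lucene.
final class DefaultQuerySketch {
    private static final Logger LOG =
        Logger.getLogger(DefaultQuerySketch.class.getName());

    // Assumed subset of the default fields ("content, url, etc.").
    static final List<String> DEFAULT_FIELDS = Arrays.asList("content", "url");

    // Builds one required "+(...)" group per term, spanning the default fields.
    static String render(List<String> terms) {
        StringBuilder sb = new StringBuilder();
        for (String term : terms) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append("+(");
            for (int i = 0; i < DEFAULT_FIELDS.size(); i++) {
                if (i > 0) {
                    sb.append(' ');
                }
                sb.append(DEFAULT_FIELDS.get(i)).append(':').append(term);
            }
            sb.append(')');
        }
        return sb.toString();
    }

    // True only if every term occurs in at least one default field.
    static boolean matches(Map<String, String> doc, List<String> terms) {
        for (String term : terms) {
            boolean found = false;
            for (String field : DEFAULT_FIELDS) {
                String text = doc.get(field);
                if (text != null
                        && Arrays.asList(text.split("\\s+")).contains(term)) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                return false; // custom fields are never consulted
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> doc = new HashMap<>();
        doc.put("content", "apache nutch crawler");
        doc.put("url", "http://example.org/");
        doc.put("category", "search"); // hypothetical custom field

        List<String> terms = Arrays.asList("nutch", "search");
        LOG.info("query: " + render(terms));
        LOG.info("matches: " + matches(doc, terms));
    }
}
```

In this model the document is rejected for the query "nutch search" because "search" appears only in category, which is not one of the default fields. Any filter added afterwards cannot bring the document back.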

        I know that may not answer your question, but I hope it helps.
I haven't worked with the 0.8 branch, so it may be easier to manipulate
the default query there.

Jake.

PS:  I've written a replacement for query-basic that allows you to add
custom fields into the default query by listing them in the
nutch-site.xml file.  It isn't perfect, but if you want to see it for a
reference, let me know.  I warped my mind a bit getting to understand
how all the term and phrase queries needed to get stacked up to do what
I wanted.

-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On
Behalf Of Daniel Iversen
Sent: Friday, April 07, 2006 1:07 AM
To: [hidden email]
Subject: Re: [Nutch-general] RE: boosting custom field values in scoring
algorithm


Re: [Nutch-general] RE: boosting custom field values in scoring algorithm

Doug Cutting
Vanderdray, Jacob wrote:
> PS:  I've written a replacement for query-basic that allows you to add
> custom fields into the default query by listing them in the
> nutch-site.xml file.  It isn't perfect, but if you want to see it for a
> reference, let me know.  I warped my mind a bit getting to understand
> how all the term and phrase queries needed to get stacked up to do what
> I wanted.

Please post this as a patch, even if you don't think it's ready to be
committed.  That way others can try it, and perhaps someone can improve
it so that it can be committed.

Doug

Re: [Nutch-general] RE: boosting custom field values in scoring algorithm

vis
In reply to this post by Scott Owens
Sorry, I am on holiday until the 8th of May.

Please contact the [hidden email] for urgent matters.

Kind regards, Herman.


RE: [Nutch-general] RE: boosting custom field values in scoring algorithm

Vanderdray, Jacob
In reply to this post by Scott Owens
Doug,

        I've posted my plugins as
http://issues.apache.org/jira/browse/NUTCH-260.  I tried to give enough
documentation to allow someone else to use them.  If anyone wants to
check them out and has questions, I'm happy to try to answer.

Thanks,
Jake.

-----Original Message-----
From: Doug Cutting [mailto:[hidden email]]
Sent: Friday, April 28, 2006 3:14 PM
To: [hidden email]
Subject: Re: [Nutch-general] RE: boosting custom field values in scoring
algorithm
