Setting query.host.boost etc. in nutch-site.xml does not work?


Stefan Neufeind
Hi,

I was seeing a "strange" selection of search results here. My first idea
was to rank results with the search word in the hostname higher, so I set
query.host.boost to quite a high value (50, later 200). But nothing in the
results changed.

Searching for the full hostname (www.example.com) does not return any
results at all. Could it be that the hostname is not taken into account
during a search?

What could be wrong here? Please help. *sigh*



Thanks a lot,
 Stefan

Re: Setting query.host.boost etc. in nutch-site.xml does not work?

Marko Bauhardt
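This is a bug in the query-basic plugin: the per-field boost values
configured in nutch-default.xml (or overridden in nutch-site.xml) never
take effect. The FIELD_BOOSTS array is filled in when the filter is
constructed, before setConf() reads the configured boosts, so it never
sees them.
We should open an issue in JIRA. This simple patch should work.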

Index: src/plugin/query-basic/src/java/org/apache/nutch/searcher/basic/BasicQueryFilter.java
===================================================================
--- src/plugin/query-basic/src/java/org/apache/nutch/searcher/basic/BasicQueryFilter.java       (revision 405566)
+++ src/plugin/query-basic/src/java/org/apache/nutch/searcher/basic/BasicQueryFilter.java       (working copy)
@@ -48,7 +48,7 @@
    private static final String[] FIELDS =
    { "url", "anchor", "content", "title", "host" };
-  private final float[] FIELD_BOOSTS =
+  private float[] FIELD_BOOSTS =
    { URL_BOOST, ANCHOR_BOOST, 1.0f, TITLE_BOOST, HOST_BOOST };
    /**
@@ -178,6 +178,7 @@
      this.TITLE_BOOST = conf.getFloat("query.title.boost", 1.5f);
      this.HOST_BOOST = conf.getFloat("query.host.boost", 2.0f);
      this.PHRASE_BOOST = conf.getFloat("query.phrase.boost", 1.0f);
+    FIELD_BOOSTS = new float[]{ URL_BOOST, ANCHOR_BOOST, 1.0f, TITLE_BOOST, HOST_BOOST };
    }
    public Configuration getConf() {
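For anyone wondering why this small change matters: a Java field initializer runs when the object is constructed, so an array built from other fields copies whatever those fields hold at that moment, and assigning the fields later in setConf() does not update the copy. The sketch below reproduces that ordering with hypothetical, trimmed-down classes; it illustrates the pattern and is not Nutch code. SimpleConf is an assumed stand-in for Hadoop's Configuration, only the host boost is modelled, and the field starts at Java's default of 0.0f here (the thread does not show what the real filter's fields held at construction time).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Hadoop's Configuration: only what the demo needs.
class SimpleConf {
    private final Map<String, String> props = new HashMap<>();

    void set(String key, String value) {
        props.put(key, value);
    }

    float getFloat(String key, float defaultValue) {
        String value = props.get(key);
        return value == null ? defaultValue : Float.parseFloat(value.trim());
    }
}

// Mirrors the buggy pattern: the array is filled by a field initializer,
// which runs at construction time, i.e. before setConf() is called.
class BuggyFilter {
    private float HOST_BOOST;                                 // 0.0f until setConf() runs
    private final float[] FIELD_BOOSTS = { 1.0f, HOST_BOOST }; // copies 0.0f

    void setConf(SimpleConf conf) {
        HOST_BOOST = conf.getFloat("query.host.boost", 2.0f);
        // FIELD_BOOSTS still holds the value copied at construction time.
    }

    float hostBoost() {
        return FIELD_BOOSTS[1];
    }
}

// Mirrors the patch: rebuild the array after the configured values are read.
class PatchedFilter {
    private float HOST_BOOST;
    private float[] FIELD_BOOSTS = { 1.0f, HOST_BOOST };

    void setConf(SimpleConf conf) {
        HOST_BOOST = conf.getFloat("query.host.boost", 2.0f);
        FIELD_BOOSTS = new float[] { 1.0f, HOST_BOOST };
    }

    float hostBoost() {
        return FIELD_BOOSTS[1];
    }
}

public class BoostInitOrderDemo {
    public static void main(String[] args) {
        SimpleConf conf = new SimpleConf();
        conf.set("query.host.boost", "50");

        BuggyFilter buggy = new BuggyFilter();
        buggy.setConf(conf);
        PatchedFilter patched = new PatchedFilter();
        patched.setConf(conf);

        System.out.println("buggy host boost:   " + buggy.hostBoost());   // 0.0
        System.out.println("patched host boost: " + patched.hostBoost()); // 50.0
    }
}
```

The configured value of 50 only reaches the array in the patched version, which matches the symptom in the original question: raising query.host.boost changed nothing.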


Marko



On 22.05.2006 at 22:07, Stefan Neufeind wrote:

> [...]


Re: Setting query.host.boost etc. in nutch-site.xml does not work?

Andrzej Białecki
Marko Bauhardt wrote:
> This is a bug in the query-basic plugin. The boosting values in the
> nutch-default.xml are not used.
> We should open a bug in jira. This simple patch should work.
>

Fixed, in a slightly different way. Thanks!
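The thread does not say how the committed fix differs from Marko's patch; the revision linked later in the thread is the authoritative version. Purely as an illustration of another way to avoid the problem, the sketch below does not cache a boosts array at all and derives the per-field values each time they are requested, so nothing can go stale after setConf(). The class and method names are hypothetical, a plain Map stands in for the real configuration object, and only the fields whose defaults appear in the patch are included (title 1.5f, host 2.0f, content fixed at 1.0f).

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch (not Nutch's API): boosts are derived on demand instead
// of being copied into an array that can go stale.
public class OnDemandBoosts {
    // Defaults taken from the patch; they apply until setConf() is called.
    private float titleBoost = 1.5f;
    private float hostBoost = 2.0f;

    // A plain Map stands in for the real configuration object.
    void setConf(Map<String, String> conf) {
        titleBoost = getFloat(conf, "query.title.boost", 1.5f);
        hostBoost = getFloat(conf, "query.host.boost", 2.0f);
    }

    private static float getFloat(Map<String, String> conf, String key, float defaultValue) {
        String value = conf.get(key);
        return value == null ? defaultValue : Float.parseFloat(value.trim());
    }

    // Built fresh on every call, so it always reflects the latest setConf().
    Map<String, Float> fieldBoosts() {
        Map<String, Float> boosts = new LinkedHashMap<>();
        boosts.put("content", 1.0f); // fixed, as in the patch
        boosts.put("title", titleBoost);
        boosts.put("host", hostBoost);
        return boosts;
    }

    public static void main(String[] args) {
        OnDemandBoosts filter = new OnDemandBoosts();
        System.out.println(filter.fieldBoosts()); // {content=1.0, title=1.5, host=2.0}

        Map<String, String> conf = new HashMap<>();
        conf.put("query.host.boost", "200");
        filter.setConf(conf);
        System.out.println(filter.fieldBoosts()); // {content=1.0, title=1.5, host=200.0}
    }
}
```

The cost is building a small map on each call, in exchange for having no second copy of the configured values to keep in sync.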

--
Best regards,
Andrzej Bialecki     <><
SIGRAM: Information Retrieval, Semantic Web
        Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



Re: Setting query.host.boost etc. in nutch-site.xml does not work?

Stefan Neufeind
Wow Marko, that was damn quick. I hadn't spotted the error, even though I
had looked through the sources briefly.

Thanks for finding the bug, and in so little time. You made my day!

And thanks also to Andrzej for already committing a fix to trunk:
http://svn.apache.org/viewvc/lucene/nutch/trunk/src/plugin/query-basic/src/java/org/apache/nutch/searcher/basic/BasicQueryFilter.java?r1=383304&r2=408767&pathrev=408767


Thank you,
 Stefan

Marko Bauhardt wrote:
> This is a bug in the query-basic plugin. The boosting values in the
> nutch-default.xml are not used.
> We should open a bug in jira. This simple patch should work.
>
> Index: src/plugin/query-basic/src/java/org/apache/nutch/searcher/basic/BasicQueryFilter.java
>

[...]
