influencing the page scores

Edward Quick


I'm trying to do two things:

i) Stop the intranet index page from coming top in all searches.

I've read the FAQ, which says:
You can tweak your conf/common-terms.utf8 file after creating an index through the following command:
bin/nutch org.apache.nutch.indexer.HighFreqTerms -count 10 -nofreqs index

I added


to common-terms.utf8 and recreated the index by running:

bin/nutch index crawl/newindexes crawl/crawldb crawl/linkdb crawl/segments/*
bin/nutch dedup crawl/newindexes
bin/nutch merge crawl/index crawl/newindexes

and then redeploying the WAR file. This made no difference, though.
Running HighFreqTerms only seems to list the most common terms; it doesn't actually change the index.

ii) Make a page containing the string 'zed' in its URL (the only URL containing this string) come top in my search results. Again, I followed the FAQ, which points to nutch-site.xml in the WAR file, and played around with different settings for the fields below. Despite redeploying the WAR file and clearing out my cache, the explain option still showed the same results.

  <description> Used as a boost for url field in Lucene query.

  <description> Used as a boost for anchor field in Lucene query.

  <description> Used as a boost for title field in Lucene query.

  <description> Used as a boost for host field in Lucene query.
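For reference, an override of these boosts in nutch-site.xml might look roughly like the sketch below. It assumes a Nutch 0.8-style configuration file (root element `<configuration>`) and assumes the property names `query.url.boost` and `query.title.boost` as they appear alongside these descriptions in nutch-default.xml; the values are hypothetical examples, not the settings actually tried above.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Illustrative sketch only: property names are assumed from
       nutch-default.xml; the values below are hypothetical. -->
  <property>
    <name>query.url.boost</name>
    <value>10.0</value>
    <description> Used as a boost for url field in Lucene query.
    </description>
  </property>
  <property>
    <name>query.title.boost</name>
    <value>1.5</value>
    <description> Used as a boost for title field in Lucene query.
    </description>
  </property>
</configuration>
```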
