strange page rank

strange page rank

Lyndon Maydwell
Hi list.

I'm having trouble figuring out why certain pages are being ranked
much higher than others on my Nutch installation.

For example, not long ago, the department of computing's homepage was
ranked #1 for the query "computing department".
However, recently it has dropped in the rankings considerably down to #8.
This is happening for many pages where I would have expected the
rankings to be different, and which were ranked much higher in the past.

---

Examining the linkdb, the homepage has far more inbound links than the
#1 ranked page:

> ./nutch readlinkdb ../../crawl/linkdb/ -url "<homepage>" | wc -l
67
> ./nutch readlinkdb ../../crawl/linkdb/ -url "<#1 page>" | wc -l
1

Yet it is ranked far below the #1 page, which has just one inbound link.

Below are the 'explain' outputs generated by the Nutch web-app for both pages.

Thanks for your help.

---

page

    * segment = 20080130221252
    * digest = 9bda7b5dd19bc5b270ac5040a3507af0
    * url = <homepage>
    * title = Department of Computing
    * tstamp = 20080129143145567
    * primaryType = text
    * subType = html
    * boost = 64.50891

score for query: computing department

    * 508.50793 = (MATCH) sum of:
          o 143.2611 = (MATCH) sum of:
                + 2.8138897 = (MATCH) weight(url:computing^4.0 in 12894), product of:
                      # 0.09858931 = queryWeight(url:computing^4.0), product of:
                            * 4.0 = boost
                            * 2.378461 = idf(docFreq=32807)
                            * 0.010362721 = queryNorm
                      # 28.54153 = (MATCH) fieldWeight(url:computing in 12894), product of:
                            * 1.0 = tf(termFreq(url:computing)=1)
                            * 2.378461 = idf(docFreq=32807)
                            * 12.0 = fieldNorm(field=url, doc=12894)
                + 108.64339 = (MATCH) weight(anchor:computing^2.0 in 12894), product of:
                      # 0.18734181 = queryWeight(anchor:computing^2.0), product of:
                            * 2.0 = boost
                            * 9.039219 = idf(docFreq=41)
                            * 0.010362721 = queryNorm
                      # 579.92065 = (MATCH) fieldWeight(anchor:computing in 12894), product of:
                            * 4.582576 = tf(termFreq(anchor:computing)=21)
                            * 9.039219 = idf(docFreq=41)
                            * 14.0 = fieldNorm(field=anchor, doc=12894)
                + 1.066432 = (MATCH) weight(content:computing in 12894), product of:
                      # 0.047251105 = queryWeight(content:computing), product of:
                            * 4.55972 = idf(docFreq=3703)
                            * 0.010362721 = queryNorm
                      # 22.569462 = (MATCH) fieldWeight(content:computing in 12894), product of:
                            * 2.828427 = tf(termFreq(content:computing)=8)
                            * 4.55972 = idf(docFreq=3703)
                            * 1.75 = fieldNorm(field=content, doc=12894)
                + 26.982933 = (MATCH) weight(title:computing^1.5 in 12894), product of:
                      # 0.114485934 = queryWeight(title:computing^1.5), product of:
                            * 1.5 = boost
                            * 7.3652425 = idf(docFreq=223)
                            * 0.010362721 = queryNorm
                      # 235.68776 = (MATCH) fieldWeight(title:computing in 12894), product of:
                            * 1.0 = tf(termFreq(title:computing)=1)
                            * 7.3652425 = idf(docFreq=223)
                            * 32.0 = fieldNorm(field=title, doc=12894)
                + 3.7544508 = (MATCH) weight(host:computing^2.0 in 12894), product of:
                      # 0.049311716 = queryWeight(host:computing^2.0), product of:
                            * 2.0 = boost
                            * 2.3792841 = idf(docFreq=32780)
                            * 0.010362721 = queryNorm
                      # 76.13709 = (MATCH) fieldWeight(host:computing in 12894), product of:
                            * 1.0 = tf(termFreq(host:computing)=1)
                            * 2.3792841 = idf(docFreq=32780)
                            * 32.0 = fieldNorm(field=host, doc=12894)
          o 94.75222 = (MATCH) sum of:
                + 77.209114 = (MATCH) weight(anchor:department^2.0 in 12894), product of:
                      # 0.17477937 = queryWeight(anchor:department^2.0), product of:
                            * 2.0 = boost
                            * 8.433083 = idf(docFreq=76)
                            * 0.010362721 = queryNorm
                      # 441.7519 = (MATCH) fieldWeight(anchor:department in 12894), product of:
                            * 3.7416575 = tf(termFreq(anchor:department)=14)
                            * 8.433083 = idf(docFreq=76)
                            * 14.0 = fieldNorm(field=anchor, doc=12894)
                + 0.5697994 = (MATCH) weight(content:department in 12894), product of:
                      # 0.03453873 = queryWeight(content:department), product of:
                            * 3.332979 = idf(docFreq=12630)
                            * 0.010362721 = queryNorm
                      # 16.497404 = (MATCH) fieldWeight(content:department in 12894), product of:
                            * 2.828427 = tf(termFreq(content:department)=8)
                            * 3.332979 = idf(docFreq=12630)
                            * 1.75 = fieldNorm(field=content, doc=12894)
                + 16.973307 = (MATCH) weight(title:department^1.5 in 12894), product of:
                      # 0.09080103 = queryWeight(title:department^1.5), product of:
                            * 1.5 = boost
                            * 5.841518 = idf(docFreq=1027)
                            * 0.010362721 = queryNorm
                      # 186.92857 = (MATCH) fieldWeight(title:department in 12894), product of:
                            * 1.0 = tf(termFreq(title:department)=1)
                            * 5.841518 = idf(docFreq=1027)
                            * 32.0 = fieldNorm(field=title, doc=12894)
          o 225.8337 = weight(anchor:"computing department"~4^2.0 in 12894), product of:
                + 0.36212116 = queryWeight(anchor:"computing department"~4^2.0), product of:
                      # 2.0 = boost
                      # 17.472301 = idf(anchor: computing=41 department=76)
                      # 0.010362721 = queryNorm
                + 623.64124 = fieldWeight(anchor:"computing department" in 12894), product of:
                      # 2.5495098 = tf(phraseFreq=6.5)
                      # 17.472301 = idf(anchor: computing=41 department=76)
                      # 14.0 = fieldNorm(field=anchor, doc=12894)
          o 1.2821399 = weight(content:"computing department"~2147483647 in 12894), product of:
                + 0.081789844 = queryWeight(content:"computing department"~2147483647), product of:
                      # 7.8926992 = idf(content: computing=3703 department=12630)
                      # 0.010362721 = queryNorm
                + 15.676028 = fieldWeight(content:"computing department" in 12894), product of:
                      # 1.1349387 = tf(phraseFreq=1.2880858)
                      # 7.8926992 = idf(content: computing=3703 department=12630)
                      # 1.75 = fieldNorm(field=content, doc=12894)
          o 43.37881 = weight(title:"computing department"~2147483647^1.5 in 12894), product of:
                + 0.20528696 = queryWeight(title:"computing department"~2147483647^1.5), product of:
                      # 1.5 = boost
                      # 13.20676 = idf(title: computing=223 department=1027)
                      # 0.010362721 = queryNorm
                + 211.30817 = fieldWeight(title:"computing department" in 12894), product of:
                      # 0.5 = tf(phraseFreq=0.25)
                      # 13.20676 = idf(title: computing=223 department=1027)
                      # 32.0 = fieldNorm(field=title, doc=12894)

page

    * segment = 20080130221252
    * digest = 3d668919624a84bf477d3c1be6d117f6
    * url = <#1 page>
    * title = ::International Federation for Information Processing::
    * tstamp = 20080130122101878
    * lastModified = 1190066937000
    * contentLength = 20958
    * primaryType = text
    * subType = html
    * boost = 214625.97

score for query: computing department

    * 6009.6284 = (MATCH) sum of:
          o 2702.0647 = (MATCH) sum of:
                + 2702.0647 = (MATCH) weight(content:computing in 74002), product of:
                      # 0.047251105 = queryWeight(content:computing), product of:
                            * 4.55972 = idf(docFreq=3703)
                            * 0.010362721 = queryNorm
                      # 57185.22 = (MATCH) fieldWeight(content:computing in 74002), product of:
                            * 2.4494898 = tf(termFreq(content:computing)=6)
                            * 4.55972 = idf(docFreq=3703)
                            * 5120.0 = fieldNorm(field=content, doc=74002)
          o 1317.9348 = (MATCH) sum of:
                + 1317.9348 = (MATCH) weight(content:department in 74002), product of:
                      # 0.03453873 = queryWeight(content:department), product of:
                            * 3.332979 = idf(docFreq=12630)
                            * 0.010362721 = queryNorm
                      # 38158.17 = (MATCH) fieldWeight(content:department in 74002), product of:
                            * 2.236068 = tf(termFreq(content:department)=5)
                            * 3.332979 = idf(docFreq=12630)
                            * 5120.0 = fieldNorm(field=content, doc=74002)
          o 1989.629 = weight(content:"computing department"~2147483647 in 74002), product of:
                + 0.081789844 = queryWeight(content:"computing department"~2147483647), product of:
                      # 7.8926992 = idf(content: computing=3703 department=12630)
                      # 0.010362721 = queryNorm
                + 24326.113 = fieldWeight(content:"computing department" in 74002), product of:
                      # 0.6019733 = tf(phraseFreq=0.36237186)
                      # 7.8926992 = idf(content: computing=3703 department=12630)
                      # 5120.0 = fieldNorm(field=content, doc=74002)
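
For anyone reading the explain output above: each term weight is queryWeight times fieldWeight, and fieldWeight is tf * idf * fieldNorm, where tf in these dumps is the square root of the term frequency. The sketch below recomputes the content:computing weight for both pages from the printed values. The helper names are made up for illustration and this is not Nutch or Lucene code. What it shows is that tf and idf are nearly the same for the two pages; almost the whole gap comes from fieldNorm (5120.0 versus 1.75), which in Lucene absorbs the document boost at index time.

```python
import math

def query_weight(idf, query_norm, boost=1.0):
    # queryWeight = boost * idf * queryNorm (boost is 1.0 for the content field here)
    return boost * idf * query_norm

def field_weight(term_freq, idf, field_norm):
    # fieldWeight = tf * idf * fieldNorm, with tf = sqrt(termFreq)
    return math.sqrt(term_freq) * idf * field_norm

# Values copied from the two explain dumps above (field: content, term: computing).
QUERY_NORM = 0.010362721
IDF = 4.55972  # idf(docFreq=3703)

qw = query_weight(IDF, QUERY_NORM)
top_weight = qw * field_weight(term_freq=6, idf=IDF, field_norm=5120.0)   # doc 74002
home_weight = qw * field_weight(term_freq=8, idf=IDF, field_norm=1.75)    # doc 12894

print(f"#1 page weight:   {top_weight:.4f}")
print(f"homepage weight:  {home_weight:.6f}")
print(f"ratio:            {top_weight / home_weight:.1f}")
```

The homepage actually has the higher term frequency (8 versus 6), so a fieldNorm of this size on the other page is worth investigating before anything else.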

Re: strange page rank

Lyndon Maydwell
Sorry to bump this, but I just noticed that the scores for my recent
crawler are very high.

-- old crawler (sensible results) --

min score:      0.0
avg score:      0.505
max score:      7736.152

-- new crawler (poor results) --

min score:      0.0
avg score:      9.4379096E7
max score:      1.52620289E12

Has anyone experienced something similar to this?
Is there any reason why this might have happened?

Will regenerating my crawldb from the segments using updatedb fix this problem?

Thanks.

Re: strange page rank

Dennis Kubes-2
Most likely this is due to internal links.  A fix going forward is to
set db.score.link.internal to 0.0 or some very low value.  I don't think
there is currently a way to reset the scores in the crawldb.  This is
something we are looking into building.

Dennis
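
To make the internal-link effect concrete, here is a toy model of one score update round. It is a deliberate simplification and not Nutch's scoring code: every page splits its score evenly among its outlinks, and each share is multiplied by an internal or external factor depending on whether source and target share a host. All URLs and the `propagate` helper are hypothetical.

```python
from urllib.parse import urlsplit

def propagate(scores, outlinks, internal_factor, external_factor=1.0):
    """One toy round: each page hands an equal share of its score to every outlink."""
    new_scores = dict(scores)
    for src, targets in outlinks.items():
        if not targets:
            continue
        share = scores[src] / len(targets)
        for dst in targets:
            same_host = urlsplit(src).hostname == urlsplit(dst).hostname
            factor = internal_factor if same_host else external_factor
            new_scores[dst] = new_scores.get(dst, 0.0) + share * factor
    return new_scores

# Hypothetical site: 50 API pages that all link to one shared page,
# plus a single external page linking to the homepage.
HOME = "http://site.example/"
OBJECT = "http://site.example/api/Object.html"
EXTERNAL = "http://other.example/links.html"
doc_pages = [f"http://site.example/api/Class{i}.html" for i in range(50)]

outlinks = {page: [OBJECT] for page in doc_pages}
outlinks[EXTERNAL] = [HOME]
scores = {url: 1.0 for url in [HOME, OBJECT, EXTERNAL, *doc_pages]}

high = propagate(scores, outlinks, internal_factor=1.0)
low = propagate(scores, outlinks, internal_factor=0.01)

print("internal factor 1.0 :", high[OBJECT], "vs homepage", high[HOME])
print("internal factor 0.01:", round(low[OBJECT], 4), "vs homepage", low[HOME])
```

In this toy setup the heavily cross-linked internal page outscores the homepage when internal links count in full, and drops below it once the internal factor is small.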


Re: strange page rank

Lyndon Maydwell
Thanks for your help Dennis.

I'm not sure that the problem is coming from the link.internal boost.
Some pages with very high scores have relatively few inbound links,
yet pages that seem to match more criteria for boosts, and have far
more inbound links, receive a much lower score.

Also, I think I have to weight internal links fairly highly, as I am
primarily crawling one domain and its sub-domains. Currently I've
given db.score.link.internal a value of 1.0

I may restart the crawl from scratch to see if this helps, as I have
changed a few settings between recrawls.

Re: strange page rank

Lyndon Maydwell
Hi list.

I've reached a dead end with my page rankings.

I dumped my crawldb and extracted the urls which I used to recrawl
from scratch. The score problem now seems to have resolved itself,
with the stats:

min score:      0.01
avg score:      1.456
max score:      1769.588

However, my rankings still seem very strange indeed.

Homepage ->

bin> ./nutch readdb ../../crawl/crawldb/ -url "http://computing.edu.au/"
Score: 8.384491

bin> ./nutch readlinkdb ../../crawl/linkdb/ -url "http://computing.edu.au/"
# 21 lines

Number 1 ranked page for query "computing" ->

bin> ./nutch readdb ../../crawl/crawldb/ -url \
 "http://computing.edu.au/documents/java_docs/jdk1.5/api/java/lang/Object.html"
Score: 603.21295

bin> ./nutch readlinkdb ../../crawl/linkdb/ -url \
 "http://computing.edu.au/documents/java_docs/jdk1.5/api/java/lang/Object.html"
 - no link information.

I was still using db.score.link.internal at 1.0 at this point, but
that shouldn't be a problem if the linkdb suggests the opposite
(21 inbound links for the homepage, none for the Object page), right?
Or am I missing something here?

Thanks for your help.


P.S.

Once again, the explain output from the nutch web-app is appended below:

page

    * segment = 20080209014116
    * digest = 0bb27aa447e4abacd9261f78d686c2ab
    * url = http://computing.edu.au/documents/java_docs/jdk1.5/api/java/lang/Object.html
    * title = Object (Java 2 Platform SE 5.0)
    * tstamp = 20080208113345346
    * lastModified = 1092234170000
    * contentLength = 41711
    * primaryType = text
    * subType = html
    * boost = 603.21295

score for query: computing

    * 139.65408 = (MATCH) sum of:
          o 46.52672 = (MATCH) weight(url:computing^8.0 in 34170), product of:
                + 0.5596049 = queryWeight(url:computing^8.0), product of:
                      # 8.0 = boost
                      # 2.0785522 = idf(docFreq=32728)
                      # 0.033653524 = queryNorm
                + 83.14209 = (MATCH) fieldWeight(url:computing in 34170), product of:
                      # 1.0 = tf(termFreq(url:computing)=1)
                      # 2.0785522 = idf(docFreq=32728)
                      # 40.0 = fieldNorm(field=url, doc=34170)
          o 93.12736 = (MATCH) weight(host:computing^2.0 in 34170), product of:
                + 0.13995677 = queryWeight(host:computing^2.0), product of:
                      # 2.0 = boost
                      # 2.0793777 = idf(docFreq=32701)
                      # 0.033653524 = queryNorm
                + 665.4009 = (MATCH) fieldWeight(host:computing in 34170), product of:
                      # 1.0 = tf(termFreq(host:computing)=1)
                      # 2.0793777 = idf(docFreq=32701)
                      # 320.0 = fieldNorm(field=host, doc=34170)

page

    * segment = 20080209014116
    * digest = 9bda7b5dd19bc5b270ac5040a3507af0
    * url = http://computing.edu.au/
    * title = Department of Computing
    * tstamp = 20080208121237551
    * primaryType = text
    * subType = html
    * boost = 8.384491

score for query: computing

    * 56.424984 = (MATCH) sum of:
          o 1.7447519 = (MATCH) weight(url:computing^8.0 in 11825), product of:
                + 0.5596049 = queryWeight(url:computing^8.0), product of:
                      # 8.0 = boost
                      # 2.0785522 = idf(docFreq=32728)
                      # 0.033653524 = queryNorm
                + 3.1178284 = (MATCH) fieldWeight(url:computing in 11825), product of:
                      # 1.0 = tf(termFreq(url:computing)=1)
                      # 2.0785522 = idf(docFreq=32728)
                      # 1.5 = fieldNorm(field=url, doc=11825)
          o 41.14292 = (MATCH) weight(anchor:computing^2.0 in 11825), product of:
                + 0.6996653 = queryWeight(anchor:computing^2.0), product of:
                      # 2.0 = boost
                      # 10.395127 = idf(docFreq=7)
                      # 0.033653524 = queryNorm
                + 58.80372 = (MATCH) fieldWeight(anchor:computing in 11825), product of:
                      # 2.828427 = tf(termFreq(anchor:computing)=8)
                      # 10.395127 = idf(docFreq=7)
                      # 2.0 = fieldNorm(field=anchor, doc=11825)
          o 0.6168165 = (MATCH) weight(content:computing in 11825), product of:
                + 0.17133684 = queryWeight(content:computing), product of:
                      # 5.091201 = idf(docFreq=1608)
                      # 0.033653524 = queryNorm
                + 3.6000226 = (MATCH) fieldWeight(content:computing in 11825), product of:
                      # 2.828427 = tf(termFreq(content:computing)=8)
                      # 5.091201 = idf(docFreq=1608)
                      # 0.25 = fieldNorm(field=content, doc=11825)
          o 11.7564 = (MATCH) weight(title:computing^1.5 in 11825), product of:
                + 0.38518387 = queryWeight(title:computing^1.5), product of:
                      # 1.5 = boost
                      # 7.630382 = idf(docFreq=126)
                      # 0.033653524 = queryNorm
                + 30.521528 = (MATCH) fieldWeight(title:computing in 11825), product of:
                      # 1.0 = tf(termFreq(title:computing)=1)
                      # 7.630382 = idf(docFreq=126)
                      # 4.0 = fieldNorm(field=title, doc=11825)
          o 1.164092 = (MATCH) weight(host:computing^2.0 in 11825), product of:
                + 0.13995677 = queryWeight(host:computing^2.0), product of:
                      # 2.0 = boost
                      # 2.0793777 = idf(docFreq=32701)
                      # 0.033653524 = queryNorm
                + 8.317511 = (MATCH) fieldWeight(host:computing in 11825), product of:
                      # 1.0 = tf(termFreq(host:computing)=1)
                      # 2.0793777 = idf(docFreq=32701)
                      # 4.0 = fieldNorm(field=host, doc=11825)

Re: strange page rank

Dennis Kubes-2
The score is so high because you have a lot of internal pages pointing
to the Object page. But by default db.ignore.internal.links is set to
true.  What this means is that even though internal links will be counted
for score (if db.score.link.internal > 0), they will not be stored in the
linkdb by default, hence no link information when reading.

Dennis
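
The mismatch described above (links that count for score but never show up in readlinkdb) can be sketched as a simple filter. This is an illustration under stated assumptions, not Nutch's LinkDb code: a link is treated as internal when source and target share a host, and the `inlinks_stored` helper and the source pages in `links` are hypothetical.

```python
from urllib.parse import urlsplit

def inlinks_stored(links, ignore_internal=True):
    """Return {target: [sources]} for the links a toy linkdb would keep."""
    stored = {}
    for src, dst in links:
        internal = urlsplit(src).hostname == urlsplit(dst).hostname
        if ignore_internal and internal:
            continue  # same-host link: dropped from the linkdb view
        stored.setdefault(dst, []).append(src)
    return stored

HOME_URL = "http://computing.edu.au/"
OBJECT_URL = "http://computing.edu.au/documents/java_docs/jdk1.5/api/java/lang/Object.html"

# Hypothetical link graph: two same-host links to the Object page,
# one cross-host link to the homepage.
links = [
    ("http://computing.edu.au/hypothetical/page_a.html", OBJECT_URL),
    ("http://computing.edu.au/hypothetical/page_b.html", OBJECT_URL),
    ("http://other.example/links.html", HOME_URL),
]

default_view = inlinks_stored(links)                       # ignore_internal=True
full_view = inlinks_stored(links, ignore_internal=False)

print("default:", {url: len(srcs) for url, srcs in default_view.items()})
print("full:   ", {url: len(srcs) for url, srcs in full_view.items()})
```

With the default setting the Object page has no entry at all, which corresponds to the "no link information" message, even though its same-host inlinks can still feed its score.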


Re: strange page rank

Lyndon Maydwell
I'll give it a shot with a very low internal boost.

Thanks a lot for your assistance.

Re: strange page rank

Dennis Kubes-2
Or set db.ignore.internal.links to false if you want to store and see all
the internal links in the linkdb.  If you do this you would also want to
change db.max.inlinks to a much higher number than its current 10000.

Dennis


Re: strange page rank

Lyndon Maydwell
Thanks guys. Problem solved.

It was the ignore property that was really throwing me, as dumping
URLs from the linkdb wasn't showing the internal links to me.

Setting the internal link boost to 0.01 seems to have solved my
problem completely.
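
For reference, the properties mentioned in this thread could be collected into a set of overrides for nutch-site.xml. The sketch below only renders the Hadoop-style `<configuration>`/`<property>` XML with Python's standard library; the `render_overrides` helper is made up for illustration. The 0.01 value is the one reported above, the other two entries are the optional changes Dennis suggested, and the db.max.inlinks number is a placeholder rather than a recommendation.

```python
import xml.etree.ElementTree as ET

# Property names are taken from the thread; values other than 0.01 are assumptions.
overrides = {
    "db.score.link.internal": "0.01",     # value that resolved the ranking problem above
    "db.ignore.internal.links": "false",  # optional: keep internal links in the linkdb
    "db.max.inlinks": "100000",           # optional: hypothetical value above the 10000 default
}

def render_overrides(props):
    """Build a <configuration> element with one <property> per override."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    if hasattr(ET, "indent"):  # pretty-printing helper, available from Python 3.9
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")

xml_text = render_overrides(overrides)
print(xml_text)
```

As noted earlier in the thread, scores already stored in the crawldb are not reset by changing these settings, so a fresh crawl may be needed before the effect shows up.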