[jira] Created: (NUTCH-468) Scoring filter should distribute score to all outlinks at once

[jira] Created: (NUTCH-468) Scoring filter should distribute score to all outlinks at once

Tim Allison (Jira)
Scoring filter should distribute score to all outlinks at once
--------------------------------------------------------------

                 Key: NUTCH-468
                 URL: https://issues.apache.org/jira/browse/NUTCH-468
             Project: Nutch
          Issue Type: Improvement
    Affects Versions: 1.0.0
            Reporter: Doğacan Güney
            Priority: Minor
             Fix For: 1.0.0
         Attachments: scoring.patch

Currently, ScoringFilter.distributeScoreToOutlink, as its name implies, takes a single outlink and works only on that. I suggest changing it to distributeScoreToOutlink_s_ so that it takes all the outlinks of a page at once. This has several advantages:

1) A ScoringFilter plugin returns a single adjust datum to set its score instead of returning several.
2) A ScoringFilter plugin can change the score of the original page (via adjust datum) even if there are no outlinks. This is useful if you have a ScoringFilter plugin that, say, scores pages based on content instead of outlinks.
3) Since the ScoringFilter plugin receives all outlinks at once, it can make better decisions about how to distribute the score. For example, when the internal/external score factors are not 1, it is currently not possible to write a plugin that always distributes exactly a page's 'cash' to its outlinks (that is, if a page has score 5, it distributes exactly 5 points to its outlinks regardless of the internal/external factors).
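
The 'cash' example in point 3 can be illustrated with a minimal sketch. It is not the code in scoring.patch: the class name, the distribute method, and the use of plain strings and doubles in place of Nutch's Text and CrawlDatum are all assumptions made to keep the example self-contained. It only shows why seeing every outlink in one call matters: the weights have to be summed before any single share can be computed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: plain strings and doubles stand in for Nutch's
// Text and CrawlDatum types so the example stays self-contained.
final class CashDistributionSketch {

  /**
   * Splits exactly {@code cash} across all outlinks of a page.
   *
   * @param outlinkIsInternal maps each outlink URL to true if it is internal
   */
  static Map<String, Double> distribute(double cash,
      Map<String, Boolean> outlinkIsInternal,
      double internalFactor, double externalFactor) {
    Map<String, Double> shares = new LinkedHashMap<>();

    // First pass: total weight over all outlinks. This is the step that
    // needs the whole set of outlinks at once.
    double totalWeight = 0.0;
    for (boolean internal : outlinkIsInternal.values()) {
      totalWeight += internal ? internalFactor : externalFactor;
    }
    if (totalWeight <= 0.0) {
      return shares; // no outlinks (or zero weights): nothing to distribute
    }

    // Second pass: normalize each weight so the shares sum to cash.
    for (Map.Entry<String, Boolean> e : outlinkIsInternal.entrySet()) {
      double weight = e.getValue() ? internalFactor : externalFactor;
      shares.put(e.getKey(), cash * weight / totalWeight);
    }
    return shares;
  }

  public static void main(String[] args) {
    Map<String, Boolean> outlinks = new LinkedHashMap<>();
    outlinks.put("http://example.org/a", true);   // internal link
    outlinks.put("http://example.com/b", false);  // external link
    System.out.println(distribute(5.0, outlinks, 1.0, 3.0));
  }
}
```

With a score of 5, an internal factor of 1 and an external factor of 3, the two outlinks receive 1.25 and 3.75, which still add up to 5. A per-outlink method cannot do this normalization because it never sees the total weight.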

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (NUTCH-468) Scoring filter should distribute score to all outlinks at once

Tim Allison (Jira)

     [ https://issues.apache.org/jira/browse/NUTCH-468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doğacan Güney updated NUTCH-468:
--------------------------------

    Attachment: scoring.patch

Patch for the issue. It doesn't change the way scoring-opic works.


[jira] Commented: (NUTCH-468) Scoring filter should distribute score to all outlinks at once

Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/NUTCH-468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12491051 ]

Nicolás Lichtmaier commented on NUTCH-468:
------------------------------------------

This patch would be useful to me.

Just one very minor thing:

Here:

-  public CrawlDatum distributeScoreToOutlink(Text fromUrl, Text toUrl, ParseData parseData, CrawlDatum target, CrawlDatum adjust, int allCount, int validCount) throws ScoringFilterException {
+  public CrawlDatum distributeScoreToOutlinks(Text fromUrl, ParseData parseData, List<Entry<Text, CrawlDatum>> targets, CrawlDatum adjust, int allCount) throws ScoringFilterException {

Why don't you just use Collection instead of List? It's enough for iterating, and in the future the method could be passed a Set of entries (as Map.entrySet() provides).
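
To illustrate the point about Collection, here is a minimal sketch. The class and method names are hypothetical, and String and Float stand in for Text and CrawlDatum; it is not taken from the patch. It shows that a parameter typed as Collection accepts both a List of entries and the Set returned by Map.entrySet().

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: String and Float stand in for Text and CrawlDatum.
final class CollectionParamSketch {

  // Only iteration is needed, so Collection is a sufficient parameter type.
  static float totalScore(Collection<Map.Entry<String, Float>> targets) {
    float total = 0.0f;
    for (Map.Entry<String, Float> target : targets) {
      total += target.getValue();
    }
    return total;
  }

  public static void main(String[] args) {
    // Caller 1: outlinks held in a List of entries.
    List<Map.Entry<String, Float>> asList = new ArrayList<>();
    asList.add(new AbstractMap.SimpleEntry<String, Float>("http://example.org/a", 1.5f));
    asList.add(new AbstractMap.SimpleEntry<String, Float>("http://example.org/b", 2.5f));

    // Caller 2: outlinks held in a Map; entrySet() is a Set, hence a Collection.
    Map<String, Float> asMap = new LinkedHashMap<>();
    asMap.put("http://example.org/a", 1.5f);
    asMap.put("http://example.org/b", 2.5f);

    System.out.println(totalScore(asList));
    System.out.println(totalScore(asMap.entrySet()));
  }
}
```

Had the parameter been declared as List, the second call would not compile without first copying the entry set into a list.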



[jira] Updated: (NUTCH-468) Scoring filter should distribute score to all outlinks at once

Tim Allison (Jira)

     [ https://issues.apache.org/jira/browse/NUTCH-468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doğacan Güney updated NUTCH-468:
--------------------------------

    Attachment: scoring-v2.patch

That makes sense. Attached is a patch with the suggested change.


[jira] Commented: (NUTCH-468) Scoring filter should distribute score to all outlinks at once

Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/NUTCH-468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12492386 ]

Andrzej Bialecki  commented on NUTCH-468:
-----------------------------------------

+1. I'm writing a scoring plugin now where it's impossible to correctly create the adjust value without this change.


[jira] Commented: (NUTCH-468) Scoring filter should distribute score to all outlinks at once

Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/NUTCH-468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12507366 ]

Doğacan Güney commented on NUTCH-468:
-------------------------------------

The latest patch still applies to current trunk. If no one objects, I am going to commit it in a few days.


[jira] Resolved: (NUTCH-468) Scoring filter should distribute score to all outlinks at once

Tim Allison (Jira)

     [ https://issues.apache.org/jira/browse/NUTCH-468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doğacan Güney resolved NUTCH-468.
---------------------------------

    Resolution: Fixed
      Assignee: Doğacan Güney

Committed in rev. 550188 with minor modifications.


[jira] Commented: (NUTCH-468) Scoring filter should distribute score to all outlinks at once

Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/NUTCH-468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12507784 ]

Hudson commented on NUTCH-468:
------------------------------

Integrated in Nutch-Nightly #128 (See [http://lucene.zones.apache.org:8080/hudson/job/Nutch-Nightly/128/])
