[jira] Created: (NUTCH-556) automatic adjust the CrawlDatum.fetchInterval according to the number of newly outlinks


Sebastian Nagel (Jira)
automatic adjust the CrawlDatum.fetchInterval according to the number of newly outlinks
---------------------------------------------------------------------------------------

                 Key: NUTCH-556
                 URL: https://issues.apache.org/jira/browse/NUTCH-556
             Project: Nutch
          Issue Type: New Feature
          Components: fetcher
            Reporter: King Kong


Usually, the spider must find new URLs in a timely manner.

However, the score of a URL does not adequately reflect this.

Could we adjust CrawlDatum.fetchInterval according to the number of new outlinks?

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (NUTCH-556) automatic adjust the CrawlDatum.fetchInterval according to the number of newly outlinks

Sebastian Nagel (Jira)

     [ https://issues.apache.org/jira/browse/NUTCH-556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

King Kong updated NUTCH-556:
----------------------------

    Description:
The spider must find new URLs in a timely manner, and new URLs usually appear on certain pages such as index pages and list pages.

However, the score of a URL does not adequately reflect this.

Could we adjust CrawlDatum.fetchInterval according to the number of new outlinks?

  was:
Usually, the spider must find new URLs in a timely manner.

However, the score of a URL does not adequately reflect this.

Could we adjust CrawlDatum.fetchInterval according to the number of new outlinks?


> automatic adjust the CrawlDatum.fetchInterval according to the number of newly outlinks
> ---------------------------------------------------------------------------------------
>
>                 Key: NUTCH-556
>                 URL: https://issues.apache.org/jira/browse/NUTCH-556
>             Project: Nutch
>          Issue Type: New Feature
>          Components: fetcher
>            Reporter: King Kong
>
> The spider must find new URLs in a timely manner, and new URLs usually appear on certain pages such as index pages and list pages.
> However, the score of a URL does not adequately reflect this.
> Could we adjust CrawlDatum.fetchInterval according to the number of new outlinks?


[jira] Commented: (NUTCH-556) automatic adjust the CrawlDatum.fetchInterval according to the number of newly outlinks

Sebastian Nagel (Jira)

    [ https://issues.apache.org/jira/browse/NUTCH-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12578960#action_12578960 ]

Andrzej Bialecki  commented on NUTCH-556:
-----------------------------------------

Unless I'm missing something, this can be implemented as a custom FetchSchedule. If there are no objections I'd like to close this issue with Won't Fix.
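
As a rough illustration of the kind of logic such a custom FetchSchedule could contain, here is a minimal standalone sketch. It deliberately does not use any Nutch classes: the class name OutlinkAwareInterval, the method adjust, and all of its parameters (including the two rate parameters and the interval bounds) are hypothetical names chosen for this example, and wiring the calculation into Nutch's FetchSchedule extension point is not shown. The assumed policy is that a page which exposed new outlinks on its last fetch is revisited sooner, while a page with no new outlinks is revisited later.

```java
/**
 * Hypothetical helper, not part of Nutch: computes a new fetch interval
 * from the number of outlinks that were not seen on the previous fetch.
 */
public final class OutlinkAwareInterval {

    private OutlinkAwareInterval() {
        // static helper only
    }

    /**
     * @param currentInterval current fetch interval, in seconds
     * @param newOutlinks     outlinks found now that were absent last time
     * @param decRate         how strongly each new outlink shortens the interval
     * @param incRate         relative growth applied when nothing new was found
     * @param minInterval     lower bound for the result, in seconds
     * @param maxInterval     upper bound for the result, in seconds
     * @return the adjusted interval, clamped to [minInterval, maxInterval]
     */
    public static float adjust(float currentInterval, int newOutlinks,
                               float decRate, float incRate,
                               float minInterval, float maxInterval) {
        if (newOutlinks < 0) {
            throw new IllegalArgumentException("newOutlinks must be >= 0");
        }
        float next;
        if (newOutlinks > 0) {
            // More new outlinks means a stronger reduction of the interval.
            next = currentInterval / (1.0f + decRate * newOutlinks);
        } else {
            // Nothing new was discovered, so back off.
            next = currentInterval * (1.0f + incRate);
        }
        // Keep the result inside the configured bounds.
        return Math.max(minInterval, Math.min(maxInterval, next));
    }
}
```

With these assumed parameters, an index page that keeps yielding new links converges toward minInterval, while a static page drifts toward maxInterval. How "new" outlinks would be counted between fetches is left open here, since the issue does not specify it.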


[jira] Closed: (NUTCH-556) automatic adjust the CrawlDatum.fetchInterval according to the number of newly outlinks

Sebastian Nagel (Jira)

     [ https://issues.apache.org/jira/browse/NUTCH-556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrzej Bialecki  closed NUTCH-556.
-----------------------------------

    Resolution: Won't Fix
