Commented: (NUTCH-247) robot parser to restrict.

Jorge Spinsanti (Jira)

    [ https://issues.apache.org/jira/browse/NUTCH-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12473295 ]

Dennis Kubes commented on NUTCH-247:
------------------------------------

I think the idea here is to NOT allow people to run fetchers for which they haven't configured an agent name, email, etc.  There may be a better way to do this than simply logging at severe level and then stopping.  I think it would be best to give the user explicit feedback, either on the command line or through an exception that tells them to configure the agent name and email in their nutch-*.xml file.  If this is the direction we want to go, I can come up with a patch for this.
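
To make the proposal concrete, here is a minimal sketch of the "explicit exception" option. It is not the patch and not Nutch code: the class AgentConfigCheck, the exception type, and the use of a plain Map in place of the real configuration object are all hypothetical, and the property names http.agent.name and http.agent.email are assumed for illustration.

```java
import java.util.Map;

/** Hypothetical helper, not part of Nutch: fail fast when the crawler identity is missing. */
final class AgentConfigCheck {

    /** Hypothetical exception type carrying a message the user can act on. */
    static final class MissingAgentConfigException extends RuntimeException {
        MissingAgentConfigException(String message) {
            super(message);
        }
    }

    // Property names are assumptions made for this sketch.
    static final String AGENT_NAME = "http.agent.name";
    static final String AGENT_EMAIL = "http.agent.email";

    /**
     * Throws if either property is absent or blank, instead of logging
     * at severe level and letting the fetcher thread stop silently.
     */
    static void requireAgentIdentity(Map<String, String> conf) {
        for (String key : new String[] {AGENT_NAME, AGENT_EMAIL}) {
            String value = conf.get(key);
            if (value == null || value.trim().isEmpty()) {
                throw new MissingAgentConfigException(
                    "Fetcher refused to start: '" + key + "' is not set. "
                    + "Configure it in your nutch-*.xml file.");
            }
        }
    }
}
```

A check like this would run once before any fetcher thread starts, so the user sees the reason on the command line rather than having to find it in the logs.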

> robot parser to restrict.
> -------------------------
>
>                 Key: NUTCH-247
>                 URL: https://issues.apache.org/jira/browse/NUTCH-247
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 0.8
>            Reporter: Stefan Groschupf
>            Priority: Minor
>             Fix For: 0.9.0
>
>
> If the agent name and the robots agents are not properly configured, the robot rules parser uses LOG.severe to log the problem, even though it also fixes the problem itself.
> Later on, the fetcher thread checks for severe errors and stops if there is one.
> RobotRulesParser:
> if (agents.size() == 0) {
>   agents.add(agentName);
>   LOG.severe("No agents listed in 'http.robots.agents' property!");
> } else if (!((String)agents.get(0)).equalsIgnoreCase(agentName)) {
>   agents.add(0, agentName);
>   LOG.severe("Agent we advertise (" + agentName
>              + ") not listed first in 'http.robots.agents' property!");
> }
> Fetcher.FetcherThread:
> if (LogFormatter.hasLoggedSevere())     // something bad happened
>   break;
> I suggest using warn or something similar instead of severe to log this problem.
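
The reporter's suggestion could look roughly like the following sketch. It reuses the logic of the quoted snippet but logs at warning level through java.util.logging. The class RobotAgentsSketch and the method normalizeAgents are hypothetical names, and the method returns a copy of the list rather than mutating a field as the quoted code appears to do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

/** Hypothetical stand-in for the agent-list handling quoted from RobotRulesParser. */
final class RobotAgentsSketch {

    private static final Logger LOG =
        Logger.getLogger(RobotAgentsSketch.class.getName());

    /**
     * Returns a copy of the configured agents with agentName guaranteed to be
     * first. The misconfiguration is repaired here, so it is reported as a
     * warning rather than as a severe error.
     */
    static List<String> normalizeAgents(List<String> configured, String agentName) {
        List<String> agents = new ArrayList<>(configured);
        if (agents.isEmpty()) {
            agents.add(agentName);
            LOG.warning("No agents listed in 'http.robots.agents' property!");
        } else if (!agents.get(0).equalsIgnoreCase(agentName)) {
            agents.add(0, agentName);
            LOG.warning("Agent we advertise (" + agentName
                + ") not listed first in 'http.robots.agents' property!");
        }
        return agents;
    }
}
```

Because nothing is logged at severe level, a check that looks only for severe records, such as the hasLoggedSevere() test quoted above, would presumably no longer stop the fetcher thread for this case.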

--
This message is automatically generated by JIRA.