    [ https://issues.apache.org/jira/browse/NUTCH-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16510720#comment-16510720 ]

ASF GitHub Bot commented on NUTCH-2012:

sebastian-nagel closed pull request #348: NUTCH-2012: output fix
URL: https://github.com/apache/nutch/pull/348

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/src/java/org/apache/nutch/parse/ParserChecker.java b/src/java/org/apache/nutch/parse/ParserChecker.java
index b0f71d47e..195933c07 100644
--- a/src/java/org/apache/nutch/parse/ParserChecker.java
+++ b/src/java/org/apache/nutch/parse/ParserChecker.java
@@ -271,13 +271,10 @@ protected int process(String url, StringBuilder output) throws Exception {
     for (Map.Entry<Text, Parse> entry : parseResult) {
       parse = entry.getValue();
-      LOG.info("---------\nUrl\n---------------\n");
-      System.out.print(entry.getKey());
-      LOG.info("\n---------\nParseData\n---------\n");
-      System.out.print(parse.getData().toString());
+      output.append(entry.getKey() + "\n");
+      output.append(parse.getData().toString() + "\n");
       if (dumpText) {
-        LOG.info("---------\nParseText\n---------\n");
-        System.out.print(parse.getText());
+        output.append(parse.getText());

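The hunk above stops printing with `System.out.print`/`LOG.info`. Instead, it appends each URL, its ParseData, and (when `dumpText` is set) the ParseText to the `StringBuilder output` passed to `process(String url, StringBuilder output)`. What follows is a minimal sketch of that pattern, under stated assumptions. `ReportingChecker` is a hypothetical class, not Nutch code. A `Map<String, String[]>` of `{parseData, parseText}` pairs stands in for Nutch's `ParseResult` of `Text`/`Parse` entries. The sketch takes already-computed results instead of a URL, so it runs without fetching. The method only builds the report, and the caller picks where it goes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a checker tool; NOT the actual Nutch ParserChecker.
public class ReportingChecker {

  private final boolean dumpText;

  public ReportingChecker(boolean dumpText) {
    this.dumpText = dumpText;
  }

  /**
   * Appends one block per parse result to the caller-supplied buffer instead of
   * printing. Keys are URLs; values are {parseData, parseText} pairs, a
   * simplification (assumption) of Nutch's Text -> Parse entries.
   */
  protected int process(Map<String, String[]> parseResult, StringBuilder output) {
    for (Map.Entry<String, String[]> entry : parseResult.entrySet()) {
      String[] parse = entry.getValue();
      output.append(entry.getKey()).append('\n'); // URL
      output.append(parse[0]).append('\n');       // parse data
      if (dumpText) {
        output.append(parse[1]).append('\n');     // parse text, only on request
      }
    }
    return 0; // 0 = success, following the usual exit-code convention
  }

  public static void main(String[] args) {
    Map<String, String[]> results = new LinkedHashMap<>();
    results.put("http://example.com/", new String[] {"Status: success", "Hello"});

    StringBuilder out = new StringBuilder();
    new ReportingChecker(true).process(results, out);
    // The caller, not process(), chooses the sink: stdout in this example.
    System.out.print(out);
  }
}
```

Chaining `append(x).append('\n')` also avoids the temporary strings that `entry.getKey() + "\n"` builds in the diff. The resulting text is the same.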

> Merge parsechecker and indexchecker
> -----------------------------------
>                 Key: NUTCH-2012
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2012
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 1.10
>            Reporter: Sebastian Nagel
>            Assignee: Sebastian Nagel
>            Priority: Minor
>             Fix For: 1.15
> ParserChecker and IndexingFiltersChecker have evolved from simple tools for checking parsers and parse filters (ParserChecker) or indexing filters (IndexingFiltersChecker) into powerful tools that emulate the crawling of a single URL/document:
> - check robots.txt (NUTCH-2002)
> - follow redirects (NUTCH-2004)
> Keeping both tools in sync takes extra work (cf. NUTCH-1757/NUTCH-2006; also, NUTCH-2002 and NUTCH-2004 were implemented only for parsechecker). It's time to merge them:
> * either into one general debugging tool, keeping parsechecker and indexchecker as aliases,
> * or by centralizing common code in one utility class.

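The issue leaves open whether to merge the tools into one command with aliases or to move shared code into a common class. One possible shape combines both, sketched here under assumptions. An abstract base class owns the shared driver, `IndexCheck` reuses the parse step, and a lookup table keeps `parsechecker` and `indexchecker` as aliases of one entry point. All names (`CheckerSketch`, `AbstractChecker`, `ParseCheck`, `IndexCheck`, `dispatch`) are hypothetical. The fetch, parse, and index steps are stubbed with placeholder strings.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch: none of these names are actual Nutch classes.
public class CheckerSketch {

  /** Code common to both tools lives in one place. */
  abstract static class AbstractChecker {
    protected abstract String name();

    /** Tool-specific work: append the report for one URL to output. */
    protected abstract int process(String url, StringBuilder output);

    /** Shared driver: write a common header, then delegate to the tool-specific step. */
    public final int run(String url, StringBuilder output) {
      output.append(name()).append(": ").append(url).append('\n');
      return process(url, output);
    }
  }

  static class ParseCheck extends AbstractChecker {
    @Override protected String name() { return "parsechecker"; }

    @Override protected int process(String url, StringBuilder output) {
      // Stub: a real tool would check robots.txt, follow redirects, fetch and parse here.
      output.append("ParseData for ").append(url).append('\n');
      return 0;
    }
  }

  static class IndexCheck extends ParseCheck {
    @Override protected String name() { return "indexchecker"; }

    @Override protected int process(String url, StringBuilder output) {
      int rc = super.process(url, output); // reuse the parse step unchanged
      if (rc == 0) {
        // Stub: a real tool would run the indexing filters here.
        output.append("IndexDocument for ").append(url).append('\n');
      }
      return rc;
    }
  }

  /** Both command names stay available as aliases of one entry point. */
  static final Map<String, Supplier<AbstractChecker>> ALIASES = new LinkedHashMap<>();
  static {
    ALIASES.put("parsechecker", ParseCheck::new);
    ALIASES.put("indexchecker", IndexCheck::new);
  }

  public static int dispatch(String command, String url, StringBuilder output) {
    Supplier<AbstractChecker> factory = ALIASES.get(command);
    if (factory == null) {
      output.append("Unknown command: ").append(command).append('\n');
      return -1;
    }
    return factory.get().run(url, output);
  }

  public static void main(String[] args) {
    StringBuilder out = new StringBuilder();
    dispatch("indexchecker", "http://example.com/", out);
    System.out.print(out);
  }
}
```

Making `run` final keeps the shared steps identical across tools, which addresses the sync problem described above. Each tool overrides only `process`.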
