[jira] [Commented] (NUTCH-1129) Any23 Nutch plugin

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/NUTCH-1129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16356509#comment-16356509 ]

ASF GitHub Bot commented on NUTCH-1129:
---------------------------------------

ferrerod commented on issue #205: WIP: NUTCH-1129 microdata for Nutch 1.x
URL: https://github.com/apache/nutch/pull/205#issuecomment-364007152
 
 
   On a Mac with JDK 8 installed, the javadoc task failed with a complaint about the Java version. On closer inspection, the failure condition was triggered because ant.java.version equals 1.6: running `ant -v` reported that the Java version in my JAVA_HOME (JDK 8) is 1.6! Super strange...
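
To cross-check which JVM is actually in use, independently of what Ant reports, a small standalone program can print the standard JVM system properties. This is only a minimal sketch: the class name `JvmVersionCheck` and the method `describe` are hypothetical and not part of Nutch or its build.

```java
// Hypothetical helper, not part of the Nutch build: prints which JVM runs this code.
public class JvmVersionCheck {

    // Builds a one-line summary from standard JVM system properties.
    static String describe() {
        return "java.version=" + System.getProperty("java.version")
            + " java.specification.version=" + System.getProperty("java.specification.version")
            + " java.home=" + System.getProperty("java.home");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Compiling and running this with the `javac` and `java` binaries under JAVA_HOME shows the version of that JDK, which can then be compared with the ant.java.version value printed by `ant -v`.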
   
   I removed the ant.java.version checks in the javadoc task and reran...
   
   `ant zip-bin` with Java 8 finished successfully! However, the reason I'm posting here is that I noticed 19 errors and 106 warnings in the javadoc task. Here are the first few errors it encountered:
   
     [javadoc] /nutch/src/plugin/any23/src/java/org/apache/nutch/any23/Any23ParseFilter.java:30: error: package org.apache.any23 does not exist
     [javadoc] import org.apache.any23.Any23;
     [javadoc]                        ^
     [javadoc] /nutch/src/plugin/any23/src/java/org/apache/nutch/any23/Any23ParseFilter.java:31: error: package org.apache.any23.extractor does not exist
     [javadoc] import org.apache.any23.extractor.ExtractionException;
     [javadoc]                                  ^
     [javadoc] /nutch/src/plugin/any23/src/java/org/apache/nutch/any23/Any23ParseFilter.java:32: error: package org.apache.any23.writer does not exist
     [javadoc] import org.apache.any23.writer.BenchmarkTripleHandler;
     [javadoc]                               ^
     [javadoc] /nutch/src/plugin/any23/src/java/org/apache/nutch/any23/Any23ParseFilter.java:33: error: package org.apache.any23.writer does not exist
     [javadoc] import org.apache.any23.writer.NTriplesWriter;
     [javadoc]                               ^
     [javadoc] /nutch/src/plugin/any23/src/java/org/apache/nutch/any23/Any23ParseFilter.java:34: error: package org.apache.any23.writer does not exist
     [javadoc] import org.apache.any23.writer.TripleHandler;
     [javadoc]                               ^
     [javadoc] /nutch/src/plugin/any23/src/java/org/apache/nutch/any23/Any23ParseFilter.java:35: error: package org.apache.any23.writer does not exist
     [javadoc] import org.apache.any23.writer.TripleHandlerException;
     [javadoc]                               ^
     [javadoc] /nutch/src/plugin/any23/src/java/org/apache/nutch/any23/Any23ParseFilter.java:43: error: package org.ccil.cowan.tagsoup does not exist
     [javadoc] import org.ccil.cowan.tagsoup.XMLWriter;
     [javadoc]                              ^
     [javadoc] /nutch/src/plugin/any23/src/java/org/apache/nutch/any23/Any23ParseFilter.java:44: error: package org.ccil.cowan.tagsoup.jaxp does not exist
     [javadoc] import org.ccil.cowan.tagsoup.jaxp.SAXParserImpl;
     [javadoc]                                   ^
     [javadoc] /nutch/src/plugin/any23/src/java/org/apache/nutch/any23/Any23ParseFilter.java:87: error: cannot find symbol
     [javadoc]     Any23Parser(String url, String htmlContent, String contentType, String... extractorNames) throws TripleHandlerException {
   



> Any23 Nutch plugin
> ------------------
>
>                 Key: NUTCH-1129
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1129
>             Project: Nutch
>          Issue Type: New Feature
>          Components: parser
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>            Priority: Minor
>             Fix For: 1.15
>
>         Attachments: NUTCH-1129.patch
>
>
> This plugin should build on the Any23 library to extract RDF data from HTTP and file resources. Although as of writing Any23 is not part of the ASF, the project is working towards integration into the Apache Incubator. Once the project proves its value, this would be an excellent addition to the Nutch 1.X codebase.


