[jira] [Commented] (NUTCH-2202) Integration of Anthelion (Focused Crawling Module) into Nutch

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/NUTCH-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351205#comment-16351205 ]

ASF GitHub Bot commented on NUTCH-2202:

lewismc commented on issue #97: NUTCH-2202 Integration of Anthelion (Focused Crawling Module) into Nutch
URL: https://github.com/apache/nutch/pull/97#issuecomment-362768898
   Hi @HansBrende, I appreciate it; I was unable to get to this for a while. I've merged and pushed your changes. I think we need further review before we consider merging. Also, I have a feeling that the Weka library has a non-compliant license. We have some investigation to do.

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[hidden email]

> Integration of Anthelion (Focused Crawling Module) into Nutch
> -------------------------------------------------------------
>                 Key: NUTCH-2202
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2202
>             Project: Nutch
>          Issue Type: Improvement
>          Components: parser, scoring
>            Reporter: Robert Meusel
>            Assignee: Lewis John McGibbney
>            Priority: Major
>              Labels: any23, online_learning
> We have recently released Anthelion (https://github.com/yahoo/anthelion), a focused crawler plugin for structured data that can be extracted with Any23. As proposed by Lewis (Lewis John McGibbney), we think that integrating the parser (Any23) and the scoring function based on the online learner could be a good improvement for Nutch.

This message was sent by Atlassian JIRA