[jira] [Commented] (NUTCH-2397) Parser to add paragraph line breaks


JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/NUTCH-2397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16161061#comment-16161061 ]

ASF GitHub Bot commented on NUTCH-2397:
---------------------------------------

sebastian-nagel closed pull request #198: NUTCH-2397: Parser to add paragraph line breaks
URL: https://github.com/apache/nutch/pull/198
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


> Parser to add paragraph line breaks
> -----------------------------------
>
>                 Key: NUTCH-2397
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2397
>             Project: Nutch
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 2.3.1, 1.13
>            Reporter: Sebastian Nagel
>            Priority: Minor
>             Fix For: 2.4, 1.14
>
>
> (initially reported with patch/pull-request by Vipul Behl, see [#190|https://github.com/apache/nutch/pull/190])
> The parser (parse-tika and parse-html) could be improved to add line breaks between paragraphs, instead of writing the whole document into a single line.
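
The idea in the description above can be illustrated with a minimal sketch. This is not the code from pull request #198 or #190 and does not use Nutch's or Tika's APIs. The class name ParagraphBreaks, the method extract, and the set of block-level element names are assumptions made for illustration, and the input is assumed to be well-formed XHTML so that the JDK's built-in XML parser can read it.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Hypothetical helper, not part of Nutch: extracts text from a DOM tree and
// starts a new line at block-level elements instead of emitting one long line.
class ParagraphBreaks {

    // Assumed list of block-level elements; a real parser may use a different set.
    private static final Set<String> BLOCK = new HashSet<>(
        Arrays.asList("p", "div", "br", "li", "h1", "h2", "h3", "h4", "h5", "h6"));

    // Parses well-formed XHTML and returns its text with one line per block element.
    static String extract(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
            StringBuilder sb = new StringBuilder();
            walk(doc.getDocumentElement(), sb);
            return sb.toString().trim();
        } catch (Exception e) {
            throw new IllegalArgumentException("input is not well-formed XHTML", e);
        }
    }

    // Depth-first walk: text nodes are appended, block elements force a line break
    // before and after their content.
    private static void walk(Node node, StringBuilder sb) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            // Simplification: whitespace is collapsed and text chunks on the same
            // line are always joined with a single space.
            String text = node.getNodeValue().trim().replaceAll("\\s+", " ");
            if (!text.isEmpty()) {
                if (sb.length() > 0 && sb.charAt(sb.length() - 1) != '\n') {
                    sb.append(' ');
                }
                sb.append(text);
            }
            return;
        }
        boolean block = node.getNodeType() == Node.ELEMENT_NODE
            && BLOCK.contains(node.getNodeName().toLowerCase());
        if (block) {
            newline(sb);
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            walk(child, sb);
        }
        if (block) {
            newline(sb);
        }
    }

    // Appends a line break unless the buffer is empty or already ends with one.
    private static void newline(StringBuilder sb) {
        if (sb.length() > 0 && sb.charAt(sb.length() - 1) != '\n') {
            sb.append('\n');
        }
    }

    public static void main(String[] args) {
        System.out.println(extract(
            "<html><body><p>First paragraph.</p><p>Second <b>bold</b> one.</p></body></html>"));
    }
}
```

With the sample input in main, the two paragraphs end up on separate lines, while the inline <b> element stays on the same line as its surrounding text.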



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)