[jira] [Commented] (NUTCH-2788) ParseData: improve presentation of Metadata in method toString()

Sergey Smolyakov (Jira)

    [ https://issues.apache.org/jira/browse/NUTCH-2788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17129461#comment-17129461 ]

ASF GitHub Bot commented on NUTCH-2788:

sebastian-nagel opened a new pull request #529:
URL: https://github.com/apache/nutch/pull/529

   - switch to multi-line presentation of Metadata in ParseData::toString
   - default implementation of Metadata::toString is still single-line
   - replace StringBuffer with StringBuilder in modified methods

   Parsechecker will now show metadata as follows:
   $> bin/nutch parsechecker -Dplugin.includes='parse-(tika|metatags)|protocol-okhttp' http://localhost/
   fetching: http://localhost/
   Title: Apache2 Ubuntu Default Page: It works
   Outlinks: 2
     outlink: toUrl: http://localhost/icons/ubuntu-logo.png anchor: Ubuntu Logo
     outlink: toUrl: http://localhost/manual anchor: manual
   Content Metadata:
     Accept-Ranges = bytes
     Keep-Alive = timeout=5, max=100
     nutch.fetch.time = 1591696071739
     Server = Apache/2.4.41 (Ubuntu)
     ETag = "2aa6-59647cb960db3-gzip"
     Connection = Keep-Alive
     Vary = Accept-Encoding
     Last-Modified = Fri, 01 Nov 2019 12:06:26 GMT
     Date = Tue, 09 Jun 2020 09:47:51 GMT
     Content-Type = text/html
   Parse Metadata:
     dc:title = Apache2 Ubuntu Default Page: It works
     Content-Encoding = UTF-8
     Content-Type-Hint = text/html; charset=UTF-8
     Content-Type = application/xhtml+xml; charset=UTF-8
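
To illustrate the difference between the single-line and multi-line presentation described above, here is a minimal sketch in plain Java. It is not the code from the pull request: the class MetadataFormatSketch, its method names, and the two-space indent are hypothetical, and a Map of name to value list stands in for Nutch's Metadata class. It only shows the general idea of building both forms with a StringBuilder.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration; not part of Nutch.
class MetadataFormatSketch {

    /** Single-line form: every "name=value" pair on one line, space-separated. */
    static String toSingleLine(Map<String, List<String>> metadata) {
        return format(metadata, "", "=", " ");
    }

    /** Multi-line form: one indented "name = value" pair per line. */
    static String toMultiLine(Map<String, List<String>> metadata) {
        return format(metadata, "  ", " = ", "\n");
    }

    /** Shared helper: emits prefix + name + separator + value + suffix per value. */
    static String format(Map<String, List<String>> metadata,
                         String prefix, String separator, String suffix) {
        // StringBuilder is unsynchronized, which is sufficient for a local buffer.
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, List<String>> entry : metadata.entrySet()) {
            // A name with several values is printed once per value.
            for (String value : entry.getValue()) {
                sb.append(prefix).append(entry.getKey())
                  .append(separator).append(value).append(suffix);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        metadata.put("Server", List.of("Apache/2.4.41 (Ubuntu)"));
        metadata.put("Content-Type", List.of("text/html"));
        System.out.println("Content Metadata:");
        System.out.print(toMultiLine(metadata));
    }
}
```

Keeping both variants behind one helper mirrors the split mentioned in the list above, where the default toString stays single-line and only the ParseData output switches to one pair per line.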

> ParseData: improve presentation of Metadata in method toString()
> ----------------------------------------------------------------
>                 Key: NUTCH-2788
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2788
>             Project: Nutch
>          Issue Type: Improvement
>          Components: metadata, parser
>    Affects Versions: 1.16
>            Reporter: Sebastian Nagel
>            Assignee: Sebastian Nagel
>            Priority: Minor
>             Fix For: 1.18
> See NUTCH-2567:
> bq. I would also suggest making the output of Metadata::toString more readable (for instance by adding a newline before each new metadata value). It would have made this bug way easier to spot inside the output of the parsechecker.
