[jira] [Commented] (NUTCH-1541) Indexer plugin to write CSV

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/NUTCH-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16538264#comment-16538264 ]

ASF GitHub Bot commented on NUTCH-1541:
---------------------------------------

sebastian-nagel commented on issue #294: NUTCH-1541 Indexer plugin to write CSV
URL: https://github.com/apache/nutch/pull/294#issuecomment-403747396
 
 
   +1 Please go ahead and merge. Thanks, @r0ann3l!
   - unit tests pass
   - successfully indexed into CSV using default configuration:
   ```
   % bin/nutch index -Dplugin.includes='indexer-csv|index-(basic|anchor|more)' \
        crawl/crawldb/ -dir crawl/segments/ -noCommit -deleteGone
   Indexer: number of documents indexed, deleted, or skipped:
   Indexer:      4  deleted (gone)
   Indexer:     35  indexed (add/update)
   
   % head -2 csvindexwriter/nutch.csv
   id,title,content
   http://nutch.apache.org/,Apache Nutch™ -,"Apache Nutch™ -
   ```
   - I had to remove or change exchange.xml; otherwise the Exchange component still tries to route documents to indexer_solr_1 (see NUTCH-2617)
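
As the `head -2` output above suggests, a quoted field such as `content` can span several physical lines, so line-based tools may split one record. The following is a minimal sketch of reading such a file for further analysis with Python's standard `csv` module. It assumes a header row, comma separators, double-quote quoting (as the sample output appears to show) and UTF-8 encoding. The helper names `read_csv_rows` and `load_nutch_csv` and the sample record are hypothetical, not part of the plugin or its real output.

```python
import csv
import io


def read_csv_rows(stream, limit=None):
    """Yield each CSV record as a dict keyed by the header row.

    Quoted fields may contain commas and line breaks; csv.DictReader
    reassembles them into a single record.
    """
    reader = csv.DictReader(stream)
    for i, row in enumerate(reader):
        if limit is not None and i >= limit:
            break
        yield row


def load_nutch_csv(path, limit=None):
    """Read records from a CSV file on disk.

    Assumption: the file is UTF-8 encoded. newline="" lets the csv
    module handle line breaks inside quoted fields itself.
    """
    with open(path, newline="", encoding="utf-8") as f:
        return list(read_csv_rows(f, limit))


# Illustrative record only (hypothetical, not real plugin output):
# the title contains a comma and the content spans two lines.
sample = (
    "id,title,content\n"
    'http://example.org/,"Example, Inc.","line one\nline two"\n'
)
rows = list(read_csv_rows(io.StringIO(sample, newline="")))
for row in rows:
    print(row["id"], "|", row["title"], "|", repr(row["content"]))
```

For the file shown above, a call such as `load_nutch_csv("csvindexwriter/nutch.csv", limit=5)` would return the first five records, provided the assumptions about quoting and encoding hold for the configured output.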

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


> Indexer plugin to write CSV
> ---------------------------
>
>                 Key: NUTCH-1541
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1541
>             Project: Nutch
>          Issue Type: New Feature
>          Components: indexer
>    Affects Versions: 1.7
>            Reporter: Sebastian Nagel
>            Priority: Minor
>             Fix For: 1.15
>
>         Attachments: NUTCH-1541-v1.patch, NUTCH-1541-v2.patch
>
>
> With the new pluggable indexer a simple plugin would be handy to write configurable fields into a CSV file - for further analysis or just for export.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)