[jira] [Created] (TIKA-3044) add -C/--content cli option using WriteOutContentHandler

Hudson (Jira)
Alexander Klimetschek created TIKA-3044:
-------------------------------------------

             Summary: add -C/--content cli option using WriteOutContentHandler
                 Key: TIKA-3044
                 URL: https://issues.apache.org/jira/browse/TIKA-3044
             Project: Tika
          Issue Type: New Feature
          Components: cli
            Reporter: Alexander Klimetschek


For text extraction, the CLI currently provides the --text and --text-main options. For HTML files, --text returns the body, while --text-main returns only the title. No CLI option currently returns all text content. However, the Tika API provides WriteOutContentHandler, which does the trick; this issue proposes exposing it through a new -C/--content option.
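
To illustrate the idea, here is a minimal sketch of a "write out everything" SAX handler. It does not use Tika itself: WriteAllTextHandler and AllTextSketch are hypothetical names, and the JDK's SAX parser stands in for a Tika parser, so the input must be well-formed XHTML. The point is only that a handler which forwards every characters() event to a Writer captures title and body text alike, which is roughly the behaviour the proposed option would rely on.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class AllTextSketch {

    // Hypothetical stand-in for a write-out handler: every character
    // event is written to the Writer, whichever element it occurs in.
    static class WriteAllTextHandler extends DefaultHandler {
        private final Writer writer;

        WriteAllTextHandler(Writer writer) {
            this.writer = writer;
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            try {
                writer.write(ch, start, length);
            } catch (IOException e) {
                throw new SAXException(e);
            }
        }
    }

    // Parses a well-formed XHTML string and returns all of its text content.
    static String extractAllText(String xhtml) throws Exception {
        StringWriter out = new StringWriter();
        SAXParserFactory.newInstance()
                .newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), new WriteAllTextHandler(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<html><head><title>Title</title></head>"
                + "<body><p>Body text</p></body></html>";
        // Text from both <title> and <body> ends up in the output.
        System.out.println(extractAllText(xhtml));
    }
}
```

In a real implementation, the handler would presumably be Tika's WriteOutContentHandler passed to a Tika parser rather than this stand-in.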



--
This message was sent by Atlassian Jira
(v8.3.4#803005)