Apache Tika - Development

This forum is an archive for the mailing list tika-dev@lucene.apache.org. Messages posted here will be sent to this mailing list.
Apache Tika is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries.
Topics (21634)
Replies Last Post Views
[jira] Created: (TIKA-349) HtmlParser's http-equiv code needs to be more flexible by JIRA jira@apache.org
2
by JIRA jira@apache.org
[jira] Resolved: (TIKA-125) Pass Locale information to parsers by JIRA jira@apache.org
0
by JIRA jira@apache.org
[jira] Created: (TIKA-342) Improve OSGi bundling by JIRA jira@apache.org
2
by JIRA jira@apache.org
[jira] Created: (TIKA-328) Add parser for .flv videos by JIRA jira@apache.org
5
by JIRA jira@apache.org
[jira] Created: (TIKA-321) Optimize type detection speed by JIRA jira@apache.org
1
by JIRA jira@apache.org
[jira] Created: (TIKA-339) HtmlParser & TXTParser should not use language returned by CharsetDetector if language hint has been provided by JIRA jira@apache.org
3
by JIRA jira@apache.org
[jira] Created: (TIKA-343) some parsers produces glued words by JIRA jira@apache.org
1
by JIRA jira@apache.org
HtmlMapper by Jukka Zitting
0
by Jukka Zitting
[jira] Created: (TIKA-347) Make HtmlParser customizable through ParseContext by JIRA jira@apache.org
1
by JIRA jira@apache.org
[jira] Created: (TIKA-345) Add application/vnd.wap.xhtml+xml to list of mimetypes handled by HtmlParser by JIRA jira@apache.org
2
by JIRA jira@apache.org
[jira] Created: (TIKA-346) ZipParser throws "invalid compression method" error for some archives by JIRA jira@apache.org
2
by JIRA jira@apache.org
[jira] Created: (TIKA-338) Trying to use -encoding parameter always results in an exception by JIRA jira@apache.org
3
by JIRA jira@apache.org
[jira] Created: (TIKA-344) Charset hint in metadata by JIRA jira@apache.org
2
by JIRA jira@apache.org
[jira] Created: (TIKA-341) Use charset in CONTENT_TYPE metadata when detecting the character encoding by JIRA jira@apache.org
3
by JIRA jira@apache.org
[jira] Created: (TIKA-335) TXTParser use of CharsetDetector has several bugs by JIRA jira@apache.org
6
by JIRA jira@apache.org
[jira] Created: (TIKA-332) Use http-equiv meta tag charset info when processing HTML documents by JIRA jira@apache.org
5
by JIRA jira@apache.org
Better Ohloh history for Tika by Jukka Zitting
2
by Jukka Zitting
HTML mime-types by kkrugler
1
by Jukka Zitting
Charset detection by Antoni Mylka-2
5
by Christiaan Fluit-2
[jira] Created: (TIKA-336) More issues with RDF mime detection by JIRA jira@apache.org
2
by JIRA jira@apache.org
source repository on Tika page by Julien Nioche-4
2
by kkrugler
New Tika committer by Jukka Zitting
2
by kkrugler
What to export from the tika-bundle ? by Felix Meschberger-2
0
by Felix Meschberger-2
[jira] Created: (TIKA-340) Provide full Tika bundle by JIRA jira@apache.org
15
by JIRA jira@apache.org
Fwd: a 'lite' version of ooxml-schemas jar by Jukka Zitting
0
by Jukka Zitting
[jira] Created: (TIKA-337) SWF parser by JIRA jira@apache.org
3
by JIRA jira@apache.org
[jira] Commented: (TIKA-147) Add Flash parser by JIRA jira@apache.org
0
by JIRA jira@apache.org
[jira] Created: (TIKA-329) secure-processing not supported by some JAXP implementations (2) by JIRA jira@apache.org
2
by JIRA jira@apache.org
[jira] Created: (TIKA-334) HtmlParser should use CharsetDetector whenever no charset is specified via meta http-equiv tag by JIRA jira@apache.org
2
by JIRA jira@apache.org
[jira] Commented: (TIKA-147) Add Flash parser by JIRA jira@apache.org
0
by JIRA jira@apache.org
[jira] Created: (TIKA-324) Tika CLI mangles utf-8 content in text (-t) mode by JIRA jira@apache.org
21
by JIRA jira@apache.org
[jira] Created: (TIKA-309) Mime type application/rdf+xml not correctly detected by JIRA jira@apache.org
14
by JIRA jira@apache.org
Missing href attribute handling by kkrugler
0
by kkrugler
[jira] Created: (TIKA-333) Improve accuracy of charset detection for HTML pages by JIRA jira@apache.org
1
by JIRA jira@apache.org
[jira] Created: (TIKA-331) Windings font recognition in Tika parsing + spacing issue by JIRA jira@apache.org
4
by JIRA jira@apache.org