Apache Tika - Development

This forum is an archive for the mailing list tika-dev@lucene.apache.org. Messages posted here will be sent to this mailing list.
Apache Tika is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries.
Topics (21860)
Topic | Replies | Last post by
General question about patches by kkrugler | 2 | kkrugler
[jira] Created: (TIKA-296) Automatically set the supertype for "+xml" mimetypes by JIRA jira@apache.org | 4 | JIRA jira@apache.org
[jira] Created: (TIKA-297) The HtmlParser ignores <menu> tags, resulting in invalid XHTML by JIRA jira@apache.org | 1 | JIRA jira@apache.org
[jira] Created: (TIKA-299) Update Geronimo dependency in tika-parsers pom.xml to 1.0.1 by JIRA jira@apache.org | 1 | JIRA jira@apache.org
Error in Eclipse with ordering of libs by kkrugler | 3 | kkrugler
[jira] Created: (TIKA-284) Upgrade to POI 3.5-FINAL by JIRA jira@apache.org | 1 | JIRA jira@apache.org
Towards Tika 0.5 by Jukka Zitting | 1 | Mattmann, Chris A (3...
[jira] Created: (TIKA-269) Ease of use -facade for Tika by JIRA jira@apache.org | 1 | JIRA jira@apache.org
[jira] Resolved: (TIKA-61) Add namespaces to our metadata keys by JIRA jira@apache.org | 0 | JIRA jira@apache.org
[jira] Created: (TIKA-281) Use repository.apache.org to deploy snapshots and releases by JIRA jira@apache.org | 1 | JIRA jira@apache.org
[jira] Created: (TIKA-292) PDFBox is too verbose by JIRA jira@apache.org | 1 | JIRA jira@apache.org
[jira] Created: (TIKA-291) Adobe InDesign support by JIRA jira@apache.org | 1 | JIRA jira@apache.org
Test failures from trunk by kkrugler | 1 | Jukka Zitting
[jira] Created: (TIKA-289) Add magic byte patterns from file(1) by JIRA jira@apache.org | 0 | JIRA jira@apache.org
[jira] Created: (TIKA-285) Update media type registry to the latest httpd mime type database by JIRA jira@apache.org | 2 | JIRA jira@apache.org
[jira] Created: (TIKA-286) HtmlParser calls characters() with post-body data before processing the terminating body element. by JIRA jira@apache.org | 3 | JIRA jira@apache.org
Html parser questions by kkrugler | 3 | kkrugler
Multiple documents per input stream by kkrugler | 5 | Jukka Zitting
[jira] Created: (TIKA-280) Fix NOTICE files to match consensus from legal team by JIRA jira@apache.org | 1 | JIRA jira@apache.org
[jira] Created: (TIKA-283) XWPFWordExtractorDecorator does not extract links in tables by JIRA jira@apache.org | 2 | JIRA jira@apache.org
[jira] Resolved: (TIKA-158) Upgrade to Apache PDFBox by JIRA jira@apache.org | 0 | JIRA jira@apache.org
Html parser questions by kkrugler | 0 | kkrugler
Fwd: [ANNOUNCE] Apache PDFBox 0.8.0-incubating released by Jukka Zitting | 0 | Jukka Zitting
Javadoc index not complete? by kkrugler | 0 | kkrugler
[jira] Created: (TIKA-252) PackageParser's XHTML should contain metadata of subfiles by JIRA jira@apache.org | 2 | JIRA jira@apache.org
rdf output by turnguard | 2 | kkrugler
Trunk revision 813987 fails to build on Snow Leopard by Ross McDonald | 3 | Ross McDonald
[jira] Created: (TIKA-276) Drop the StringUtils class by JIRA jira@apache.org | 1 | JIRA jira@apache.org
Re: Board Report Due by Jukka Zitting | 0 | Jukka Zitting
[jira] Created: (TIKA-272) Expose characters offsets information while parsing text-based inputs. by JIRA jira@apache.org | 1 | JIRA jira@apache.org
[jira] Created: (TIKA-273) Content encoding in HtmlParser by JIRA jira@apache.org | 1 | JIRA jira@apache.org
[jira] Created: (TIKA-274) CharsetDetector.setDeclaredEncoding has no effect by JIRA jira@apache.org | 1 | JIRA jira@apache.org
[jira] Created: (TIKA-193) PDFParser adds mime-type twice by JIRA jira@apache.org | 7 | JIRA jira@apache.org
Passing context information to parsers by Jukka Zitting | 1 | Michael Wechner
Supported media types per parser by Jukka Zitting | 0 | Jukka Zitting