Apache Tika - Development

This forum is an archive for the mailing list tika-dev@lucene.apache.org.
Apache Tika is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries.
Topics (19397)
Topic / Replies / Last Post
[VOTE] Apache Tika 1.0 release rc #1 by Mattmann, Chris A (3...
8
by David Meikle
[jira] [Commented] (TIKA-529) IBM420 charset detection's isLamAlef is allocation-happy by JIRA jira@apache.org
0
by JIRA jira@apache.org
[jira] [Created] (TIKA-767) Enable controlling of PDFBOX's setSuppressDuplicateOverlappingText from PDFParser by JIRA jira@apache.org
2
by JIRA jira@apache.org
Assist please by NDIAYE Bacar
0
by NDIAYE Bacar
[jira] [Commented] (TIKA-369) Improve accuracy of language detection by JIRA jira@apache.org
0
by JIRA jira@apache.org
[jira] [Commented] (TIKA-369) Improve accuracy of language detection by JIRA jira@apache.org
0
by JIRA jira@apache.org
Embed and ExifTool Contributions by Ray Gauss II
0
by Ray Gauss II
[jira] [Commented] (TIKA-369) Improve accuracy of language detection by JIRA jira@apache.org
0
by JIRA jira@apache.org
[jira] [Commented] (TIKA-513) Support of Deja Vu (DjVu) format by JIRA jira@apache.org
0
by JIRA jira@apache.org
Tika 1.0 RC? by Mattmann, Chris A (3...
7
by Jukka Zitting
[jira] [Created] (TIKA-764) OpenDocumentMetaParser should use common metadata keys for document statistics by JIRA jira@apache.org
3
by JIRA jira@apache.org
[jira] [Created] (TIKA-769) Upgrade to Commons Compress 1.3 by JIRA jira@apache.org
1
by JIRA jira@apache.org
[jira] [Created] (TIKA-768) Parser for EDF files by JIRA jira@apache.org
0
by JIRA jira@apache.org
[jira] [Created] (TIKA-763) Update license metadata by JIRA jira@apache.org
2
by JIRA jira@apache.org
[jira] [Created] (TIKA-765) add icu dependency by JIRA jira@apache.org
0
by JIRA jira@apache.org
[jira] [Created] (TIKA-761) Provide version number by CLI argument -V by JIRA jira@apache.org
25
by JIRA jira@apache.org
location of pdfbox in sources of Tika by ahmad ajiloo
3
by Oleg Tikhonov
Build failed in Jenkins: Tika-trunk #703 by Apache Jenkins Serve...
1
by Apache Jenkins Serve...
Build failed in Jenkins: Tika-trunk » Apache Tika OSGi bundle #703 by Apache Jenkins Serve...
2
by Apache Jenkins Serve...
[jira] Created: (TIKA-565) Improved OSGi bundling by JIRA jira@apache.org
6
by JIRA jira@apache.org
Build failed in Jenkins: Tika-trunk #696 by Apache Jenkins Serve...
1
by Apache Jenkins Serve...
[jira] [Created] (TIKA-736) OpenOffice parser: master footer text isn't extracted by JIRA jira@apache.org
12
by JIRA jira@apache.org
[jira] [Created] (TIKA-703) Drop deprecated methods/classes/interfaces by JIRA jira@apache.org
3
by JIRA jira@apache.org
Build failed in Jenkins: Tika-trunk #692 by Apache Jenkins Serve...
1
by Apache Jenkins Serve...
[jira] Created: (TIKA-582) Lithuanian language identification by JIRA jira@apache.org
11
by JIRA jira@apache.org
Google's Compact Language Detector by Jérôme Charron
11
by reinhard
Tika is waiting for ODFToolkit to improve ODF file format processing by Devin Han
4
by Michael McCandless-2
[jira] [Created] (TIKA-746) Support custom mime types by JIRA jira@apache.org
3
by JIRA jira@apache.org
failure in parsing pdf files with tika 0.9 with nutch 1.3 by digho
1
by Nick Burch-4
[jira] [Commented] (TIKA-245) Support of CHM Format by JIRA jira@apache.org
1
by Oleg Tikhonov-2
DZone article on Tika by Mattmann, Chris A (3...
0
by Mattmann, Chris A (3...
[jira] [Created] (TIKA-759) Better handling of content type metadata by JIRA jira@apache.org
1
by JIRA jira@apache.org
[jira] [Created] (TIKA-753) Improve performance when parsing embedded Office docs by JIRA jira@apache.org
6
by JIRA jira@apache.org
[jira] [Created] (TIKA-755) Add getDetector() method to TikaConfig by JIRA jira@apache.org
3
by JIRA jira@apache.org
[jira] [Created] (TIKA-718) PDF bookmark text isn't extracted by JIRA jira@apache.org
2
by JIRA jira@apache.org