[jira] [Created] (TIKA-3271) Change default image resize size in TesseractParser's pre-processing step

Steve Loughran (Jira)
Tim Allison created TIKA-3271:
---------------------------------

             Summary: Change default image resize size in TesseractParser's pre-processing step
                 Key: TIKA-3271
                 URL: https://issues.apache.org/jira/browse/TIKA-3271
             Project: Tika
          Issue Type: Improvement
            Reporter: Tim Allison


If users have ImageMagick installed and they enable image preprocessing, one of the steps we currently perform is telling ImageMagick to resize the image to 900% of its original size. This _may_ make sense for small images (TBD); however, it can lead to massive files and dramatic increases in processing time.
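To put the cost in numbers: a percentage resize scales both width and height, so the pixel count grows with the square of the factor. A 900% resize yields 81 times as many pixels, and a 200% resize yields 4 times as many. The Java sketch below is a hypothetical helper (not Tika code) that makes this arithmetic explicit, assuming an uncompressed 8-bit RGB buffer (3 bytes per pixel) and a US Letter page scanned at 300 DPI as the example input.

```java
/**
 * Hypothetical helper (not part of Tika) showing how a percentage resize
 * affects pixel count and uncompressed image size.
 */
public final class ResizeCost {

    private ResizeCost() {}

    /** Factor by which the pixel count grows when both sides are scaled by resizePercent. */
    public static double pixelMultiplier(int resizePercent) {
        double linear = resizePercent / 100.0;
        return linear * linear;
    }

    /** Uncompressed size in bytes after resizing, given an assumed bytesPerPixel (3 for 8-bit RGB). */
    public static long uncompressedBytes(long width, long height, int resizePercent, int bytesPerPixel) {
        long newWidth = width * resizePercent / 100;
        long newHeight = height * resizePercent / 100;
        return newWidth * newHeight * bytesPerPixel;
    }

    public static void main(String[] args) {
        long width = 2550, height = 3300; // US Letter (8.5 x 11 in) scanned at 300 DPI
        for (int pct : new int[] {100, 200, 900}) {
            System.out.printf("%d%%: %.0fx pixels, ~%d MB uncompressed RGB%n",
                    pct, pixelMultiplier(pct), uncompressedBytes(width, height, pct, 3) / (1024 * 1024));
        }
    }
}
```

For that example page, the 900% setting turns an image of about 8.4 million pixels into one of about 682 million, before OCR even starts.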

At some point, we should probably choose the resize factor based on the initial image size, i.e., dynamic resizing.
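One possible shape for dynamic resizing is sketched below in Java. It is an illustration only: the class, the 3000-pixel target for the longer side, and the 100% to 900% clamp are assumptions chosen for the example, not existing Tika settings. The idea is to compute the percentage that would bring the longer side up to the target, then clamp it so large scans are not enlarged at all and tiny images are capped at the current maximum.

```java
/**
 * Hypothetical sketch of dynamic resizing: pick the resize percentage from
 * the input image's dimensions instead of using a fixed value. The target
 * and bounds are illustrative assumptions, not values taken from Tika.
 */
public final class DynamicResize {

    /** Assumed target length for the longer image side, in pixels. */
    static final int TARGET_LONG_SIDE = 3000;
    /** Never shrink the image during this step. */
    static final int MIN_PERCENT = 100;
    /** Upper bound, matching the current 900% default. */
    static final int MAX_PERCENT = 900;

    private DynamicResize() {}

    public static int resizePercent(int width, int height) {
        if (width <= 0 || height <= 0) {
            throw new IllegalArgumentException("image dimensions must be positive");
        }
        int longSide = Math.max(width, height);
        // Percentage that would scale the longer side up to the target, rounded up.
        int percent = (int) Math.ceil(100.0 * TARGET_LONG_SIDE / longSide);
        // Clamp: large images stay at 100%, tiny ones are capped at MAX_PERCENT.
        return Math.max(MIN_PERCENT, Math.min(MAX_PERCENT, percent));
    }

    public static void main(String[] args) {
        int[][] sizes = {{200, 100}, {600, 800}, {2550, 3300}};
        for (int[] s : sizes) {
            // Print the percentage as it might be passed to ImageMagick's -resize option.
            System.out.printf("%dx%d -> -resize %d%%%n", s[0], s[1], resizePercent(s[0], s[1]));
        }
    }
}
```

With these assumed values, a 600x800 image would get 375%, a 200x100 image would hit the 900% cap, and a 300 DPI Letter scan (2550x3300) would be left at 100%.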

Until then, for Tika 2.0.0, I propose that we change the default to 200%.  This value is completely heuristic and not based on much data aside from Peter Kronenberg's work: https://lists.apache.org/thread.html/rb1dece05760d10f1b165b03b97fef8b609dc40c4cd06bdb8cc36469d%40%3Cuser.tika.apache.org%3E



--
This message was sent by Atlassian Jira
(v8.3.4#803005)