Tika Tesseract configuration

Aditya Dhulipala
Tika Devs!

I'm trying to run Tika with Tesseract.
I have installed Tesseract and confirmed that it's working correctly.

I ran an image through Tika Server, expecting Tesseract OCR to be enabled by default.

However, the extracted metadata didn't contain any OCR output.

Is this because Tesseract is disabled by default?

Should there be a TesseractOCRConfig.properties file somewhere? (I read about it in the TesseractOCRParser source, but I couldn't find the file anywhere.)


Thanks!
--
Aditya 



Re: Tika Tesseract configuration

Aditya Dhulipala
Hi Tika devs,

Scratch that previous email.

I found the TesseractOCRConfig.properties file; I was looking for it in the wrong location.
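For anyone else searching for this file: in the Tika 1.x source, `TesseractOCRConfig` loads its defaults from a classpath resource named `TesseractOCRConfig.properties` in the `org.apache.tika.parser.ocr` package (packaged with the parser classes), not from a standalone file on disk. Below is a minimal, stdlib-only Java sketch of reading that kind of file with `java.util.Properties` and falling back to defaults when a key is missing or blank. `TesseractPropsSketch`, `parse`, and `get` are hypothetical names, and the key names and defaults (`tesseractPath`, `language`=`eng`, `pageSegMode`=`1`, `timeout`=`120`) are assumptions based on Tika 1.x that should be checked against the version you run.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/**
 * Hypothetical helper (not part of Tika): reads TesseractOCRConfig-style
 * properties with java.util.Properties and applies fallback defaults.
 */
public class TesseractPropsSketch {

    // Assumed classpath location, as it appears in Tika 1.x source trees;
    // verify against the Tika version you are running.
    static final String RESOURCE_PATH =
            "org/apache/tika/parser/ocr/TesseractOCRConfig.properties";

    /** Parses .properties text such as the contents of the resource above. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // StringReader does no real I/O, but load() declares IOException.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Returns the trimmed value for key, or fallback if it is absent or blank. */
    static String get(Properties props, String key, String fallback) {
        String value = props.getProperty(key);
        return (value == null || value.trim().isEmpty()) ? fallback : value.trim();
    }

    public static void main(String[] args) {
        // Example contents only; key names are assumed to match Tika 1.x.
        String text = String.join("\n",
                "tesseractPath=/usr/local/bin/",
                "language=eng",
                "timeout=120");
        Properties props = parse(text);
        System.out.println("resource:     " + RESOURCE_PATH);
        System.out.println("tesseractPath=" + get(props, "tesseractPath", ""));
        System.out.println("language=     " + get(props, "language", "eng"));
        System.out.println("pageSegMode=  " + get(props, "pageSegMode", "1"));
        System.out.println("timeout=      " + get(props, "timeout", "120"));
    }
}
```

Because this is a standard `.properties` file, backslashes in a Windows `tesseractPath` have to be escaped as `\\` (or written as forward slashes).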

Sorry for the confusion.

Thanks!
--
Aditya


In reply to: Aditya Dhulipala, Wed, Oct 14, 2015 at 9:52 AM

Re: Tika Tesseract configuration

Tyler Palsulich
Hi Aditya,

The wiki (https://wiki.apache.org/tika/TikaOCR) also has some good
information about setting up and configuring Tesseract.
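A related thing to check when images come back without OCR text: as far as I can tell from the Tika 1.x source, `TesseractOCRParser` only claims image types when it can run the `tesseract` executable (from the `PATH`, or from `tesseractPath` if set); if it cannot, images fall through to the regular image parsers and no error is reported. The stdlib-only Java sketch below is a hypothetical diagnostic, not Tika's own check (Tika runs the binary rather than looking for the file): it scans a `PATH`-style string for an executable named `tesseract`. `TesseractLocator` and `findOnPath` are illustrative names.

```java
import java.io.File;
import java.util.Locale;
import java.util.Optional;

/**
 * Hypothetical diagnostic (not part of Tika): looks for an executable
 * in each directory of a PATH-style string.
 */
public class TesseractLocator {

    /** Returns the first matching executable found in pathValue, if any. */
    static Optional<File> findOnPath(String pathValue, String exeName) {
        if (pathValue == null || pathValue.isEmpty()) {
            return Optional.empty();
        }
        // On Windows, also try the ".exe" suffix.
        boolean windows = System.getProperty("os.name", "")
                .toLowerCase(Locale.ROOT).startsWith("windows");
        String[] candidates = windows
                ? new String[] {exeName + ".exe", exeName}
                : new String[] {exeName};
        for (String dir : pathValue.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue; // skip empty PATH entries
            }
            for (String name : candidates) {
                File candidate = new File(dir, name);
                if (candidate.isFile() && candidate.canExecute()) {
                    return Optional.of(candidate);
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Optional<File> found = findOnPath(System.getenv("PATH"), "tesseract");
        System.out.println(found
                .map(f -> "tesseract found at " + f.getAbsolutePath())
                .orElse("tesseract not found on PATH"));
    }
}
```

Run it as the same user and with the same environment as Tika Server: a server started as a service may see a different `PATH` than your interactive shell.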

Let me know if you have any questions.

Thanks,
Tyler

In reply to: Aditya Dhulipala, Wed, Oct 14, 2015, 6:59 AM