Tesseract/Tika OCR on certain pages

Peyman Faratin

I am a newbie to Nutch and am using version 1.15. I would like Tika to OCR images, but only when the URL matches certain keywords. I am not sure how to configure Nutch to do either of these things: enable OCR of images through Tika, or limit that OCR to URLs matching the keywords.
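For the second part, the URL check itself could be as small as the sketch below. It assumes the test would run inside some custom Nutch plugin or parse step before an image is handed to Tika for OCR. That wiring is not shown, and the class `UrlKeywordGate` and method `shouldOcr` are hypothetical names, not part of Nutch or Tika.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Nutch or Tika: decides whether an image
// fetched from a given URL should be passed on to OCR, based on keywords.
final class UrlKeywordGate {

    private UrlKeywordGate() {
        // static utility; not meant to be instantiated
    }

    // Returns true if the URL contains any non-empty keyword,
    // compared case-insensitively; a null URL or keyword list never matches.
    static boolean shouldOcr(String url, List<String> keywords) {
        if (url == null || keywords == null) {
            return false;
        }
        String lowerUrl = url.toLowerCase(Locale.ROOT);
        for (String keyword : keywords) {
            if (keyword != null && !keyword.isEmpty()
                    && lowerUrl.contains(keyword.toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }
}
```

For example, `UrlKeywordGate.shouldOcr("https://example.com/scans/Invoice-42.png", List.of("invoice"))` is true, while the same call for `https://example.com/logo.png` is false. A plain substring test is the simplest option. A `java.util.regex.Pattern` would allow stricter matching, such as only on path segments.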

Any help would be much appreciated.