Problem with PDF, Solr 1.4.1 Cell

Alessandro Benedetti-4
Hi all,
as I saw in this discussion [1], there were many issues with PDF indexing in
Solr 1.4 due to the Tika library (version 0.4).
Solr 1.4.1 ships the same Tika library, so I guess the issues are the
same.
Could anyone who contributed to the previous thread help me resolve
these issues?
I need a simple tutorial that could help me upgrade Solr Cell!

Something like this:
1) download Tika core from trunk
2) create a jar with the Maven dependencies
3) unjar Solr 1.4.1 and replace the Tika library
4) jar the patched Solr 1.4.1 and enjoy!
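
The jar-swapping part of the steps above could be sketched roughly as follows. This is only an illustration, not a tested upgrade procedure: the function name `swap_jars`, the directory arguments and the jar-name prefixes are all hypothetical, and where the Tika jars actually live depends on how your Solr 1.4.1 installation is laid out.

```python
import shutil
from pathlib import Path


def swap_jars(lib_dir, new_jars_dir, prefixes, backup_dir):
    """Move old jars out of lib_dir and copy replacement jars in.

    lib_dir      -- directory holding the jars Solr loads (assumed layout)
    new_jars_dir -- directory holding the freshly built jars
    prefixes     -- jar-name prefixes to retire, e.g. ("tika-", "pdfbox-")
    backup_dir   -- where retired jars are kept so the swap can be undone

    Returns (removed, added): two lists of jar file names.
    """
    lib_dir = Path(lib_dir)
    new_jars_dir = Path(new_jars_dir)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)

    removed = []
    for jar in sorted(lib_dir.glob("*.jar")):
        # Only touch jars whose name starts with one of the given prefixes.
        if jar.name.startswith(tuple(prefixes)):
            shutil.move(str(jar), str(backup_dir / jar.name))
            removed.append(jar.name)

    added = []
    for jar in sorted(new_jars_dir.glob("*.jar")):
        shutil.copy2(str(jar), str(lib_dir / jar.name))
        added.append(jar.name)

    return removed, added
```

A call might look like `swap_jars("solr/lib", "tika-build/jars", ("tika-",), "solr/lib-backup")`, where all three paths are placeholders. Old jars are moved to a backup directory rather than deleted, so the original state can be restored if the new libraries do not work.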

[1]
http://markmail.org/message/zbkplnzqho7mxwy3#query:+page:1+mid:gamcxdx34ayt6ccg+state:results

Best regards

--
--------------------------

Benedetti Alessandro
Personal Page: http://tigerbolt.altervista.org

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England

Re: Problem with PDF, Solr 1.4.1 Cell

Tommaso Teofili
Hi,
I think there is an open bug for it at:
https://issues.apache.org/jira/browse/SOLR-1902
Using Solr 1.4.1 and upgrading the Tika libraries to a 0.8 snapshot, I also had to
upgrade pdfbox, fontbox and jempbox to 1.2.1. I got no errors and Solr seems
able to index PDFs (I can query them by id:doc1, for
example), but it did not extract text or other metadata from them.
Building a new Solr distribution from trunk (ant dist) and using the Tika 0.8
snapshot (with pdfbox, fontbox and jempbox 1.2.1), it seems to work.
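
One way to check whether text is actually being extracted, rather than only an id being indexed, is to send a PDF to the Solr Cell handler with extractOnly=true and look at what comes back. The following is a minimal sketch under assumptions: the helper names are hypothetical, the handler is assumed to be mapped at /update/extract, and the base URL depends on your setup.

```python
import urllib.parse
import urllib.request


def build_extract_url(solr_base, doc_id=None, extract_only=False):
    """Build a URL for the extracting request handler.

    solr_base    -- e.g. "http://localhost:8983/solr" (assumed location)
    doc_id       -- if given, sent as literal.id and committed
    extract_only -- if True, ask Solr to return the extracted content
                    instead of indexing it
    """
    params = {}
    if doc_id is not None:
        params["literal.id"] = doc_id
        params["commit"] = "true"
    if extract_only:
        params["extractOnly"] = "true"
    query = urllib.parse.urlencode(params)
    return solr_base.rstrip("/") + "/update/extract?" + query


def post_pdf(url, pdf_path):
    """POST the raw PDF bytes to the given URL and return the response text."""
    with open(pdf_path, "rb") as f:
        data = f.read()
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/pdf"}
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8", errors="replace")
```

For example, `post_pdf(build_extract_url("http://localhost:8983/solr", extract_only=True), "test.pdf")` should return a response containing the document text if extraction works; the file name and host here are placeholders. An empty body in that response would match the symptom described above.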
My 2 cents,
Tommaso

2010/7/23 Alessandro Benedetti <[hidden email]>
