Solr is not designed to store chunks of binary data. This is not a bug. It will probably not be “fixed”.
I strongly recommend putting your chunks of data in a database. Then store the primary key in a field in Solr. When the Solr results are returned, the client code can use the keys to fetch the data blobs from the database.
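To make the pattern concrete, here is a minimal sketch of that store-the-key approach. It uses sqlite3 as a stand-in for the blob database and a plain list as a stand-in for a Solr result set; the field names (`id`, `title`, `content`) and the `fetch_blobs` helper are illustrative, not part of any Solr API.

```python
import sqlite3

# Blob store: the database holds the binary payloads, keyed by id.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE blobs (id TEXT PRIMARY KEY, data BLOB)")
db.execute("INSERT INTO blobs VALUES (?, ?)", ("doc-1", b"\x00" * 40000))
db.commit()

# Solr side: index only the key plus searchable text, never the blob.
# A real query (via SolrJ, pysolr, etc.) would return documents like these:
solr_results = [{"id": "doc-1", "title": "big attachment"}]

def fetch_blobs(results, conn):
    """Join Solr hits back to their payloads: keys from Solr, blobs from the DB."""
    ids = [doc["id"] for doc in results]
    placeholders = ",".join("?" * len(ids))
    rows = conn.execute(
        f"SELECT id, data FROM blobs WHERE id IN ({placeholders})", ids)
    return dict(rows.fetchall())

blobs = fetch_blobs(solr_results, db)
print(len(blobs["doc-1"]))  # prints 40000
```

The point is simply that Solr never sees the 40 KB payload; it only indexes the key and whatever text you actually want to search.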
Please do not cross-post, this thread is for the users mailing list, not dev.
You have already got the answer several times: clean your input data. You are evidently parsing a PDF that contains bad data, resulting in a single token (word) larger than 32 KB. Clean your input data either in your application, or with an Update Processor or TokenFilter in Solr.
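For the Solr-side options just mentioned, a sketch of what that configuration could look like. The chain name, field name, and length limits below are illustrative assumptions; `TruncateFieldUpdateProcessorFactory` and `LengthFilterFactory` are the standard Solr/Lucene classes for this kind of cleanup.

```xml
<!-- solrconfig.xml: truncate an oversized field before it reaches the index -->
<updateRequestProcessorChain name="truncate-long-fields">
  <processor class="solr.TruncateFieldUpdateProcessorFactory">
    <str name="fieldName">content</str>
    <int name="maxLength">10000</int>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<!-- schema.xml alternative: drop abnormally long tokens at analysis time -->
<filter class="solr.LengthFilterFactory" min="1" max="256"/>
```

The first variant mutates the document during update processing; the second keeps the document intact but silently discards any token longer than the configured maximum.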
Thank you Erick Erickson, Bernd Fehling, and Jan Hoydahl for your suggested solutions. I have tried the suggestions, but we are still unable to import files larger than 32 KB; it displays the same error.
The link below has the suggested solutions. Please have a look.
We are currently using Solr version 4.2.1 in our project and everything has been going well. But recently we have been facing an issue with Solr Data Import: it is not importing files larger than 32766 bytes (roughly 32 KB) and shows two exceptions:
Please find the attached screenshot for reference.
We have searched many forums for solutions and did not find an exact fix for this issue. Interestingly, we found an article suggesting that changing the field's type from 'string' to 'text_general' might solve the problem.
Please have a look at the forum below:
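For context on why that field-type change can matter: 32766 bytes is Lucene's maximum indexed term length. A `string` field indexes the entire value as a single term, so any value over that limit is rejected, whereas `text_general` tokenizes the value into many small terms. A sketch of the schema change (the field name `content` is an assumption, not from the original setup):

```xml
<!-- Before: the whole value becomes one indexed term, which can exceed 32766 bytes -->
<field name="content" type="string" indexed="true" stored="true"/>

<!-- After: text_general tokenizes on whitespace/punctuation, keeping each term small -->
<field name="content" type="text_general" indexed="true" stored="true"/>
```

Note this only helps if the text actually tokenizes; a single unbroken token over 32766 bytes (e.g. garbage from a bad PDF parse) will still fail, which is why cleaning the input data is the more robust fix.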