[jira] [Created] (TIKA-3097) Out of memory while parsing docx

Mihir Sharma (Jira)
suchendra created TIKA-3097:

             Summary: Out of memory while parsing docx
                 Key: TIKA-3097
                 URL: https://issues.apache.org/jira/browse/TIKA-3097
             Project: Tika
          Issue Type: Bug
          Components: core, parser
    Affects Versions: 1.24
            Reporter: suchendra
         Attachments: test.docx

I have written simple Scala code to extract the content from an uploaded docx file. The JVM runs out of memory (OOM) when Tika tries to parse the file. I configured the JVM heap to 1 GB and then tried 2 GB; the same issue occurs in both cases. It happens both with the jar and in my own code.
The file is attached for reference.
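
The extraction code itself is not included in this report. As a rough illustration only, a minimal sketch of this kind of extraction might look like the following. It assumes the Tika 1.24 parsers are on the classpath; the object name `DocxExtract` and the default path `test.docx` are hypothetical, and this is not necessarily the code that triggered the failure.

```scala
import java.nio.file.{Files, Paths}

import org.apache.tika.metadata.Metadata
import org.apache.tika.parser.{AutoDetectParser, ParseContext}
import org.apache.tika.sax.BodyContentHandler

// Hypothetical reproduction harness, not the reporter's original code.
object DocxExtract {
  def main(args: Array[String]): Unit = {
    // Assumed location of the attached file; pass a different path as the first argument.
    val path = Paths.get(if (args.nonEmpty) args(0) else "test.docx")

    // Print the heap ceiling the JVM actually received, to confirm the -Xmx setting took effect.
    val maxHeapMb = Runtime.getRuntime.maxMemory / (1024L * 1024L)
    println(s"Max heap: $maxHeapMb MB")

    val parser   = new AutoDetectParser()
    val handler  = new BodyContentHandler(-1) // -1 disables the handler's default write limit
    val metadata = new Metadata()

    val in = Files.newInputStream(path)
    try parser.parse(in, handler, metadata, new ParseContext())
    finally in.close()

    println(s"Extracted ${handler.toString.length} characters")
  }
}
```

Running it with `-Xmx1g` and then `-Xmx2g` would correspond to the two heap settings described above.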

This message was sent by Atlassian Jira