[GitHub] [tika] pjfanning opened a new pull request #404: WIP: POI 5.0.0


GitBox

pjfanning opened a new pull request #404:
URL: https://github.com/apache/tika/pull/404


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]



[GitHub] [tika] pjfanning commented on pull request #404: WIP: TIKA-3164: POI 5.0.0

GitBox

pjfanning commented on pull request #404:
URL: https://github.com/apache/tika/pull/404#issuecomment-783721207


   @tballison I tried this on my laptop: the tika-parser Microsoft tests passed, but the job later failed with:
   
   ```
   [ERROR] Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 13.181 s <<< FAILURE! - in org.apache.tika.parser.gdal.TestGDALParser
   [ERROR] org.apache.tika.parser.gdal.TestGDALParser.testParseBasicInfo  Time elapsed: 12.795 s  <<< FAILURE!
   java.lang.AssertionError
    at org.apache.tika.parser.gdal.TestGDALParser.testParseBasicInfo(TestGDALParser.java:82)
   ```
   


[GitHub] [tika] tballison commented on pull request #404: WIP: TIKA-3164: POI 5.0.0

GitBox

tballison commented on pull request #404:
URL: https://github.com/apache/tika/pull/404#issuecomment-784055893


   Sorry for the delay! I'm not sure why that test failed for you. Let me take a look; I'm working on this today.


[GitHub] [tika] tballison commented on pull request #404: WIP: TIKA-3164: POI 5.0.0

GitBox

tballison commented on pull request #404:
URL: https://github.com/apache/tika/pull/404#issuecomment-784099481


   Aside from including the full schemas jar, is there a solution to this?
   ```
   [ERROR] Tests run: 13, Failures: 0, Errors: 12, Skipped: 0, Time elapsed: 0.548 s <<< FAILURE! - in org.apache.tika.parser.RecursiveParserWrapperTest
   [ERROR] org.apache.tika.parser.RecursiveParserWrapperTest.testMaxEmbedded  Time elapsed: 0.16 s  <<< ERROR!
   org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.ooxml.OOXMLParser@626c569b
    at org.apache.tika.parser.RecursiveParserWrapperTest.testMaxEmbedded(RecursiveParserWrapperTest.java:191)
   Caused by: org.apache.xmlbeans.SchemaTypeLoaderException: XML-BEANS compiled schema: Could not locate compiled schema resource org/apache/poi/schemas/ooxml/system/ooxml/oleobjectelement.xsb (org.apache.poi.schemas.ooxml.system.ooxml.oleobjectelement) - code 0
    at org.apache.tika.parser.RecursiveParserWrapperTest.testMaxEmbedded(RecursiveParserWrapperTest.java:191)
   ```


[GitHub] [tika] tballison commented on pull request #404: WIP: TIKA-3164: POI 5.0.0

GitBox

tballison commented on pull request #404:
URL: https://github.com/apache/tika/pull/404#issuecomment-784102052


   See: https://github.com/apache/tika/tree/TIKA-3164-1.x


[GitHub] [tika] tballison commented on pull request #404: WIP: TIKA-3164: POI 5.0.0

GitBox

tballison commented on pull request #404:
URL: https://github.com/apache/tika/pull/404#issuecomment-784110289


   I'm adding "test_recursive_embedded.docx" to a unit test in POI locally to see whether that gets oleobjectelement.xsb included in ooxml-schemas-lite.


[GitHub] [tika] pjfanning commented on pull request #404: WIP: TIKA-3164: POI 5.0.0

GitBox

pjfanning commented on pull request #404:
URL: https://github.com/apache/tika/pull/404#issuecomment-784114343


   There is the ooxml-schemas-full jar for cases where ooxml-schemas-lite is missing stuff.
   
   I thought all the xsb stuff was in the ooxml-schemas-lite jar.
   
   It's definitely worth adding a test case to the POI code base.
   
   POI 6.0.0 is probably going to be the next release, and it could be a couple of months away (fairly big logging changes were just merged, and there will probably be an uptake of a refactored XMLBeans jar).
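The SchemaTypeLoaderException earlier in the thread names the exact compiled schema file that could not be found. One quick way to tell whether a given schemas jar (lite or full) contains it is to look the resource up on the classpath. The following is a minimal diagnostic sketch, not part of POI or Tika; the class name `SchemaResourceCheck` is hypothetical, and it assumes only that compiled XMLBeans schemas are packaged as ordinary `.xsb` classpath resources, as the path in the error message suggests.

```java
import java.net.URL;

/**
 * Hypothetical diagnostic (not part of POI or Tika): reports whether a
 * compiled XMLBeans schema resource is visible on the current classpath.
 */
public class SchemaResourceCheck {

    // Resource path copied from the SchemaTypeLoaderException in this thread.
    static final String OLE_OBJECT_XSB =
            "org/apache/poi/schemas/ooxml/system/ooxml/oleobjectelement.xsb";

    /** Returns true if the given classpath resource can be located. */
    static boolean isOnClasspath(String resourcePath) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            // Fall back to the loader that loaded this class.
            loader = SchemaResourceCheck.class.getClassLoader();
        }
        // ClassLoader.getResource expects a path without a leading slash.
        URL url = loader.getResource(resourcePath);
        return url != null;
    }

    public static void main(String[] args) {
        // Check the resource from the error by default, or one given as an argument.
        String path = args.length > 0 ? args[0] : OLE_OBJECT_XSB;
        if (isOnClasspath(path)) {
            System.out.println("found: " + path);
        } else {
            System.out.println("missing: " + path
                    + " (the schemas jar on the classpath may not include it)");
        }
    }
}
```

Run it with the schemas jar under test on the classpath, e.g. `java -cp "<schemas jar>:." SchemaResourceCheck`, and pass another resource path as the first argument to check a different schema file.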


[GitHub] [tika] tballison commented on pull request #404: WIP: TIKA-3164: POI 5.0.0

GitBox

tballison commented on pull request #404:
URL: https://github.com/apache/tika/pull/404#issuecomment-784115989


   Fresh checkout...
   
   `./gradlew build`
   
   Results:
   ```
   > Task :ooxml:compileJava
   /home/tallison/Intellij/poi-trunk/src/ooxml/java/org/apache/poi/xssf/usermodel/XSSFCell.java:564: error: cannot access DocumentFactory
               f = CTCellFormula.Factory.newInstance();
                                        ^
     class file for org.apache.xmlbeans.impl.schema.DocumentFactory not found
   /home/tallison/Intellij/poi-trunk/src/ooxml/java/org/apache/poi/xssf/usermodel/XSSFColor.java:117: error: recursive constructor invocation
       public XSSFColor(byte[] rgb, IndexedColorMap colorMap) {
              ^
   /home/tallison/Intellij/poi-trunk/src/ooxml/java/org/apache/poi/xddf/usermodel/XDDFLineProperties.java:42: error: recursive constructor invocation
       public XDDFLineProperties(XDDFFillProperties fill) {
              ^
   /home/tallison/Intellij/poi-trunk/src/ooxml/java/org/apache/poi/xddf/usermodel/text/XDDFHyperlink.java:29: error: recursive constructor invocation
       public XDDFHyperlink(String id) {
   ```
   
   I'm guessing I need to run Ant first to pull in the dependencies?


[GitHub] [tika] pjfanning commented on pull request #404: WIP: TIKA-3164: POI 5.0.0

GitBox

pjfanning commented on pull request #404:
URL: https://github.com/apache/tika/pull/404#issuecomment-784117840


   The Gradle build depends quite a bit on the Ant one. I would suggest getting the Ant build working first; after that, the Gradle build will probably start working too.


[GitHub] [tika] tballison commented on pull request #404: WIP: TIKA-3164: POI 5.0.0

GitBox

tballison commented on pull request #404:
URL: https://github.com/apache/tika/pull/404#issuecomment-784128134


   Uninstalled the old Ant and installed a new one:
   ```
   ant -f fetch.xml -Ddest=system
   ```
   
   ```
   echo $ANT_HOME
   /apache/apache-ant-1.10.9
   ```
   ```
   ant -v
   Apache Ant(TM) version 1.10.9 compiled on September 27 2020
   ```
   
   `ant clean test`
   
   ```
   BUILD FAILED
   /home/tallison/Intellij/poi-trunk/build.xml:1812: /home/tallison/Intellij/poi-trunk/build/ooxml-lite-report.clazz doesn't exist
   
   Total time: 2 minutes 58 seconds
   
   ```


[GitHub] [tika] tballison edited a comment on pull request #404: WIP: TIKA-3164: POI 5.0.0

GitBox

tballison edited a comment on pull request #404:
URL: https://github.com/apache/tika/pull/404#issuecomment-784128134


   ```
   openjdk version "1.8.0_282"
   OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_282-b08)
   OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.282-b08, mixed mode)
   ```
   
   Ubuntu
   
   
   Uninstalled the old Ant and installed a new one:
   ```
   ant -f fetch.xml -Ddest=system
   ```
   
   ```
   echo $ANT_HOME
   /apache/apache-ant-1.10.9
   ```
   ```
   ant -v
   Apache Ant(TM) version 1.10.9 compiled on September 27 2020
   ```
   
   `ant clean test`
   
   ```
   BUILD FAILED
   /home/tallison/Intellij/poi-trunk/build.xml:1812: /home/tallison/Intellij/poi-trunk/build/ooxml-lite-report.clazz doesn't exist
   
   Total time: 2 minutes 58 seconds
   
   ```


[GitHub] [tika] tballison commented on pull request #404: WIP: TIKA-3164: POI 5.0.0

GitBox

tballison commented on pull request #404:
URL: https://github.com/apache/tika/pull/404#issuecomment-784129952


   I'm sure the above is user error.  I've been away from POI for too long...argh...


[GitHub] [tika] pjfanning edited a comment on pull request #404: WIP: TIKA-3164: POI 5.0.0

GitBox

pjfanning edited a comment on pull request #404:
URL: https://github.com/apache/tika/pull/404#issuecomment-784152065


   @tballison would it be worth just using ooxml-schemas-full in Tika? Tika is big, so the size benefit of ooxml-schemas-lite is smaller.


[GitHub] [tika] tballison commented on pull request #404: WIP: TIKA-3164: POI 5.0.0

GitBox

tballison commented on pull request #404:
URL: https://github.com/apache/tika/pull/404#issuecomment-784155288


   I'm happy to do so for testing, but I'm hesitant to add even more to Tika. The point of 2.x is to modularize and make dependencies smaller. I wouldn't rule it out, necessarily...
   
   Any recommendations on the above build failure? Thank you!


[GitHub] [tika] pjfanning commented on pull request #404: WIP: TIKA-3164: POI 5.0.0

GitBox

pjfanning commented on pull request #404:
URL: https://github.com/apache/tika/pull/404#issuecomment-784156188


   I'd suggest a clean checkout; there could be leftover files that `ant clean` is not removing.


[GitHub] [tika] tballison commented on pull request #404: WIP: TIKA-3164: POI 5.0.0

GitBox

tballison commented on pull request #404:
URL: https://github.com/apache/tika/pull/404#issuecomment-784167038


   Clean checkout.
   
   `ant compile` appears to work.
   
   `ant test` fails with:
   
   ```
   test-main:
       [javac] Compiling 1 source file to /home/tallison/Intellij/poi-trunk/build/poi-ant-contrib
   
   -test-main-write-testfile:
   
   -test-scratchpad-check:
   
   test-scratchpad-download-resources:
   
   test-scratchpad:
   
   -test-scratchpad-write-testfile:
   
   -test-ooxml-check:
   
   test-ooxml:
   
   -test-ooxml-write-testfile:
   
   compile-ooxml-lite:
        [echo] Create ooxml-lite schemas
   
   BUILD FAILED
   /home/tallison/Intellij/poi-trunk/build.xml:1812: /home/tallison/Intellij/poi-trunk/build/ooxml-lite-report.clazz doesn't exist
   ```


[GitHub] [tika] tballison edited a comment on pull request #404: WIP: TIKA-3164: POI 5.0.0

GitBox

tballison edited a comment on pull request #404:
URL: https://github.com/apache/tika/pull/404#issuecomment-784167038


   Clean checkout.
   
   `ant compile` appears to work.
   
   `ant test` fails with:
   
   ```
        [echo] Using Ant: Apache Ant(TM) version 1.10.9 compiled on September 27 2020 from /apache/apache-ant-1.10.9, Ant detected Java 1.8 (may be different than actual Java sometimes...)
        [echo] Using Java: 1.8.0_282/1.8.0_282-b08/25.282-b08/OpenJDK 64-Bit Server VM from AdoptOpenJDK on Linux: 5.8.0-43-generic
        [echo] Building Apache POI version 5.0.1-SNAPSHOT and RC: RC1
   ....
   
   test-main:
       [javac] Compiling 1 source file to /home/tallison/Intellij/poi-trunk/build/poi-ant-contrib
   
   -test-main-write-testfile:
   
   -test-scratchpad-check:
   
   test-scratchpad-download-resources:
   
   test-scratchpad:
   
   -test-scratchpad-write-testfile:
   
   -test-ooxml-check:
   
   test-ooxml:
   
   -test-ooxml-write-testfile:
   
   compile-ooxml-lite:
        [echo] Create ooxml-lite schemas
   
   BUILD FAILED
   /home/tallison/Intellij/poi-trunk/build.xml:1812: /home/tallison/Intellij/poi-trunk/build/ooxml-lite-report.clazz doesn't exist
   ```


[GitHub] [tika] pjfanning commented on pull request #404: WIP: TIKA-3164: POI 5.0.0

GitBox

pjfanning commented on pull request #404:
URL: https://github.com/apache/tika/pull/404#issuecomment-784173672


   I'm not getting that issue. I'm using Zulu JDK 11.0.7 and Ant 1.10.8.


[GitHub] [tika] tballison commented on pull request #404: WIP: TIKA-3164: POI 5.0.0

GitBox

tballison commented on pull request #404:
URL: https://github.com/apache/tika/pull/404#issuecomment-784174494


   OK. JDK 8 _should_ work, right? I'll ping the dev list. Thank you!
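The two environments in this thread differ (OpenJDK 1.8.0_282 with Ant 1.10.9 versus Zulu JDK 11.0.7 with Ant 1.10.8), and the build's own `[echo]` line warns that the Java Ant detects may differ from the actual one. A small, hypothetical helper (not part of POI, Tika, or Ant) can print the relevant system properties and normalize `java.specification.version` to a major version, so the old `1.8` scheme and the newer `11` scheme compare directly. This is a sketch for comparing setups, not a diagnosis of the build failure.

```java
/**
 * Hypothetical helper (not part of POI, Tika, or Ant): prints the running JVM's
 * version details and reduces a Java version string to its major number.
 */
public class JavaMajorVersion {

    /** Maps "1.8" or "1.8.0_282" to 8, and "11" or "11.0.7" to 11. */
    static int major(String version) {
        String v = version.trim();
        if (v.startsWith("1.")) {
            v = v.substring(2); // pre-JDK 9 scheme: 1.<major>...
        }
        // Keep only the leading run of digits.
        int end = 0;
        while (end < v.length() && Character.isDigit(v.charAt(end))) {
            end++;
        }
        if (end == 0) {
            throw new IllegalArgumentException("unrecognized version: " + version);
        }
        return Integer.parseInt(v.substring(0, end));
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("java.vendor  = " + System.getProperty("java.vendor"));
        System.out.println("major        = " + major(spec));
    }
}
```

To compare like with like, run it with the same `java` that Ant uses (for example, the one under `JAVA_HOME`) on both machines.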

