Looking for PR code review for DWG parser changes

Looking for PR code review for DWG parser changes

Nicholas DiPiazza
Looking for code review of:

https://github.com/apache/tika/pull/395

This addresses TIKA-1735 and also adds the ability for the DWG parser to
use the LibreDWG library if it is configured.

The DWG reading code is much too vast and complex to hope to port to Java.
So, similar to what we do with Tesseract, if DWGConfig.properties is present
on the classpath and contains a valid path to the dwgread executable, the
parser calls dwgread to extract text from DWG files.
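
To make the lookup concrete, here is a rough sketch of the idea. It is not
the code in the PR. The property key "dwgReadExecutable", the class name, and
the method names are all made up for illustration, and the sketch simply
captures whatever the external process prints, since which dwgread flags are
needed is a detail left to the PR itself.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;
import java.util.Properties;
import java.util.concurrent.TimeUnit;

final class DwgReadSketch {

    // Hypothetical property key; the key actually used in the PR may differ.
    static final String PATH_KEY = "dwgReadExecutable";

    /** Loads DWGConfig.properties from the classpath; returns empty Properties if it is absent. */
    static Properties loadConfig() throws IOException {
        Properties props = new Properties();
        try (InputStream in = DwgReadSketch.class.getResourceAsStream("/DWGConfig.properties")) {
            if (in != null) {
                props.load(in);
            }
        }
        return props;
    }

    /** Returns the configured path only if it names an existing, executable regular file. */
    static Optional<Path> resolveExecutable(Properties props) {
        String value = props.getProperty(PATH_KEY, "").trim();
        if (value.isEmpty()) {
            return Optional.empty();
        }
        Path path = Paths.get(value);
        boolean usable = Files.isRegularFile(path) && Files.isExecutable(path);
        return usable ? Optional.of(path) : Optional.empty();
    }

    /** Runs the executable on one file and returns everything it printed (stdout and stderr merged). */
    static String run(Path executable, Path dwgFile, long timeoutSeconds)
            throws IOException, InterruptedException {
        // Redirect to a temp file so a large output cannot fill the pipe and block the child.
        Path out = Files.createTempFile("dwgread", ".out");
        try {
            Process process = new ProcessBuilder(executable.toString(), dwgFile.toString())
                    .redirectErrorStream(true)
                    .redirectOutput(out.toFile())
                    .start();
            if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
                process.destroyForcibly();
                throw new IOException("external process timed out after " + timeoutSeconds + "s");
            }
            return new String(Files.readAllBytes(out), StandardCharsets.UTF_8);
        } finally {
            Files.deleteIfExists(out);
        }
    }
}
```

When resolveExecutable returns an empty Optional (no properties file, no key,
or a path that is not executable), the parser would skip the external call
and fall back to its existing behavior.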

The unit tests need some love. Is there some way we can get LibreDWG
installed on the Jenkins server so that it can actually run the
tests that exercise dwgread?

-Nicholas

Re: Looking for PR code review for DWG parser changes

Tim Allison
Nicholas,

  I'm really grateful for your PR.  Once I roll 2.0.0-ALPHA, I'll have time
to take a look.  I'm out a bit next week...so might not be until towards
the end of next week.

  If there are other devs who want to take this, please do.

  Please don't take my lack of response as a failure of gratitude. :)

Cheers,

              Tim

On Wed, Jan 13, 2021 at 11:28 AM Nicholas DiPiazza <
[hidden email]> wrote:

> Looking for code review of:
>
> https://github.com/apache/tika/pull/395
>
> This addresses TIKA-1735 and also adds the ability for the DWG parser to
> use the LibreDWG library if it is configured.
>
> The DWG reading code is much too vast and complex to hope to port to Java.
> So, similar to what we do with Tesseract, if DWGConfig.properties is present
> on the classpath and contains a valid path to the dwgread executable, the
> parser calls dwgread to extract text from DWG files.
>
> The unit tests need some love. Is there some way we can get LibreDWG
> installed on the Jenkins server so that it can actually run the
> tests that exercise dwgread?
>
> -Nicholas
>

Re: Looking for PR code review for DWG parser changes

Nicholas DiPiazza
Definitely take your time! No pressure from my end, and I appreciate all
that you do for this project!

On Wed, Jan 13, 2021 at 2:48 PM Tim Allison <[hidden email]> wrote:

> Nicholas,
>
>   I'm really grateful for your PR.  Once I roll 2.0.0-ALPHA, I'll have time
> to take a look.  I'm out a bit next week...so might not be until towards
> the end of next week.
>
>   If there are other devs who want to take this, please do.
>
>   Please don't take my lack of response as a failure of gratitude. :)
>
> Cheers,
>
>               Tim