TabularFormatsTest test fails in Germany

Tilman Hausherr
So I wanted to build Tika from source, and it failed:

Failures:
   TabularFormatsTest.testSAS7BDAT:229->assertContents:216 en_US Wrong text in row 9 and column 7 - 03(MAR|Mar)(63|1963)[:\s]09:46:40(.00)? vs 03Mär1963:09:46:40.00
   TabularFormatsTest.testXLS:236->assertContents:216 en_US Wrong text in row 9 and column 7 - 03(MAR|Mar)(63|1963)[:\s]09:46:40(.00)? vs 03Mär63 09:46:40
   TabularFormatsTest.testXLSB:250->assertContents:216 en_US Wrong text in row 9 and column 7 - 03(MAR|Mar)(63|1963)[:\s]09:46:40(.00)? vs 03Mär63 09:46:40
   TabularFormatsTest.testXLSX:243->assertContents:216 en_US Wrong text in row 9 and column 7 - 03(MAR|Mar)(63|1963)[:\s]09:46:40(.00)? vs 03Mär63 09:46:40

It fails because the expected "Mar" does not match "Mär" (the German
abbreviation for March). I tried to set the default Locale to US:

     @Before
     public void setUp()
     {
         Locale.setDefault(Locale.US);
     }

but this only works when the test is run on its own, not when the whole
build is running, even though the Locale is set (note the "en_US" in the
output above; I changed the assertion to print the default Locale):

assertTrue(Locale.getDefault() + " " + error,
((Pattern)table[cn][rn]).matcher(val).matches());
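
One possible explanation for the difference, not confirmed in this thread: Locale.setDefault() only affects formatters created after the call. If a parser (or a class it uses) builds and caches a formatter while the default is still de_DE, for example because an earlier test in the same JVM loaded that class first, a later setDefault(Locale.US) in @Before does not change it; when the test runs alone, nothing has been cached yet. The sketch below demonstrates only this JDK behavior; StaleLocaleDemo is a hypothetical name, not Tika code.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

public class StaleLocaleDemo {

    static final String PATTERN = "ddMMMyy HH:mm:ss";

    /** Formats 03 March 1963 09:46:40 UTC with the given formatter. */
    static String formatSample(SimpleDateFormat f) {
        Calendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        cal.clear();
        cal.set(1963, Calendar.MARCH, 3, 9, 46, 40);
        f.setTimeZone(cal.getTimeZone());
        return f.format(cal.getTime());
    }

    /** Returns {output of a formatter built before setDefault, output of one built after}. */
    static String[] demo() {
        Locale original = Locale.getDefault();
        try {
            Locale.setDefault(Locale.GERMANY);
            // Built while the JVM default is German (e.g. by an earlier test or a
            // static initializer) and then reused: it keeps the German symbols.
            SimpleDateFormat early = new SimpleDateFormat(PATTERN);
            // What the @Before method does.
            Locale.setDefault(Locale.US);
            SimpleDateFormat late = new SimpleDateFormat(PATTERN);
            return new String[] {formatSample(early), formatSample(late)};
        } finally {
            Locale.setDefault(original); // restore the JVM default
        }
    }

    public static void main(String[] args) {
        String[] out = demo();
        System.out.println("built before setDefault: " + out[0]); // German month abbreviation
        System.out.println("built after setDefault:  " + out[1]); // 03Mar63 09:46:40
    }
}
```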

One possible solution would be to change the test file to use June
instead of March, but we could still run into trouble elsewhere, e.g. in
Russia, China, Korea, Thailand, Japan, ...

Tilman


Re: TabularFormatsTest test fails in Germany

Tim Allison
Would it work to set the expected String to something generated with the
root locale?

On Fri, Oct 4, 2019 at 10:56 AM Tilman Hausherr <[hidden email]>
wrote:


Re: TabularFormatsTest test fails in Germany

Tilman Hausherr
Am 04.10.2019 um 17:32 schrieb Tim Allison:
> Would it work to set the expected String to something generated with the
> root locale?

Yes, that makes sense.

But I wonder whether this is a configuration problem: am I really the
first one outside the US who has tried building from source?

Tilman




Re: TabularFormatsTest test fails in Germany

Tim Allison
Yes, it could be a configuration problem. I agree that something weird
is going on, given that you get a failure in the full build while
everything works when the Locale is set locally and the test runs alone.

I think the full solution would be to let users pass in a Locale via
ParseContext...and that _might_ already work with some parsers? I'm not
sure. Getting all Parsers to support that would take some work, but it
would make testing more straightforward.
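
A minimal sketch of that idea, assuming a type-keyed context modeled loosely on the set/get style of Tika's ParseContext. The Context class, formatDateCell() and sampleCell() are hypothetical and not part of Tika; whether Tika's parsers currently read a Locale this way is exactly what is uncertain above.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TimeZone;

public class LocaleContextSketch {

    /** Hypothetical stand-in for a type-keyed context such as ParseContext. */
    static final class Context {
        private final Map<Class<?>, Object> values = new HashMap<>();

        <T> void set(Class<T> key, T value) {
            values.put(key, value);
        }

        <T> T get(Class<T> key, T defaultValue) {
            Object v = values.get(key);
            return v == null ? defaultValue : key.cast(v);
        }
    }

    /** Hypothetical parser step: formats a date cell with the caller's Locale, if given. */
    static String formatDateCell(Calendar cell, Context context) {
        Locale locale = context.get(Locale.class, Locale.getDefault());
        SimpleDateFormat f = new SimpleDateFormat("ddMMMyy HH:mm:ss", locale);
        f.setTimeZone(cell.getTimeZone());
        return f.format(cell.getTime());
    }

    /** The 03 March 1963 09:46:40 value from the failing tests, in UTC. */
    static Calendar sampleCell() {
        Calendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        cal.clear();
        cal.set(1963, Calendar.MARCH, 3, 9, 46, 40);
        return cal;
    }

    public static void main(String[] args) {
        Context context = new Context();
        context.set(Locale.class, Locale.US); // the test pins the Locale explicitly
        System.out.println(formatDateCell(sampleCell(), context)); // 03Mar63 09:46:40
    }
}
```

Passing the Locale explicitly avoids mutating the JVM-wide default, so the result no longer depends on which tests ran earlier in the same JVM.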

As for the question of whether you're the only one outside of the U.S.:
we do have committers around the world, but we need more, as you're
finding. Thank you for your patience!

On Fri, Oct 4, 2019 at 11:48 AM Tilman Hausherr <[hidden email]> wrote:


Re: TabularFormatsTest test fails in Germany

Tilman Hausherr
Am 04.10.2019 um 17:32 schrieb Tim Allison:
> Would it work to set the expected String to something generated with the
> root locale?

I managed to write such code: we need both the US and the local month
names. I get the local ones with

     new DateFormatSymbols(Locale.getDefault()).getShortMonths()

That is because testCSV() always produces the English names, while the
other tests produce the local names.
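
A minimal sketch of building such an expected-value pattern, assuming the parser output uses the same short month names that DateFormatSymbols reports for the given locale (this depends on the JDK's locale data, so German March may appear as "Mär" or "März"). MonthPatternSketch and its methods are hypothetical names, not the actual TabularFormatsTest code; the pattern mirrors the one in the failure messages, with the dot in ".00" escaped.

```java
import java.text.DateFormatSymbols;
import java.util.Locale;
import java.util.regex.Pattern;

public class MonthPatternSketch {

    /**
     * Regex alternation accepting the short name of the given month (0 = January)
     * in upper-case US English, US English, and the given locale.
     */
    static String shortMonthAlternatives(int month, Locale locale) {
        String us = new DateFormatSymbols(Locale.US).getShortMonths()[month];
        String local = new DateFormatSymbols(locale).getShortMonths()[month];
        return "(" + Pattern.quote(us.toUpperCase(Locale.US))
                + "|" + Pattern.quote(us)
                + "|" + Pattern.quote(local) + ")";
    }

    /** Expected-value pattern for the 03 March 1963 09:46:40 cell. */
    static Pattern marchCellPattern(Locale locale) {
        // Month index 2 is March; "\\." matches only a literal dot.
        return Pattern.compile("03" + shortMonthAlternatives(2, locale)
                + "(63|1963)[:\\s]09:46:40(\\.00)?");
    }

    public static void main(String[] args) {
        Pattern p = marchCellPattern(Locale.getDefault());
        System.out.println(p.pattern());
        System.out.println(p.matcher("03Mar63 09:46:40").matches()); // true
    }
}
```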

I understand that the code conventions call for K&R-style braces.

Is there a maximum line length in this project? I ask because some
lines are very long.

Is there a branch to commit to besides master? Should I just commit to
the server ( https://git-wip-us.apache.org/repos/asf/tika ), or is this
done differently? Any common pitfalls?

Tilman


Re: TabularFormatsTest test fails in Germany

Tim Allison
Thank you, Tilman!

The master branch is for 2.x, and branch_1x is for 1.x.

On Sat, Oct 5, 2019 at 10:30 AM Tilman Hausherr <[hidden email]>
wrote:


Re: TabularFormatsTest test fails in Germany

Tim Allison
As for the repo, I work directly with GitHub, but wip should work.

I don’t think we’ve codified line length, but please do as you see fit.

I’ve started using Maven checkstyle on another project...I shudder to think
of the amount of time it would take to get our code into shape...if only I
had a few long plane rides in my future... :D


Re: TabularFormatsTest test fails in Germany

Tilman Hausherr
Am 05.10.2019 um 17:46 schrieb Tim Allison:
> As for the repo, I work directly w GitHub, but wip should work.
>
> I don’t think we’ve codified line length, but please do as you see fit.

I decided to refactor the code instead; it is much better now, and the
line length question no longer arises, haha.

Tilman



Re: TabularFormatsTest test fails in Germany

Tim Allison
+1

Thank you!

On Sat, Oct 5, 2019 at 1:15 PM Tilman Hausherr <[hidden email]>
wrote:
