Re: ArrayIndexOutOfBoundsException in FastCharStream.readChar


Re: ArrayIndexOutOfBoundsException in FastCharStream.readChar

Edwin Smith-3
Oh, and in case it matters, I'm using Lucene 2.2.0.

Ed



----- Original Message ----


I am stumped and have not seen any other reference to this problem. I am getting the following exception on everything I try to index. Does anyone know what my problem might be?

Thanks,

Ed

java.lang.ArrayIndexOutOfBoundsException: 2048
    at org.apache.lucene.analysis.standard.FastCharStream.readChar(FastCharStream.java:46)
    at org.apache.lucene.analysis.standard.FastCharStream.BeginToken(FastCharStream.java:79)
    at org.apache.lucene.analysis.standard.StandardTokenizerTokenManager.getNextToken(StandardTokenizerTokenManager.java:1180)
    at org.apache.lucene.analysis.standard.StandardTokenizer.jj_ntk(StandardTokenizer.java:158)
    at org.apache.lucene.analysis.standard.StandardTokenizer.next(StandardTokenizer.java:36)
    at org.apache.lucene.analysis.standard.StandardFilter.next(StandardFilter.java:41)
    at org.apache.lucene.analysis.LowerCaseFilter.next(LowerCaseFilter.java:33)
    at org.apache.lucene.analysis.StopFilter.next(StopFilter.java:107)
    at org.apache.lucene.index.DocumentWriter.invertDocument(DocumentWriter.java:219)
    at org.apache.lucene.index.DocumentWriter.addDocument(DocumentWriter.java:95)
    at org.apache.lucene.index.IndexWriter.buildSingleDocSegment(IndexWriter.java:1013)
    at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1001)
    at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:983)
    at com.affinovate.v4.server.search.Indexer.index(Indexer.java:61)
    at com.affinovate.v4.server.search.Indexer.perform(Indexer.java:93)
    at com.affinovate.v4.server.db.TaskQueue.run(TaskQueue.java:115)
    at java.lang.Thread.run(Thread.java:619)

RE: ArrayIndexOutOfBoundsException in FastCharStream.readChar

steve_rowe
Hi Edwin,

I don't know specifically what's causing the exception you're seeing, but note that in Lucene 2.3.0+, the JavaCC-generated version of StandardTokenizer (where your exception originates) has been replaced with a JFlex-generated version - see <http://issues.apache.org/jira/browse/LUCENE-966>.

FYI, indexing speed was much improved in 2.3.0 over previous versions -- up to 10 times faster, according to reports on this list. Is there any particular reason you aren't using 2.3.2 (the most recent release)?

Steve

On 10/06/2008 at 2:32 PM, Edwin Smith wrote:

> Oh, and in case it matters, I'm using Lucene 2.2.0.
>
> Ed
>
>
>
> ----- Original Message ----
>
>
> I am stumped and have not seen any other reference to this
> problem. I am getting the following exception on everything I
> try to index. Does anyone know what my problem might be?
>
> Thanks,
>
> Ed
>
> [stack trace snipped]
>

 




Re: ArrayIndexOutOfBoundsException in FastCharStream.readChar

Edwin Smith-3
In reply to this post by Edwin Smith-3
No particular reason. It is just what I had loaded last and hadn't upgraded. It sounds like there might be good reason to do that now.
 
Thanks for the tip.
 
Ed



----- Original Message ----
From: Steven A Rowe <[hidden email]>
To: [hidden email]
Sent: Monday, October 6, 2008 3:18:20 PM
Subject: RE: ArrayIndexOutOfBoundsException in FastCharStream.readChar

Hi Edwin,

I don't know specifically what's causing the exception you're seeing, but note that in Lucene 2.3.0+, the JavaCC-generated version of StandardTokenizer (where your exception originates) has been replaced with a JFlex-generated version - see <http://issues.apache.org/jira/browse/LUCENE-966>.

FYI, indexing speed was much improved in 2.3.0 over previous versions -- up to 10 times faster, according to reports on this list -- is there any particular reason you aren't using 2.3.2 (the most recent release)?

Steve

On 10/06/2008 at 2:32 PM, Edwin Smith wrote:

> Oh, and in case it matters, I'm using Lucene 2.2.0.
>
> Ed
>
>
>
> ----- Original Message ----
>
>
> I am stumped and have not seen any other reference to this
> problem. I am getting the following exception on everything I
> try to index. Does anyone know what my problem might be?
>
> Thanks,
>
> Ed
>
> [stack trace snipped]
>





Re: ArrayIndexOutOfBoundsException in FastCharStream.readChar

Edwin Smith-3
In reply to this post by Edwin Smith-3
I upgraded to the latest, 2.3.2, and had the same problem, even though it was clearly a different lexer reading the text.
 
I did find some problems with the reader I was using, and it now reads some files that it didn't before, so it may still be a reader problem I haven't identified. The text coming in from it looks correct to me, though, so I don't know.
 
Very frustrating.
 
Ed



----- Original Message ----
From: Edwin Smith <[hidden email]>
To: [hidden email]
Sent: Monday, October 6, 2008 3:20:51 PM
Subject: Re: ArrayIndexOutOfBoundsException in FastCharStream.readChar

No particular reason. It is just what I had loaded last and hadn't upgraded. It sounds like there might be good reason to do that now.
 
Thanks for the tip.
 
Ed



----- Original Message ----
From: Steven A Rowe <[hidden email]>
To: [hidden email]
Sent: Monday, October 6, 2008 3:18:20 PM
Subject: RE: ArrayIndexOutOfBoundsException in FastCharStream.readChar

Hi Edwin,

I don't know specifically what's causing the exception you're seeing, but note that in Lucene 2.3.0+, the JavaCC-generated version of StandardTokenizer (where your exception originates) has been replaced with a JFlex-generated version - see <http://issues.apache.org/jira/browse/LUCENE-966>.

FYI, indexing speed was much improved in 2.3.0 over previous versions -- up to 10 times faster, according to reports on this list -- is there any particular reason you aren't using 2.3.2 (the most recent release)?

Steve

On 10/06/2008 at 2:32 PM, Edwin Smith wrote:

> Oh, and in case it matters, I'm using Lucene 2.2.0.
>
> Ed
>
>
>
> ----- Original Message ----
>
>
> I am stumped and have not seen any other reference to this
> problem. I am getting the following exception on everything I
> try to index. Does anyone know what my problem might be?
>
> Thanks,
>
> Ed
>
> [stack trace snipped]
>





Re: ArrayIndexOutOfBoundsException in FastCharStream.readChar

Michael McCandless-2

If you capture the exact text produced by the reader, and wrap it in a  
StringReader and pass that to StandardAnalyzer, do you then see the  
same exception?
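
A minimal sketch of that check, against the Lucene 2.3.x TokenStream API (the field name "contents" and the sample text here are placeholders, not taken from the original code):

    import java.io.StringReader;
    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;

    public class AnalyzerSmokeTest {
        public static void main(String[] args) throws Exception {
            // Replace with the exact text captured from the custom reader.
            String captured = "text captured from the custom reader";

            StandardAnalyzer analyzer = new StandardAnalyzer();
            TokenStream ts = analyzer.tokenStream("contents", new StringReader(captured));

            // In Lucene 2.3.x, TokenStream.next() returns null once the stream is exhausted.
            for (Token t = ts.next(); t != null; t = ts.next()) {
                System.out.println(t.termText());
            }
            ts.close();
        }
    }

If this loop runs cleanly, the text itself is fine and the problem lies in the custom reader rather than the content.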

Can you post the full stack trace on 2.3.2?

Mike

Edwin Smith wrote:

> I upgraded to the latest, 2.3.2, and had the same problem, even
> though it was clearly a different lexer reading the text.
>
> I did find some problems with the reader I was using, and it now  
> reads some files that it didn't before, so it may still be some  
> reader problem I haven't identified, but the text coming in from it  
> looks correct to me, so I don't know.
>
> Very frustrating.
>
> Ed
>
>
>
> ----- Original Message ----
> From: Edwin Smith <[hidden email]>
> To: [hidden email]
> Sent: Monday, October 6, 2008 3:20:51 PM
> Subject: Re: ArrayIndexOutOfBoundsException in FastCharStream.readChar
>
> No particular reason. It is just what I had loaded last and hadn't  
> upgraded. It sounds like there might be good reason to do that now.
>
> Thanks for the tip.
>
> Ed
>
>
>
> ----- Original Message ----
> From: Steven A Rowe <[hidden email]>
> To: [hidden email]
> Sent: Monday, October 6, 2008 3:18:20 PM
> Subject: RE: ArrayIndexOutOfBoundsException in FastCharStream.readChar
>
> Hi Edwin,
>
> I don't know specifically what's causing the exception you're  
> seeing, but note that in Lucene 2.3.0+, the JavaCC-generated version  
> of StandardTokenizer (where your exception originates) has been  
> replaced with a JFlex-generated version - see <http://issues.apache.org/jira/browse/LUCENE-966 
> >.
>
> FYI, indexing speed was much improved in 2.3.0 over previous  
> versions -- up to 10 times faster, according to reports on this list  
> -- is there any particular reason you aren't using 2.3.2 (the most  
> recent release)?
>
> Steve
>
> On 10/06/2008 at 2:32 PM, Edwin Smith wrote:
>> Oh, and in case it matters, I'm using Lucene 2.2.0.
>>
>> Ed
>>
>>
>>
>> ----- Original Message ----
>>
>>
>> I am stumped and have not seen any other reference to this
>> problem. I am getting the following exception on everything I
>> try to index. Does anyone know what my problem might be?
>>
>> Thanks,
>>
>> Ed
>>
>> [stack trace snipped]
>>
>
>
>
>


Re: ArrayIndexOutOfBoundsException in FastCharStream.readChar

Edwin Smith-3
In reply to this post by Edwin Smith-3
Thanks for the tip. I tried your experiment and, sure enough, it works just fine, so it's not the contents but obviously some other behavior of my custom reader. (Does the analyzer require that "mark" and "reset" be implemented, for example? I did not implement them.)

The stack trace is as follows:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException
    at java.lang.System.arraycopy(Native Method)
    at org.apache.lucene.analysis.standard.StandardTokenizerImpl.zzRefill(StandardTokenizerImpl.java:366)
    at org.apache.lucene.analysis.standard.StandardTokenizerImpl.getNextToken(StandardTokenizerImpl.java:573)
    at org.apache.lucene.analysis.standard.StandardTokenizer.next(StandardTokenizer.java:139)
    at org.apache.lucene.analysis.standard.StandardFilter.next(StandardFilter.java:42)
    at org.apache.lucene.analysis.LowerCaseFilter.next(LowerCaseFilter.java:33)
    at org.apache.lucene.analysis.StopFilter.next(StopFilter.java:118)
    at org.apache.lucene.index.DocumentsWriter$ThreadState$FieldData.invertField(DocumentsWriter.java:1522)
    at org.apache.lucene.index.DocumentsWriter$ThreadState$FieldData.processField(DocumentsWriter.java:1412)
    at org.apache.lucene.index.DocumentsWriter$ThreadState.processDocument(DocumentsWriter.java:1121)
    at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:2442)
    at org.apache.lucene.index.DocumentsWriter.addDocument(DocumentsWriter.java:2424)
    at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1464)
    at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1442)
    at com.affinovate.v4.server.search.ServerTest.main(ServerTest.java:36)

The error is occurring in the second line of zzRefill():

    System.arraycopy(zzBuffer, zzStartRead,
                     zzBuffer, 0,
                     zzEndRead - zzStartRead);

I set a breakpoint to catch it before it erred, and the value of zzEndRead is 0 while the value of zzStartRead is 1, so the copy length (zzEndRead - zzStartRead) comes out to -1, which makes System.arraycopy throw. Thus the error.

I was being clever and made a custom reader using competing threads against a SAX parser. I probably did it more to see if I could than for any valid reason, so I will probably just simplify my approach, use the parser to pull a complete string, and wrap that in a StringReader like you suggest.

Thanks for the help.

Ed
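
A rough sketch of that simpler approach (the class and method names here are illustrative, not the actual code): let the SAX handler accumulate all character data into one String, then hand the analyzer a StringReader.

    import java.io.StringReader;
    import javax.xml.parsers.SAXParser;
    import javax.xml.parsers.SAXParserFactory;
    import org.xml.sax.InputSource;
    import org.xml.sax.helpers.DefaultHandler;

    // Collects all character data from an XML document into one String,
    // then exposes it as a StringReader suitable for StandardAnalyzer.
    public class XmlTextExtractor extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length).append(' ');
        }

        public static StringReader extract(InputSource source) throws Exception {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            XmlTextExtractor handler = new XmlTextExtractor();
            parser.parse(source, handler);
            return new StringReader(handler.text.toString());
        }
    }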


----- Original Message ----
From: Michael McCandless <[hidden email]>
To: [hidden email]
Sent: Tuesday, October 7, 2008 5:12:05 AM
Subject: Re: ArrayIndexOutOfBoundsException in FastCharStream.readChar


If you capture the exact text produced by the reader, and wrap it in a 
StringReader and pass that to StandardAnalyzer, do you then see the 
same exception?

Can you post the full stack trace on 2.3.2?

Mike

Edwin Smith wrote:

> I upgraded to the latest, 2.3.2, and had the same problem, even
> though it was clearly a different lexer reading the text.
>
> I did find some problems with the reader I was using, and it now 
> reads some files that it didn't before, so it may still be some 
> reader problem I haven't identified, but the text coming in from it 
> looks correct to me, so I don't know.
>
> Very frustrating.
>
> Ed
>
>
>
> ----- Original Message ----
> From: Edwin Smith <[hidden email]>
> To: [hidden email]
> Sent: Monday, October 6, 2008 3:20:51 PM
> Subject: Re: ArrayIndexOutOfBoundsException in FastCharStream.readChar
>
> No particular reason. It is just what I had loaded last and hadn't 
> upgraded. It sounds like there might be good reason to do that now.
>
> Thanks for the tip.
>
> Ed
>
>
>
> ----- Original Message ----
> From: Steven A Rowe <[hidden email]>
> To: [hidden email]
> Sent: Monday, October 6, 2008 3:18:20 PM
> Subject: RE: ArrayIndexOutOfBoundsException in FastCharStream.readChar
>
> Hi Edwin,
>
> I don't know specifically what's causing the exception you're 
> seeing, but note that in Lucene 2.3.0+, the JavaCC-generated version 
> of StandardTokenizer (where your exception originates) has been 
> replaced with a JFlex-generated version - see <http://issues.apache.org/jira/browse/LUCENE-966 
> >.
>
> FYI, indexing speed was much improved in 2.3.0 over previous 
> versions -- up to 10 times faster, according to reports on this list 
> -- is there any particular reason you aren't using 2.3.2 (the most 
> recent release)?
>
> Steve
>
> On 10/06/2008 at 2:32 PM, Edwin Smith wrote:
>> Oh, and in case it matters, I'm using Lucene 2.2.0.
>>
>> Ed
>>
>>
>>
>> ----- Original Message ----
>>
>>
>> I am stumped and have not seen any other reference to this
>> problem. I am getting the following exception on everything I
>> try to index. Does anyone know what my problem might be?
>>
>> Thanks,
>>
>> Ed
>>
>> [stack trace snipped]
>>
>
>
>
>

Re: ArrayIndexOutOfBoundsException in FastCharStream.readChar

Edwin Smith-3
In reply to this post by Edwin Smith-3
I found it. My reader was returning 0 at the end of the stream instead of -1. Doh.
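
For anyone who hits the same thing, the fix amounts to honoring the Reader contract: read(char[], int, int) must return -1 once the stream is exhausted, never 0. A minimal sketch of a compliant reader (the class name and string-backed setup are illustrative only, not the actual reader):

    import java.io.IOException;
    import java.io.Reader;

    public class StringBackedReader extends Reader {
        private final String text;
        private int pos = 0;

        public StringBackedReader(String text) {
            this.text = text;
        }

        @Override
        public int read(char[] cbuf, int off, int len) throws IOException {
            if (pos >= text.length()) {
                // End of stream must be signalled with -1; returning 0 here is
                // what sent StandardTokenizerImpl.zzRefill into the bad arraycopy.
                return -1;
            }
            int n = Math.min(len, text.length() - pos);
            text.getChars(pos, pos + n, cbuf, off);
            pos += n;
            return n;
        }

        @Override
        public void close() {
            // nothing to release in this sketch
        }
    }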
 
Thanks again for the suggestions. They did ultimately lead me to the right answer.
 
Ed



----- Original Message ----
From: Edwin Smith <[hidden email]>
To: [hidden email]
Sent: Tuesday, October 7, 2008 10:43:06 AM
Subject: Re: ArrayIndexOutOfBoundsException in FastCharStream.readChar

Thanks for the tip. I tried your experiment and, sure enough, it works just fine, so it's not the contents but obviously some other behavior of my custom reader. (Does the analyzer require that "mark" and "reset" be implemented, for example? I did not implement them.)

The stack trace is as follows:

[stack trace snipped; see the previous message]

The error is occurring in the second line of zzRefill():

    System.arraycopy(zzBuffer, zzStartRead,
                     zzBuffer, 0,
                     zzEndRead - zzStartRead);

I set a breakpoint to catch it before it erred, and the value of zzEndRead is 0 while the value of zzStartRead is 1, so the copy length (zzEndRead - zzStartRead) comes out to -1, which makes System.arraycopy throw. Thus the error.

I was being clever and made a custom reader using competing threads against a SAX parser. I probably did it more to see if I could than for any valid reason, so I will probably just simplify my approach, use the parser to pull a complete string, and wrap that in a StringReader like you suggest.

Thanks for the help.

Ed


----- Original Message ----
From: Michael McCandless <[hidden email]>
To: [hidden email]
Sent: Tuesday, October 7, 2008 5:12:05 AM
Subject: Re: ArrayIndexOutOfBoundsException in FastCharStream.readChar


If you capture the exact text produced by the reader, and wrap it in a 
StringReader and pass that to StandardAnalyzer, do you then see the 
same exception?

Can you post the full stack trace on 2.3.2?

Mike

Edwin Smith wrote:

> I upgraded to the latest, 2.3.2, and had the same problem, even
> though it was clearly a different lexer reading the text.
>
> I did find some problems with the reader I was using, and it now 
> reads some files that it didn't before, so it may still be some 
> reader problem I haven't identified, but the text coming in from it 
> looks correct to me, so I don't know.
>
> Very frustrating.
>
> Ed
>
>
>
> ----- Original Message ----
> From: Edwin Smith <[hidden email]>
> To: [hidden email]
> Sent: Monday, October 6, 2008 3:20:51 PM
> Subject: Re: ArrayIndexOutOfBoundsException in FastCharStream.readChar
>
> No particular reason. It is just what I had loaded last and hadn't 
> upgraded. It sounds like there might be good reason to do that now.
>
> Thanks for the tip.
>
> Ed
>
>
>
> ----- Original Message ----
> From: Steven A Rowe <[hidden email]>
> To: [hidden email]
> Sent: Monday, October 6, 2008 3:18:20 PM
> Subject: RE: ArrayIndexOutOfBoundsException in FastCharStream.readChar
>
> Hi Edwin,
>
> I don't know specifically what's causing the exception you're 
> seeing, but note that in Lucene 2.3.0+, the JavaCC-generated version 
> of StandardTokenizer (where your exception originates) has been 
> replaced with a JFlex-generated version - see <http://issues.apache.org/jira/browse/LUCENE-966 
> >.
>
> FYI, indexing speed was much improved in 2.3.0 over previous 
> versions -- up to 10 times faster, according to reports on this list 
> -- is there any particular reason you aren't using 2.3.2 (the most 
> recent release)?
>
> Steve
>
> On 10/06/2008 at 2:32 PM, Edwin Smith wrote:
>> Oh, and in case it matters, I'm using Lucene 2.2.0.
>>
>> Ed
>>
>>
>>
>> ----- Original Message ----
>>
>>
>> I am stumped and have not seen any other reference to this
>> problem. I am getting the following exception on everything I
>> try to index. Does anyone know what my problem might be?
>>
>> Thanks,
>>
>> Ed
>>
>> [stack trace snipped]
>>
>
>
>
>