ArrayIndexOutOfBounds exception using FieldCache

14 messages
ArrayIndexOutOfBounds exception using FieldCache

karl.wright
Hi Folks,
 
I just tried to index a data set that was probably 2x as large as the previous one I’d been using with the same code.  The indexing completed fine, although it was slower than I would have liked. ;-)  But the following problem occurs when I try to use FieldCache to look up an indexed and stored value:
 
java.lang.ArrayIndexOutOfBoundsException: -65406
        at org.apache.lucene.util.PagedBytes$Reader.fillUsingLengthPrefix(PagedBytes.java:98)
        at org.apache.lucene.search.FieldCacheImpl$DocTermsImpl.getTerm(FieldCacheImpl.java:918)
        at ...
 
The code that does this has been working for quite some time and has been unmodified:
 
    /** Find a string field value, given the Lucene document ID and field name.
    */
    protected String getStringValue(int luceneID, String fieldName)
      throws IOException
    {
      // Find the right reader
      final int idx = readerIndex(luceneID, starts, readers.length);
      final int docBase = starts[idx];
      final IndexReader reader = readers[idx];
 
      BytesRef ref = FieldCache.DEFAULT.getTerms(reader,fieldName).getTerm(luceneID-docBase,new BytesRef());
      String rval = ref.utf8ToString();
      //System.out.println(" Reading luceneID "+Integer.toString(luceneID)+" field "+fieldName+" with result '"+rval+"'");
      return rval;
    }
 
  }
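For completeness, readerIndex() is not shown above; it is the usual binary search over the starts[] doc-base array that MultiReader used in that era. A rough standalone equivalent (my paraphrase, not the exact code):

```java
// Sketch of the readerIndex() helper referenced above (reconstruction,
// not the actual code from the post): binary-search the composite doc ID
// against the per-sub-reader doc bases in starts[].
public final class ReaderIndex {
  /**
   * Returns the index of the sub-reader containing docId.
   * starts[i] is the doc base of reader i; starts[numReaders] is maxDoc.
   */
  public static int readerIndex(int docId, int[] starts, int numReaders) {
    int lo = 0, hi = numReaders - 1;
    while (hi >= lo) {
      int mid = (lo + hi) >>> 1;
      int midValue = starts[mid];
      if (docId < midValue) {
        hi = mid - 1;
      } else if (docId > midValue) {
        lo = mid + 1;
      } else {
        // Empty sub-readers share a doc base; take the last one.
        while (mid + 1 < numReaders && starts[mid + 1] == midValue) {
          mid++;
        }
        return mid;
      }
    }
    return hi;
  }

  public static void main(String[] args) {
    // Two segments: docs 0-9 in reader 0, docs 10-24 in reader 1.
    int[] starts = {0, 10, 25};
    System.out.println(readerIndex(3, starts, 2));   // 0
    System.out.println(readerIndex(10, starts, 2));  // 1
    System.out.println(readerIndex(24, starts, 2));  // 1
  }
}
```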
 
I added a try/catch to see what values were going into the key line:
 
catch (RuntimeException e)
{
    System.out.println("LuceneID = "+luceneID+", fieldName='"+fieldName+"', idx="+idx+", docBase="+docBase);
    System.out.println("Readers = "+readers.length);
    int i = 0;
    while (i < readers.length)
    {
        System.out.println(" Reader start "+i+" is "+starts[i]);
        i++;
    }
    throw e;
}
 
The resulting output was:
 
LuceneID = 34466856, fieldName='id', idx=0, docBase=0
Readers = 1
     Reader start 0 is 0
 
… which looks reasonable on the face of things.  This is a version of trunk from approximately 8/12/2010, so it is fairly old.  Was there a fix for a problem that could account for this behavior?  Should I simply synch up?  Or am I doing something wrong here?  The schema for the id field is:
 
<fieldType name="string_idx" class="solr.StrField" sortMissingLast="true" indexed="true" stored="true"/>
<field name="id" type="string_idx" required="true"/>
 
Karl
 

Re: ArrayIndexOutOfBounds exception using FieldCache

Michael McCandless-2
Hmmm not good!

It could be you are hitting LUCENE-2633
(https://issues.apache.org/jira/browse/LUCENE-2633)?  That was fixed on
Sep 9, after your checkout.  Maybe try syncing up?

Mike


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


RE: ArrayIndexOutOfBounds exception using FieldCache

karl.wright
Not good indeed.

Synched to trunk, blew away old indexes, reindexed, same behavior.  So I think we've got a problem, Houston. ;-)

Karl


Re: ArrayIndexOutOfBounds exception using FieldCache

Michael McCandless-2
Fun fun :)

Is there any way I can rsync/scp/ftp a copy of this index over...?

Failing that I can make some patches that we can iterate on...

Mike


RE: ArrayIndexOutOfBounds exception using FieldCache

karl.wright
It's on an internal Nokia machine, unfortunately, so the only way I can transfer it out is with my credentials, or by email, which is definitely not going to work ;-).  But if you can provide me with an account on a machine I'd be transferring it to, I may be able to scp it from here.

Karl
 


RE: ArrayIndexOutOfBounds exception using FieldCache

karl.wright
Talked with IT here - they don't recommend external transfers of this size.  So I think we'd best try the "instrument and repeat" approach instead.

Karl


Re: ArrayIndexOutOfBounds exception using FieldCache

Yonik Seeley-2-2
In reply to this post by karl.wright
On Thu, Oct 28, 2010 at 6:15 AM,  <[hidden email]> wrote:
> Synched to trunk, blew away old indexes, reindexed, same behavior.  So I think we've got a problem, Houston. ;-)

Hey Karl, can you try the following patch on trunk:

Index: lucene/src/java/org/apache/lucene/search/cache/DocTermsCreator.java
===================================================================
--- lucene/src/java/org/apache/lucene/search/cache/DocTermsCreator.java
(revision 1027667)
+++ lucene/src/java/org/apache/lucene/search/cache/DocTermsCreator.java
(working copy)
@@ -164,7 +164,7 @@

     @Override
     public BytesRef getTerm(int docID, BytesRef ret) {
-      final int pointer = (int) docToOffset.get(docID);
+      final long pointer = docToOffset.get(docID);
       return bytes.fillUsingLengthPrefix(ret, pointer);
     }
   }
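The cast is the whole bug: docToOffset stores a byte offset into the paged term bytes as a long, but the old code narrowed it to int, which truncates once the total term data passes 2^31 bytes. A standalone sketch of the truncation (the sample offset below is illustrative, chosen because it narrows to exactly the -65406 in your trace):

```java
// Demonstrates the long -> int narrowing bug fixed by the patch above.
// The offset value is illustrative, not taken from the actual index.
public class OffsetTruncation {
  public static void main(String[] args) {
    // A byte offset past the 32-bit range: 2^32 - 65406.
    long pointer = 4294901890L;

    // The bug: the high 32 bits are dropped, and any offset >= 2^31
    // comes out negative, which PagedBytes then uses as an array index.
    int truncated = (int) pointer;
    System.out.println(truncated);   // prints -65406, matching the AIOOBE

    // The fix: keep the pointer as a long all the way into
    // fillUsingLengthPrefix, so no bits are lost.
    long fixed = pointer;
    System.out.println(fixed);       // prints 4294901890
  }
}
```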




-Yonik
http://www.lucidimagination.com


RE: ArrayIndexOutOfBounds exception using FieldCache

karl.wright
Yep, that fixed it. ;-)
Everything seems happy now.
Karl


Re: ArrayIndexOutOfBounds exception using FieldCache

Walter Underwood
In reply to this post by karl.wright
How big is it? The Internet works pretty well for large files.

You can send a USB drive by snail mail.

wunder


--
Walter Underwood
Venture ASM, Troop 14, Palo Alto





Re: ArrayIndexOutOfBounds exception using FieldCache

Simon Willnauer
On Thu, Oct 28, 2010 at 4:59 PM, Walter Underwood <[hidden email]> wrote:
> How big is it? The Internet works pretty well for large files.

Mike, pick the USB stick up during your next run :)

simon

>
> You can send a USB drive by snail mail.
>
> wunder
>


RE: ArrayIndexOutOfBounds exception using FieldCache

karl.wright
In reply to this post by Walter Underwood
The internet is not the bottleneck ;-).  It's the intranet here.  Index is 14GB.
Besides, it looks like Yonik found the problem.
Karl


-----Original Message-----
From: ext Walter Underwood [mailto:[hidden email]]
Sent: Thursday, October 28, 2010 11:00 AM
To: [hidden email]
Subject: Re: ArrayIndexOutOfBounds exception using FieldCache

How big is it? The Internet works pretty well for large files.

You can send a USB drive by snail mail.

wunder


--
Walter Underwood
Venture ASM, Troop 14, Palo Alto






Re: ArrayIndexOutOfBounds exception using FieldCache

Michael McCandless-2
In reply to this post by Yonik Seeley-2-2
Nice find Yonik!

Mike

On Thu, Oct 28, 2010 at 10:16 AM, Yonik Seeley
<[hidden email]> wrote:

> On Thu, Oct 28, 2010 at 6:15 AM,  <[hidden email]> wrote:
>> Synched to trunk, blew away old indexes, reindexed, same behavior.  So I think we've got a problem, Houston. ;-)
>
> Hey Karl, can you try the following patch on trunk:
>
> Index: lucene/src/java/org/apache/lucene/search/cache/DocTermsCreator.java
> ===================================================================
> --- lucene/src/java/org/apache/lucene/search/cache/DocTermsCreator.java
> (revision 1027667)
> +++ lucene/src/java/org/apache/lucene/search/cache/DocTermsCreator.java
> (working copy)
> @@ -164,7 +164,7 @@
>
>     @Override
>     public BytesRef getTerm(int docID, BytesRef ret) {
> -      final int pointer = (int) docToOffset.get(docID);
> +      final long pointer = docToOffset.get(docID);
>       return bytes.fillUsingLengthPrefix(ret, pointer);
>     }
>   }
>
>
>
>
> -Yonik
> http://www.lucidimagination.com
>
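The truncation Yonik's patch fixes is easy to reproduce in isolation: `docToOffset` holds byte offsets into the PagedBytes terms data as longs, and the narrowing cast to `int` silently keeps only the low 32 bits. A minimal sketch (the offset value here is hypothetical, chosen so its low 32 bits match the -65406 from Karl's stack trace):

```java
// Demonstrates the narrowing cast behind the negative array index.
public class OffsetTruncation {
    public static void main(String[] args) {
        // A byte offset near 4 GB into the terms data (illustrative value).
        long offset = 4294901890L;

        // The buggy line did: final int pointer = (int) docToOffset.get(docID);
        // Narrowing keeps only the low 32 bits, which here read as a negative int.
        int truncated = (int) offset;
        System.out.println(truncated); // -65406, matching the exception

        // The patched line keeps the pointer as a long, so no bits are lost.
        long pointer = offset;
        System.out.println(pointer == offset); // true
    }
}
```

Any offset whose low 32 bits have the top bit set produces a negative index, so the failure only shows up once a single field's terms data grows past 2 GB, exactly the situation Karl hit.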



Re: ArrayIndexOutOfBounds exception using FieldCache

Michael McCandless-2
In reply to this post by Simon Willnauer
On Thu, Oct 28, 2010 at 11:05 AM, Simon Willnauer
<[hidden email]> wrote:
> On Thu, Oct 28, 2010 at 4:59 PM, Walter Underwood <[hidden email]> wrote:
>> How big is it? The Internet works pretty well for large files.
>
> Mike, pick the USB stick up during your next run :)

Heh, next time :)

Karl, this is one big field cache entry you have: you are addressing
more than 2 GB of terms data.  And in 3.x this same field cache would
take much, much more RAM!
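Rough arithmetic makes the 3.x comparison concrete. The doc count comes from Karl's debug output; the average id length and all per-object costs below are assumptions for illustration only, so the point is the ratio, not the absolute numbers:

```java
// Back-of-envelope sizing for a one-string-per-doc field cache entry.
// Per-object costs and the average id length are assumed, not measured.
public class FieldCacheRamSketch {
    public static void main(String[] args) {
        long docs = 34466857L;   // max luceneID from the debug output, plus one
        long avgIdChars = 64L;   // assumed average length of the id field

        // Trunk layout: UTF-8 bytes in PagedBytes (~1 byte/char for ASCII ids)
        // plus a packed per-doc offset (~5 bytes per doc once packed).
        long trunkBytes = docs * avgIdChars + docs * 5;

        // 3.x layout: a String[] with, per doc, an array slot (~8 bytes), a
        // String object header (~40), a char[] header (~16), and 2 bytes/char.
        long legacyBytes = docs * (8 + 40 + 16 + 2 * avgIdChars);

        System.out.println(trunkBytes / (1 << 20) + " MB (trunk) vs "
            + legacyBytes / (1 << 20) + " MB (3.x)");
    }
}
```

Under these assumptions the 3.x representation is roughly three times larger, and either way a single entry for 34 million ids comfortably exceeds the 2 GB that an int byte offset can address.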

Mike



RE: ArrayIndexOutOfBounds exception using FieldCache

karl.wright
Glad to be of service. ;-)
Karl


-----Original Message-----
From: ext Michael McCandless [mailto:[hidden email]]
Sent: Thursday, October 28, 2010 11:48 AM
To: [hidden email]; [hidden email]
Subject: Re: ArrayIndexOutOfBounds exception using FieldCache

On Thu, Oct 28, 2010 at 11:05 AM, Simon Willnauer
<[hidden email]> wrote:
> On Thu, Oct 28, 2010 at 4:59 PM, Walter Underwood <[hidden email]> wrote:
>> How big is it? The Internet works pretty well for large files.
>
> Mike, pick the USB stick up during your next run :)

Heh, next time :)

Karl, this is one big field cache entry you have: you are addressing
more than 2 GB of terms data.  And in 3.x this same field cache would
take much, much more RAM!

Mike
