
Lucene 3.4 : shift bug in possibly invalid use of NumericTokenStream

Thushara Wijeratna-2
I got this exception while indexing with Lucene 3.4:

Exception in thread "Thread-0" java.lang.IllegalArgumentException: Illegal shift value, must be 0..31
        at org.apache.lucene.util.NumericUtils.intToPrefixCoded(NumericUtils.java:157)
        at org.apache.lucene.analysis.NumericTokenStream.incrementToken(NumericTokenStream.java:217)
        at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:185)
        at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:278)
        at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:766)
        at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:2067)
        at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:2041)
        at com.adxpose.affinity.IndexerHelper.index(IndexerHelper.java:797)
        at com.adxpose.affinity.IndexerHelper$Clerk.run(IndexerHelper.java:433)
        at java.lang.Thread.run(Thread.java:662)


It is not clear to me why NumericTokenStream is being called here, as
my analyzer does not use it. Any clues would be much appreciated.


thx,

thushara

RE: Lucene 3.4 : shift bug in possibly invalid use of NumericTokenStream

Uwe Schindler
Do you have NumericFields? If yes, how are they configured?

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [hidden email]



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: Lucene 3.4 : shift bug in possibly invalid use of NumericTokenStream

Thushara Wijeratna-2
Yes, there is one.

This is how the field is being created:

new NumericField("timestamp", Field.Store.NO, true);

Thus, the field is not stored, but indexed.

thx,
thushara



RE: Lucene 3.4 : shift bug in possibly invalid use of NumericTokenStream

Uwe Schindler
Hi,

Thanks, this *may* cause the exception, but it is impossible for the
stack trace you posted to occur in Lucene's code with a default
precision step on a numeric field, as you use here. I assume it's a 32-bit
value (NumericField.setIntValue or setFloatValue)?

Please send us your full Java version (java -version) and, ideally, the
full source code you use during indexing. The only way you can get this
exception is through a JVM bug.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [hidden email]



Re: Lucene 3.4 : shift bug in possibly invalid use of NumericTokenStream

Thushara Wijeratna-2
Yes, I use this field to set a timestamp (an int). And I'm not using the special constructor, so I must be using the default precision step.
Java version : 1.6.0_24

mpire@seafcmr16:~$ java -version
java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)

Also: I have only seen this when multiple threads within the app are writing to a single Lucene index, and even then it is rare.

I'm attaching the indexing code.

Could you also point me to the JVM bug you suspect to be the cause?

thx,
thushara


RE: Lucene 3.4 : shift bug in possibly invalid use of NumericTokenStream

Uwe Schindler
Hi,

 

Can you try 1.6.0_29, or disable HotSpot compilation with the "-Xint" JVM
startup flag (just as a test; I know it is slow)? And you are *not* using
"-XX:+AggressiveOpts" as a JVM parameter?

The JVM bug that may lead to this is a sign-flip bug:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=5091921 (see also
http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2011-March/004942.html)

Otherwise, does everything work if you remove the numeric field? The code
you are using can never cause such behavior; this is extensively tested.
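
To check the JVM version and startup flags from inside the application rather than from the shell, a small sketch along these lines may help. It uses only the standard java.lang.management API; the class name JvmInfo and the helper hasFlag are made up for illustration.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper, not part of Lucene: reports which JVM is running
// and whether the flags discussed above were passed at startup.
class JvmInfo {

    // True if the given flag appears verbatim among the JVM arguments.
    static boolean hasFlag(List<String> jvmArgs, String flag) {
        return jvmArgs.contains(flag);
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself (not to main()).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();

        System.out.println("java.version    = " + System.getProperty("java.version"));
        System.out.println("java.vm.name    = " + System.getProperty("java.vm.name"));
        System.out.println("java.vm.version = " + System.getProperty("java.vm.version"));
        System.out.println("JVM arguments   = " + jvmArgs);
        System.out.println("AggressiveOpts  = " + hasFlag(jvmArgs, "-XX:+AggressiveOpts"));
        System.out.println("Interpreted     = " + hasFlag(jvmArgs, "-Xint"));
    }
}
```

Logging this once at indexer startup makes it easy to confirm that a test run with -Xint really had the flag applied.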

 

Uwe

 

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [hidden email]


Re: Lucene 3.4 : shift bug in possibly invalid use of NumericTokenStream

Thushara Wijeratna-2
This is difficult to reproduce. I'm not using any JVM flags. It does seem
that the following code could never call NumericUtils.intToPrefixCoded
with a shift > 31 (or shift < 0), so I tend to agree this must be a JVM
bug. Looking through all the logs I have for December, I found only one
instance of this issue. If it has nothing to do with concurrency, then
it must have to do with the value set in the NumericField, so the bug
would be triggered by a particular timestamp.


from: http://javasourcecode.org/html/open-source/lucene/lucene-3.3.0/org/apache/lucene/analysis/NumericTokenStream.java.html


  public boolean incrementToken() {
    if (valSize == 0)
      throw new IllegalStateException("call set???Value() before usage");
    if (shift >= valSize)
      return false;

    clearAttributes();
    final char[] buffer;
    switch (valSize) {
      case 64:
        buffer = termAtt.resizeBuffer(NumericUtils.BUF_SIZE_LONG);
        termAtt.setLength(NumericUtils.longToPrefixCoded(value, shift, buffer));
        break;

      case 32:
        buffer = termAtt.resizeBuffer(NumericUtils.BUF_SIZE_INT);
        termAtt.setLength(NumericUtils.intToPrefixCoded((int) value, shift, buffer));
        break;

      default:
        // should not happen
        throw new IllegalArgumentException("valSize must be 32 or 64");
    }

    typeAtt.setType((shift == 0) ? TOKEN_TYPE_FULL_PREC : TOKEN_TYPE_LOWER_PREC);
    posIncrAtt.setPositionIncrement((shift == 0) ? 1 : 0);
    shift += precisionStep;
    return true;
  }
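
The single-threaded reasoning can be checked with a small stand-alone model of that loop. This is only a sketch: it assumes valSize = 32 and precisionStep = 4 (which I believe is the Lucene 3.x default), and the class ShiftSequenceDemo and the method shifts are invented for illustration; they are not part of Lucene.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-alone model of the shift bookkeeping in incrementToken(); not Lucene code.
class ShiftSequenceDemo {

    // Returns every shift value that would be handed to the prefix encoder
    // when a single thread drives one stream until it is exhausted.
    static List<Integer> shifts(int valSize, int precisionStep) {
        List<Integer> seen = new ArrayList<Integer>();
        int shift = 0;
        while (shift < valSize) {      // mirrors: if (shift >= valSize) return false;
            seen.add(shift);           // the shift passed to intToPrefixCoded
            shift += precisionStep;    // mirrors: shift += precisionStep;
        }
        return seen;
    }

    public static void main(String[] args) {
        // 32-bit value with the assumed default precision step of 4.
        System.out.println(shifts(32, 4)); // prints [0, 4, 8, 12, 16, 20, 24, 28]
    }
}
```

Under these assumptions the largest shift handed to the encoder is 28, so a single thread driving one stream never reaches 32.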



Re: Lucene 3.4 : shift bug in possibly invalid use of NumericTokenStream

Peter
Hi,

I was hitting a similar exception (for me it was for type 'long'), but I
thought it was caused by a programming mistake on my side. termAtt is
reused. Couldn't problems occur when two threads call the incrementToken
method at the same time?

This exception disappeared when I fixed some threading issues in my app
... (it was even reproducible, so I can post something if someone is
interested)
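
A hand-replayed sketch of such an interleaving follows. It is not Lucene code and not a diagnosis of the original report: SharedStreamDemo, hasMore, encodeAndAdvance and replayRace are hypothetical names, and the model assumes two threads share one stream instance with valSize = 32 and precisionStep = 4. The two "threads" are replayed step by step on one real thread so the outcome is deterministic.

```java
// Simplified stand-in for a token stream with mutable state; not Lucene code.
class SharedStreamDemo {
    static final int VAL_SIZE = 32;
    static final int PRECISION_STEP = 4; // assumed default precision step

    int shift = 0;

    // First half of the modeled incrementToken(): the bounds check.
    boolean hasMore() {
        return shift < VAL_SIZE;
    }

    // Second half: encode with the current shift, then advance it.
    int encodeAndAdvance() {
        int used = shift;
        if (used < 0 || used > 31) {
            throw new IllegalArgumentException("Illegal shift value, must be 0..31");
        }
        shift += PRECISION_STEP;
        return used;
    }

    // Replays one unlucky interleaving of two callers on a shared instance.
    static String replayRace() {
        SharedStreamDemo shared = new SharedStreamDemo();
        shared.shift = 28;                              // one valid token left

        boolean threadAPassedCheck = shared.hasMore();  // thread A: 28 < 32, check passes

        if (shared.hasMore()) {                         // thread B: runs a complete call
            shared.encodeAndAdvance();                  // shift is now 32
        }

        try {
            if (threadAPassedCheck) {
                shared.encodeAndAdvance();              // thread A resumes and reads shift == 32
            }
            return "no exception";
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(replayRace()); // prints Illegal shift value, must be 0..31
    }
}
```

If something like this is what happens, giving each thread its own field and stream instances, or synchronizing around document construction and addDocument, would avoid it. That is an assumption about the cause, not something confirmed in this thread.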

Regards,
Peter.


--
http://jetsli.de news reader for geeks



RE: Lucene 3.4 : shift bug in possibly invalid use of NumericTokenStream

Uwe Schindler
Hi,
 
> I was hitting a similar exception (for me it was of type 'long'). But I thought it
> was because I had a programming mistake. termAtt is reused.
> Couldn't it be that when two threads accessing the incrementToken method at
> the same time that problems occur?

If it is not a problem in the user code invoking the IndexWriter, it cannot
happen, as IndexWriter only accesses one document per thread and can only
call incrementToken from one thread. But if, e.g., the user's indexing code
reuses Documents and Fields (as suggested for performance reasons), it may
happen that the *same* NumericField instance (or other document/field type)
is added to IndexWriter from different threads. In this case, it could
easily happen that shift gets out of bounds. But if this is the case, your
index is also crap, as all numeric values (and other fields) would be wrong.
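
To make the scenario above concrete, here is a minimal sketch that replays one unlucky interleaving of two threads sharing a single stream instance. It is not Lucene code: ToyNumericStream, its method names and the precision step of 4 are assumptions made for illustration, and the two "threads" are replayed sequentially so that the outcome is deterministic.

```java
class SharedShiftDemo {
    /** Toy stand-in for a token stream with mutable per-instance state (hypothetical, not Lucene). */
    static final class ToyNumericStream {
        final int valSize = 32;       // 32-bit value
        final int precisionStep = 4;  // assumed precision step
        int shift = 0;                // mutable state shared by every caller of this instance

        /** First half of an incrementToken()-style call: the bounds check. */
        boolean hasMore() {
            return shift < valSize;
        }

        /** Second half: "encode" at the current shift, then advance. */
        void encodeAndAdvance() {
            if (shift < 0 || shift > 31) {
                throw new IllegalArgumentException("Illegal shift value, must be 0..31");
            }
            shift += precisionStep;
        }
    }

    /** Replays the interleaving; returns the exception message, or null if nothing failed. */
    static String replayInterleaving() {
        ToyNumericStream shared = new ToyNumericStream();
        while (shared.shift < 28) {
            shared.encodeAndAdvance();       // advance to the last valid shift (28)
        }
        boolean aPassed = shared.hasMore();  // "thread A" passes the check at shift 28
        boolean bPassed = shared.hasMore();  // "thread B" passes the same check
        try {
            if (bPassed) shared.encodeAndAdvance(); // B encodes at 28, shift becomes 32
            if (aPassed) shared.encodeAndAdvance(); // A now encodes with shift 32 and fails
            return null;
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(replayInterleaving());
    }
}
```

With a single caller the check and the encode step always see the same shift, so the exception cannot occur; it only appears once a second caller advances the shared counter between the two steps.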

> This exception disappeared when I fixed some threading issues in my app ... (it
> was even reproducible so I can post something if someone is
> interested)

I assume it was a bug like noted before?

> > This is difficult to repro. I'm not using any JVM flags. It does seem
> > that the following code could never call NumericUtils.intToPrefixCoded
> > with a shift > 31 (or shift < 0) so I tend to agree this must be a JVM
> > bug. Looking through all logs I have for December, I only found one
> > instance of this issue. It seems it has nothing to do with
> > concurrency, then it must have to do with the value set in the
> > NumericField, so the bug must be triggered by a particular timestamp.

The timestamp cannot trigger it: the check is only done on the
precisionStep/shift/valSize fields, so the actual value has no influence. If
it is not a concurrency bug caused by reusing documents/fields from different
threads, there can only be a sign flip in the JVM.
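
As a rough illustration of why the value is irrelevant, the sketch below lists the shifts a single-threaded caller would pass through for a given valSize and precisionStep. The helper shiftsVisited is hypothetical (it is not part of Lucene) and the step of 4 is an assumption; the point is only that the indexed value is not an input to the sequence.

```java
import java.util.ArrayList;
import java.util.List;

class ShiftSequenceDemo {
    /** Shifts visited for one value; the value itself never enters the computation. */
    static List<Integer> shiftsVisited(int valSize, int precisionStep) {
        List<Integer> shifts = new ArrayList<>();
        for (int shift = 0; shift < valSize; shift += precisionStep) {
            shifts.add(shift);
        }
        return shifts;
    }

    public static void main(String[] args) {
        System.out.println(shiftsVisited(32, 4)); // [0, 4, 8, 12, 16, 20, 24, 28]
        System.out.println(shiftsVisited(64, 4).size()); // 16
    }
}
```

For a 32-bit value the largest shift reached this way is 28, which stays within the 0..31 range named in the exception message.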

Uwe

...

> > from:
> > http://javasourcecode.org/html/open-source/lucene/lucene-3.3.0/org/apache/lucene/analysis/NumericTokenStream.java.html
> >
> >   public boolean incrementToken() {
> >     if (valSize == 0)
> >       throw new IllegalStateException("call set???Value() before usage");
> >     if (shift >= valSize)
> >       return false;
> >
> >     clearAttributes();
> >     final char[] buffer;
> >     switch (valSize) {
> >       case 64:
> >         buffer = termAtt.resizeBuffer(NumericUtils.BUF_SIZE_LONG);
> >         termAtt.setLength(NumericUtils.longToPrefixCoded(value, shift, buffer));
> >         break;
> >
> >       case 32:
> >         buffer = termAtt.resizeBuffer(NumericUtils.BUF_SIZE_INT);
> >         termAtt.setLength(NumericUtils.intToPrefixCoded((int) value, shift, buffer));
> >         break;
> >
> >       default:
> >         // should not happen
> >         throw new IllegalArgumentException("valSize must be 32 or 64");
> >     }
> >
> >     typeAtt.setType((shift == 0) ? TOKEN_TYPE_FULL_PREC : TOKEN_TYPE_LOWER_PREC);
> >     posIncrAtt.setPositionIncrement((shift == 0) ? 1 : 0);
> >     shift += precisionStep;
> >     return true;
> >   }



Re: Lucene 3.4 : shift bug in possibly invalid use of NumericTokenStream

Peter
> I assume it was a bug like noted before?

Exactly. Nothing to do with Lucene IMHO

Peter.

--
http://jetsli.de news reader for geeks



Re: Lucene 3.4 : shift bug in possibly invalid use of NumericTokenStream

Peter
In reply to this post by Uwe Schindler
BTW: how can I use NumericUtils.longToPrefixCoded in 4.0 ?

Peter.


RE: Lucene 3.4 : shift bug in possibly invalid use of NumericTokenStream

Uwe Schindler
Hi,

NumericUtils is an internal implementation class, you should not use it.
What do you want to do? There is no need to call any of its methods during
indexing or searching. Everything else is advanced. In the latter case you
should RTFM of BytesRef and related classes (possibly watch the flexible
indexing talk done by me in Berlin, Barcelona or San Francisco). Lucene
moved to binary terms in 4.0 and no longer uses character based terms, so
the code is different. BytesRef is just a wrapper around a byte[].

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [hidden email]



Lucene 4.0 questions, was: shift bug in possibly invalid use of NumericTokenStream

Peter
Hi Uwe,

thanks for the talk suggestion(s)*.

I was using it for faster term lookups of a long 'id'. How would this be
done with 4.0? Before I did it via Term:

new Term(fieldName, NumericUtils.longToPrefixCoded(longValue));

How should I generally do "term lookup" in 4.0 as you said in the video
that 'Term' gets removed somewhen :)? What is the most recommended way
and what is the fastest? Or where can I find "most recent" code in
lucene tests to be used as an example?

I also heard the suggestion to use the pulsing codec for id retrieval**.
Is this the correct way nowadays to achieve this:

indexWriterCfg.setCodec(new Lucene40Codec() {
   @Override public PostingsFormat getPostingsFormatForField(String field) {
       if("_id".equals(field)) return new Pulsing40PostingsFormat();
       else ?
   }});

Regards,
Peter.

*
http://vimeo.com/32065505

**
http://blog.mikemccandless.com/2010/06/lucenes-pulsingcodec-on-primary-key.html



RE: Lucene 4.0 questions, was: shift bug in possibly invalid use of NumericTokenStream

Uwe Schindler
> Hi Uwe,
>
> thanks for the talk suggestion(s)*.
>
> I was using it for faster term lookups of a long 'id'. How would this be done with
> 4.0? Before I did it via Term:
>
> new Term(fieldName, NumericUtils.longToPrefixCoded(longValue));

If you want to query on a single numeric term value, use
NumericRangeQuery.newLongRange(field, ..., value, value, true, true), this
rewrites to a simple TermQuery.

Otherwise you have to create a BytesRef() object:

final BytesRef bytes = new BytesRef(); // for reuse!
NumericUtils.longToPrefixCoded(longValue, 0, bytes); // 0 is shift value
new Term(fieldName, bytes);

> How should I generally do "term lookup" in 4.0 as you said in the video that
> 'Term' gets removed somewhen :)? What is the most recommended way and
> what is the fastest? Or where can I find "most recent" code in lucene tests to be
> used as an example?

Term lookup can be done by field and BytesRef: get a TermsEnum for the field
and seek to the BytesRef. For strings you can create a UTF-8 encoded
BytesRef using new BytesRef(CharSequence). If you need docFreq, ask
IndexReader with field name and BytesRef. And so on, it's always the same
:-)


inspecting chinese index using luke

Peyman Faratin
hi

We are indexing some Chinese text (written with the following OutputStreamWriter, using UTF-8 encoding).

OutputStreamWriter outputFileWriter  = new OutputStreamWriter(new FileOutputStream(outputFile), "utf8");

We are trying to inspect the index in Luke 3.4.0 (with the UTF-8 option selected in Luke), but the text appears garbled. Any advice would be appreciated.

thank you

Peyman

RE: inspecting chinese index using luke

Uwe Schindler
Hi,

Please look at:
http://people.apache.org/~hossman/#threadhijack

Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to
an existing message, instead start a fresh email.  Even if you change the
subject line of your email, other mail headers still track which thread
you replied to and your question is "hidden" in that thread and gets less
attention. It makes following discussions in the mailing list archives

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [hidden email]



Re: Lucene 4.0 questions, was: shift bug in possibly invalid use of NumericTokenStream

Simon Willnauer
In reply to this post by Peter
On Mon, Dec 19, 2011 at 5:03 PM, Peter Karich <[hidden email]> wrote:

> Hi Uwe,
>
> thanks for the talk suggestion(s)*.
>
> I was using it for faster term lookups of a long 'id'. How would this be
> done with 4.0? Before I did it via Term:
>
> new Term(fieldName, NumericUtils.longToPrefixCoded(longValue));
>
> How should I generally do "term lookup" in 4.0 as you said in the video
> that 'Term' gets removed somewhen :)? What is the most recommended way
> and what is the fastest? Or where can I find "most recent" code in
> lucene tests to be used as an example?
>
> I also heard the suggestion to use the pulsing codec for id retrieval**.
> Is this the correct way nowadays to achive this:
>
> indexWriterCfg.setCodec(new Lucene40Codec() {
>   @Override public PostingsFormat getPostingsFormatForField(String field) {
>       if("_id".equals(field)) return new Pulsing40PostingsFormat();
>       else ?
>   }});

do something like this:

  public static final class CustomPerFieldCodec extends Lucene40Codec {
    private final PostingsFormat pulsing = PostingsFormat.forName("Pulsing40");
    private final PostingsFormat defaultFormat =
        PostingsFormat.forName("Lucene40");

    @Override
    public PostingsFormat getPostingsFormatForField(String field) {
      if (field.equals("id")) {
        return pulsing;
      } else {
        return defaultFormat;
      }
    }
  }

simon


Re: Lucene 4.0 questions, was: shift bug in possibly invalid use of NumericTokenStream

Simon Willnauer
Actually, if you look for fast ID lookups you could consider using
Memory PostingsFormat. This keeps everything in memory and should be
the fastest alternative but costly in terms of RAM.

private final PostingsFormat memory = PostingsFormat.forName("Memory");

simon


Re: Lucene 3.4 : shift bug in possibly invalid use of NumericTokenStream

Thushara Wijeratna-2
In reply to this post by Uwe Schindler
Actually, a single timestamp field is being used by several threads.
Sorry, I missed that, and thanks to both Peter and Uwe for the explanations.
[In my code snippet I was trying to simplify, so I missed this. I'm constructing
one timestamp field and passing it to all threads in the ctor.]
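
One possible way to avoid sharing such a field is to give every indexing thread its own instance, for example through a ThreadLocal. The sketch below is only an illustration under that assumption: TimestampField is a hypothetical stand-in for a mutable, reusable field (it is not Lucene's NumericField), and the actual indexing code from this thread is not shown.

```java
class PerThreadFieldDemo {
    /** Hypothetical stand-in for a mutable, reusable field object. */
    static final class TimestampField {
        int value;

        TimestampField set(int v) {
            value = v;
            return this;
        }
    }

    // Each thread lazily creates and then reuses its own instance.
    static final ThreadLocal<TimestampField> FIELD =
        ThreadLocal.withInitial(TimestampField::new);

    /** Returns true if two worker threads each worked on their own instance. */
    static boolean workersGetDistinctInstances() {
        final TimestampField[] seen = new TimestampField[2];
        Thread a = new Thread(() -> seen[0] = FIELD.get().set(1));
        Thread b = new Thread(() -> seen[1] = FIELD.get().set(2));
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0] != seen[1] && seen[0].value == 1 && seen[1].value == 2;
    }

    public static void main(String[] args) {
        System.out.println(workersGetDistinctInstances());
    }
}
```

Reuse within one thread keeps the performance benefit of not allocating a field per document, while no two threads ever mutate the same object.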
