[jira] [Comment Edited] (LUCENE-4656) Fix EmptyTokenizer

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-4656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13543025#comment-13543025 ]

Uwe Schindler edited comment on LUCENE-4656 at 1/3/13 4:20 PM:
---------------------------------------------------------------

Here is a patch showing the bug that occurs when the public class EmptyTokenStream from analysis-common is used together with IndexWriter:

{noformat}
[junit4:junit4] ERROR   0.33s | TestEmptyTokenStream.testIndexWriter_LUCENE4656 <<<
[junit4:junit4]    > Throwable #1: java.lang.IllegalArgumentException: This AttributeSource does not have the attribute 'org.apache.lucene.analysis.tokenattributes.TermToBytesRefAttribute'.
[junit4:junit4]    >    at __randomizedtesting.SeedInfo.seed([3B209861053849AF:D7B239E3D4067832]:0)
[junit4:junit4]    >    at org.apache.lucene.util.AttributeSource.getAttribute(AttributeSource.java:303)
[junit4:junit4]    >    at org.apache.lucene.index.TermsHashPerField.start(TermsHashPerField.java:119)
[junit4:junit4]    >    at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:109)
[junit4:junit4]    >    at org.apache.lucene.index.DocFieldProcessor.processDocument(DocFieldProcessor.java:272)
[junit4:junit4]    >    at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:250)
[junit4:junit4]    >    at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:376)
[junit4:junit4]    >    at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1456)
[junit4:junit4]    >    at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1131)
[junit4:junit4]    >    at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1112)
[junit4:junit4]    >    at org.apache.lucene.analysis.miscellaneous.TestEmptyTokenStream.testIndexWriter_LUCENE4656(TestEmptyTokenSt
{noformat}

The patch also includes a test checking that assertTokenStreamContents works with this stream, which it currently does not, because it asserts that the CharTermAttribute (CTA) is available. Neither NumericTokenStream *nor* this one has that attribute.
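
The failure mode in the trace above can be illustrated with a small standalone model. This is only a sketch under stated assumptions: SketchAttributeSource, SketchTermAttribute and AttributeLookupSketch are hypothetical names invented for illustration and are not Lucene classes. Only the strict-lookup behaviour and the wording of the exception message are taken from the stack trace.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical marker type standing in for a term attribute. */
interface SketchTermAttribute {}

/** Simplified attribute registry; an illustration only, not Lucene's AttributeSource. */
class SketchAttributeSource {
    private final Map<Class<?>, Object> attributes = new HashMap<>();

    /** Registers an attribute instance under its interface type. */
    <T> void addAttribute(Class<T> type, T instance) {
        attributes.put(type, instance);
    }

    /** Strict lookup: throws if the attribute was never registered. */
    <T> T getAttribute(Class<T> type) {
        Object attribute = attributes.get(type);
        if (attribute == null) {
            throw new IllegalArgumentException(
                "This AttributeSource does not have the attribute '" + type.getName() + "'.");
        }
        return type.cast(attribute);
    }
}

class AttributeLookupSketch {
    /** Builds an "empty" stream, optionally registering the term attribute. */
    static SketchAttributeSource emptyStream(boolean registerTermAttribute) {
        SketchAttributeSource stream = new SketchAttributeSource();
        if (registerTermAttribute) {
            stream.addAttribute(SketchTermAttribute.class, new SketchTermAttribute() {});
        }
        return stream;
    }

    /** Mimics a consumer that fetches the term attribute before reading any tokens. */
    static boolean canBeConsumed(SketchAttributeSource stream) {
        try {
            stream.getAttribute(SketchTermAttribute.class);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(canBeConsumed(emptyStream(false))); // prints "false"
        System.out.println(canBeConsumed(emptyStream(true)));  // prints "true"
    }
}
```

In this model, a stream that produces no tokens still fails as soon as the consumer asks for the attribute, which matches the point at which the trace shows the exception being thrown (during the attribute lookup, before any token is read).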
               
      was (Author: thetaphi):
    Here is a patch showing the bug in the public class EmptyTokenStream from analysis-common working together with IndexWriter.

It also has a test that assertTokenStreamContents actually works.
                 

> Fix EmptyTokenizer
> ------------------
>
>                 Key: LUCENE-4656
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4656
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/analysis
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>            Priority: Trivial
>         Attachments: LUCENE-4656-IW-bug.patch, LUCENE-4656.patch, LUCENE-4656.patch
>
>
> TestRandomChains can fail because EmptyTokenizer doesn't have a CharTermAttribute and doesn't compute the end offset (if the offset attribute was added by a filter).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]