DO NOT REPLY [Bug 35456] New: - NGramFilter -- construct n-grams from a TokenStream

classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

DO NOT REPLY [Bug 35456] New: - NGramFilter -- construct n-grams from a TokenStream

Bugzilla from bugzilla@apache.org
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG?
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=35456>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND?
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=35456

           Summary: NGramFilter -- construct n-grams from a TokenStream
           Product: Lucene
           Version: unspecified
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Analysis
        AssignedTo: [hidden email]
        ReportedBy: [hidden email]


This filter constructs n-grams (token combinations up to a fixed size, sometimes
called "shingles") from a token stream.

The filter sets start offsets, end offsets and position increments, so
highlighting and phrase queries should work.

Position increments > 1 in the input stream are replaced by filler tokens
(tokens with termText "_" and endOffset - startOffset = 0) in the output
n-grams. (Position increments > 1 in the input stream are usually caused by
removing some tokens, eg. stopwords, from a stream.)

The filter uses CircularFifoBuffer and UnboundedFifoBuffer from Apache
Commons-Collections.

Filter, test case and an analyzer are attached.

--
Configure bugmail: http://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]