TokenFilter Question

Jeremy Long
I am hoping someone can point me in the right direction. While I have a
working solution, I do not think it is the best or most correct solution to
the problem I am trying to solve.

My project uses Lucene to perform matching between two data sets, where
one may have the text "Red Green" and the other would use "redgreen". To
handle this, I created a Token Pair Concatenating Filter:
https://github.com/jeremylong/DependencyCheck/blob/master/core/src/main/java/org/owasp/dependencycheck/data/lucene/TokenPairConcatenatingFilter.java
With this filter, the query "field:(red blue green)" ends up being parsed to
"+field:red +field:redblue +field:blue +field:bluegreen +field:green".
However, my implementation adds superfluous parentheses to the parsed
query, and I'm fairly certain I've missed a few key points about how to
implement a token filter that injects additional tokens into the stream.
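
To make the intended token sequence concrete, here is a minimal plain-Java
sketch of the pairing logic only. It is not Lucene code and not the filter
linked above: the class name TokenPairSketch and the method concatenatePairs
are made up for illustration, and it works on an in-memory list of strings
rather than a TokenStream.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenPairSketch {

    /**
     * Hypothetical helper: emits each token, preceded by the concatenation
     * of the previous token and the current one (when a previous token exists).
     */
    public static List<String> concatenatePairs(List<String> tokens) {
        List<String> out = new ArrayList<>();
        String previous = null;
        for (String token : tokens) {
            if (previous != null) {
                // Injected token: previous word joined to the current word.
                out.add(previous + token);
            }
            // Original token is always passed through.
            out.add(token);
            previous = token;
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints: [red, redblue, blue, bluegreen, green]
        System.out.println(concatenatePairs(List.of("red", "blue", "green")));
    }
}
```

The output order matches the terms in the parsed query above. In a real
TokenFilter the same sequence would have to be produced one token at a time
from incrementToken(), which is the part I am unsure I have done correctly.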

I would be most appreciative if someone could take a look at the
implementation and suggest any improvements or point me to any
documentation that could help me better understand how a TokenFilter can
inject additional tokens into the stream.

Thanks in advance,

Jeremy