Hi all,

I’ve got some ancient Lucene tokenizer code from 2006 that I’m trying to avoid forward-porting, but I don’t think there’s an equivalent in Solr 5/6.

Specifically, it applies shingles to the output of something like the WordDelimiterFilter (WDF). For example, MySuperSink gets split into “My”, “Super”, “Sink”, and then shingled (with a shingle size of 2) into “My”, “MySuper”, “Super”, “SuperSink”, “Sink”.

I can’t just follow the WDF with a shingle filter, because shingles must not be created across the separate terms coming into the WDF. They should only be built from the pieces the WDF generates out of a single incoming term.
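
To make the desired behaviour concrete, here is a rough sketch in plain Python. It is not Lucene code and not the 2006 tokenizer; the function names are made up for illustration, and the regex is only a crude stand-in for the WDF's case-change splitting. The point is that shingles are built per incoming term and never across term boundaries.

```python
import re

def split_word_parts(token):
    # Crude approximation of splitting on case changes and digits.
    # The real WordDelimiterFilter has many more rules and options.
    return re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", token)

def shingle_parts(parts, max_size=2):
    # Emit every run of 1..max_size adjacent parts, ordered by start position.
    out = []
    for i in range(len(parts)):
        for size in range(1, max_size + 1):
            if i + size <= len(parts):
                out.append("".join(parts[i:i + size]))
    return out

def analyze(text, max_size=2):
    # Shingle within each whitespace-separated term only, so parts of
    # neighbouring terms are never joined.
    out = []
    for token in text.split():
        out.extend(shingle_parts(split_word_parts(token), max_size))
    return out

print(analyze("my MySuperSink sink"))
```

With the input "my MySuperSink sink", this yields "my", "My", "MySuper", "Super", "SuperSink", "Sink", "sink", with no "myMy" or "Sinksink" shingle spanning two incoming terms.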

Or is there actually a way to make this work with Solr 5/6?


— Ken

