Shingles from WDFF


Hi all,

I’ve got some ancient Lucene tokenizer code from 2006 that I’m trying to avoid forward-porting, but I don’t think there’s an equivalent in Solr 5/6.

Specifically, it applies shingles to the output of something like the WordDelimiterFilter (WDF). For example, MySuperSink gets split into “My” “Super” “Sink”, and then shingled (with a shingle size of 2) into “My”, “MySuper”, “Super”, “SuperSink”, “Sink”.

I can’t just follow the WDF with a shingle filter, because shingles must not be created across separate terms coming into the WDF; they should only be built from the pieces the WDF generates out of a single term.
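
To make the desired behavior concrete, here is a minimal Python sketch of per-term shingling. It is not the original Lucene code and not a Solr API: the function names are hypothetical, and the regex is only a rough stand-in for the WDF's case-change splitting. Pieces are concatenated without a separator, as in the example above.

```python
import re


def split_word_parts(token):
    """Rough stand-in for WDF splitting: break on case changes and digit runs."""
    return re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+", token)


def shingle_parts(parts, max_size=2):
    """Emit every run of 1..max_size adjacent parts, joined with no separator."""
    shingles = []
    for start in range(len(parts)):
        for size in range(1, max_size + 1):
            if start + size <= len(parts):
                shingles.append("".join(parts[start:start + size]))
    return shingles


def shingle_within_tokens(tokens, max_size=2):
    """Shingle the parts of each incoming token separately.

    Shingles never span two incoming tokens, which is the constraint
    that rules out a plain shingle filter placed after the WDF.
    """
    output = []
    for token in tokens:
        output.extend(shingle_parts(split_word_parts(token), max_size))
    return output


if __name__ == "__main__":
    print(shingle_within_tokens(["MySuperSink", "kitchen"]))
```

With the input terms MySuperSink and kitchen, this yields the five shingles listed above followed by kitchen on its own; no shingle such as Sinkkitchen is produced, because each incoming term is processed in isolation.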

Or is there actually a way to make this work with Solr 5/6?


— Ken

Ken Krugler
+1 530-210-6378
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr