edismax parser ignores mm parameter when tokenizer splits tokens (e.g. CJK)
We are using the edismax query parser with mm=100%. However, when a CJK
query (ABC) is tokenized by the CJKBigramFilter into [AB] [BC], it is
searched as a Boolean OR query instead of the Boolean AND ([AB] AND [BC])
that we expect with mm=100%.
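To make the expected and observed queries concrete, here is a minimal sketch. It assumes the input is a single run of Han characters, which CJKBigramFilter turns into overlapping two-character tokens. The field name `text` and both helper functions are hypothetical, for illustration only; the rendered strings use Lucene query syntax, where a leading `+` marks a required clause.

```python
def cjk_bigrams(text: str) -> list[str]:
    """Emit overlapping character bigrams for a run of CJK characters.

    Simplified model: assumes `text` contains only CJK characters (no Latin,
    no whitespace). A lone character is kept as a unigram (assumption).
    """
    if len(text) < 2:
        return [text] if text else []
    return [text[i:i + 2] for i in range(len(text) - 1)]


def lucene_query(field: str, terms: list[str], operator: str) -> str:
    """Render terms in Lucene syntax: AND makes every clause required (+), OR leaves all optional."""
    prefix = "+" if operator == "AND" else ""
    return " ".join(f"{prefix}{field}:{t}" for t in terms)


tokens = cjk_bigrams("大亚湾")
print(tokens)                               # ['大亚', '亚湾']
print(lucene_query("text", tokens, "AND"))  # +text:大亚 +text:亚湾  (expected with mm=100%)
print(lucene_query("text", tokens, "OR"))   # text:大亚 text:亚湾    (what we observe)
```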
For example, searching for "Daya Bay" 大亚湾 (which is tokenized to 大亚 亚湾)
returns about 10,000 results.
If we instead manually segment the Chinese characters for Daya Bay and
enter the query ["大亚" "亚湾"], we get 5,000 results.
(Our default Boolean operator is also AND.)
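One hypothesis that would explain the difference is that mm is counted against the top-level clauses of the query (roughly one per whitespace-separated chunk the user typed), not against the sub-clauses the analyzer creates inside a single chunk. The sketch below models only that hypothesis, which the debug output would need to confirm. The function name is hypothetical; it uses the documented rule that a positive mm percentage is rounded down.

```python
def required_clauses(top_level_clauses: int, mm_percent: int) -> int:
    """Number of top-level optional clauses a positive-percentage mm requires (rounded down)."""
    return top_level_clauses * mm_percent // 100


# Hypothesis: 大亚湾 is one whitespace-delimited chunk, so the top-level query has ONE
# clause and the two bigrams sit inside it as a nested OR.
chunks = "大亚湾".split()
print(required_clauses(len(chunks), 100))  # 1 -> either bigram alone satisfies mm

# Manually segmented input: two top-level clauses, so mm=100% requires both.
chunks = '"大亚" "亚湾"'.split()
print(required_clauses(len(chunks), 100))  # 2
```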
This problem also occurs with non-CJK queries; for example, [two-thirds]
turns into a Boolean OR query: ( [two] OR [thirds] ).
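The non-CJK case follows the same pattern: one typed word becomes two tokens. The sketch below approximates the split with a regular expression on word characters. This is an assumption standing in for whatever tokenizer the schema actually uses, and `split_on_punctuation` is a hypothetical name.

```python
import re


def split_on_punctuation(text: str) -> list[str]:
    """Rough stand-in for a tokenizer that breaks words at hyphens.

    Assumption: splitting on non-word characters approximates the schema's
    analyzer for this input (e.g. "two-thirds" -> two, thirds).
    """
    return re.findall(r"\w+", text.lower())


tokens = split_on_punctuation("two-thirds")
required = " ".join(f"+{t}" for t in tokens)  # what mm=100% should mean
optional = " ".join(tokens)                   # what the parser actually produces
print(tokens)    # ['two', 'thirds']
print(required)  # +two +thirds
print(optional)  # two thirds
```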
Is there some way to tell the edismax query parser to stick with mm=100%?
Appended below are the debugQuery output for these two queries and an
excerpt from our schema.xml.