RE: question about highlight field

Xuesong Luo
Hi, Chris,
I rewrote the prefix wildcard query consult* as (consult consult?*), and it
works with highlighting. Do you think that's a workable solution?
Could you explain a bit why putting a "?" before the "*" won't crash Solr
when the pattern matches a lot of terms?

Thanks
Xuesong

In the trunk (soon to be Solr 1.2) Mike fixed that so the
query is "rewritten" to its expanded form before highlighting is done
... this works great for true wildcard queries (ie: cons*t* or cons?lt*),
but for prefix queries (ie: consult*) Solr has an optimization to reduce
the likelihood of Solr crashing if the prefix matches a lot of terms ...
unfortunately this breaks highlighting of prefix queries, and no one has
implemented a solution yet...

https://issues.apache.org/jira/browse/SOLR-195




-Hoss


Chris Hostetter-3

: I rewrote the prefix wildcard query consult* as (consult consult?*), and it
: works with highlighting. Do you think that's a workable solution?
: Could you explain a bit why putting a "?" before the "*" won't crash Solr
: when the pattern matches a lot of terms?

actually what I said was that Solr has an optimization to help reduce the
likelihood of crashing with prefix queries -- that optimization (using a
PrefixFilter) doesn't work with highlighting.

if you put a "?" in front of the "*" you force Solr to use a wildcard
query, bypass the optimization, and get working highlighting -- but now
you risk the same potential crashing behavior.
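
A rough sketch of the rewrite being discussed, in plain Python rather than Solr itself. The helper name rewrite_prefix_query is hypothetical, and fnmatch is used only as a stand-in for wildcard matching ("?" is exactly one character, "*" is any run of characters). It shows that (consult consult?*) selects the same terms as consult*, which is why the rewrite changes the query type without changing the results:

```python
from fnmatch import fnmatchcase


def rewrite_prefix_query(query: str) -> str:
    """Turn 'prefix*' into '(prefix prefix?*)'; leave anything else unchanged.

    Hypothetical client-side helper, not part of Solr.
    """
    prefix = query[:-1]
    is_plain_prefix = (
        query.endswith("*")
        and bool(prefix)
        and not any(c in prefix for c in "*?")
    )
    if not is_plain_prefix:
        return query
    return f"({prefix} {prefix}?*)"


def matches_rewritten(term: str, prefix: str) -> bool:
    # Either the bare prefix itself, or the prefix plus at least one character.
    return term == prefix or fnmatchcase(term, prefix + "?*")


# Toy term list standing in for the terms of an indexed field.
terms = ["consult", "consults", "consultant", "consulting", "console", "insult"]

original = [t for t in terms if fnmatchcase(t, "consult*")]
rewritten = [t for t in terms if matches_rewritten(t, "consult")]

print(rewrite_prefix_query("consult*"))
print(original == rewritten)
```

The rewritten form matches the same terms, so it carries the same expansion cost as any other wildcard query.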

the possibility of a "crash" stems from the way wildcard (and low-level
prefix) queries work -- they inspect the index to get a list of all the
words that match the pattern, and then query on each of them ... this can
be slow, but worse, it can take up a lot of RAM, more RAM than you may have
... so Lucene has a built-in limit that controls how big it will let these
queries get; it's configurable in Solr using the "maxBooleanClauses"
option in your solrconfig.xml.  If you try to make a WildcardQuery with a
pattern that matches more than that many terms in your index, the query
will fail (but it's still possible Solr will crash if you configure
this value to be so big that the expanded query exhausts all of your
RAM)
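
To make the expansion and the clause limit concrete, here is a toy simulation in plain Python. It is not Lucene code: expand_wildcard and the TooManyClauses class are hypothetical names chosen for illustration, the index is a made-up list of terms, and the 1024 default is an assumed value for the limit. The point is only that the pattern is expanded into one clause per matching term, and the query is rejected once the count passes the configured limit:

```python
from fnmatch import fnmatchcase


class TooManyClauses(Exception):
    """Raised when a pattern expands to more terms than the limit allows."""


def expand_wildcard(pattern, index_terms, max_clauses=1024):
    """Return every indexed term matching the pattern, one clause per term.

    Fails early instead of building an arbitrarily large expanded query.
    """
    matched = []
    for term in index_terms:
        if fnmatchcase(term, pattern):
            if len(matched) >= max_clauses:
                raise TooManyClauses(
                    f"{pattern!r} matches more than {max_clauses} terms"
                )
            matched.append(term)
    return matched


# Made-up index: 2000 terms sharing the prefix, plus one unrelated term.
index_terms = [f"consult{i}" for i in range(2000)] + ["console"]

# A narrow pattern expands without trouble.
print(expand_wildcard("console*", index_terms))

# A broad pattern hits the limit and fails rather than exhausting memory.
try:
    expand_wildcard("consult?*", index_terms)
except TooManyClauses as err:
    print("query failed:", err)
```

Raising max_clauses in this sketch lets the broad pattern through, at the price of holding every matching term in memory at once, which is the trade-off described above.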



-Hoss