Re: How does wildcard search works under the hood?
As you said the terms are mapped to a list of documents containing them.
To do so, they are stored in a finite state transducer. For simplification,
you can imagine it as a trie.
When you search for a term, Lucene does a lookup into the dictionary, gets
at which it can read documents in another file.
When you search for a document containing a ?, Lucene can use the fst to
identify all of the terms
matching the pattern. Like for a trie, the fst is especially potent at a
pattern like someprefix* and not as great for patterns like *somesuffix as
it will have to go through the full trie.
Implementation-wise this is done by transforming the pattern into a
deterministic finite state automaton,
and finding the terms that match the pattern in the dictionary.
> I understand the concept of an inverted index. Terms are mapped the the
> containing them. But when the query contains a wildcard (*, ?) we cannot
> simply use an index entry.
> How does wildcard searches work under the hood? Is there some other index
> helping Lucene to find all the relevant terms, and then the inverted index
> is used with all of them?
> I'm hoping to get a better understanding of this issue to be able to reason
> poor performance of the application I work on
> Thanks, Alex.
> Sent from: http://lucene.472066.n3.nabble.com/Lucene-General-f642108.html >