Quantcast

How FST constructed in lucene?

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

How FST constructed in lucene?

krish mohan
During search, whether Lucene uses FST in .tip file to match against the
terms? How the changes to the index will be updated in FST? Will it be
re-constructed or will it be updated in existing FST?

In case of wildcard and fuzzy queries, Lucene needs to test a large number
of terms. Will FST be useful to skip some portion of terms from comparing?
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: How FST constructed in lucene?

Adrien Grand
Le ven. 17 févr. 2017 à 11:17, krish mohan <[hidden email]> a
écrit :

> During search, whether Lucene uses FST in .tip file to match against the
> terms? How the changes to the index will be updated in FST? Will it be
> re-constructed or will it be updated in existing FST?
>

Lucene never updates existing files. What will happen in practice when you
add or update documents is that it will build new segments that will build
their own FST. FSTs are also re-built upon merge.


> In case of wildcard and fuzzy queries, Lucene needs to test a large number
> of terms. Will FST be useful to skip some portion of terms from comparing?
>

Indeed it will. When intersecting the terms dictionary with an automaton,
Lucene basically performs a leap-frog between the terms contained in the
terms dictionary and the terms that are accepted by the automaton. In some
cases, the terms dictionary will lead the iteration and the automaton will
be used to verify that terms match, but in other cases Lucene will ask the
automaton what the next accepted term is and will ask the terms dictionary
to advance on or beyond that term (seekCeil), which uses the FST under the
hood. You can look at AutomatonTermsEnum and FilteredTermsEnum if you would
like to learn more about how it works.
Loading...