autocomplete with multiple terms

autocomplete with multiple terms

Martin Braun-2
Hello All,

I am implementing a query auto-complete function à la Google. Right now
I am using a TermEnum enumerator on a specific field and listing the
terms found.
That works well for searches with only one term, but when the user
types two or three words the function autocompletes each term
individually, and the combination of the suggested terms may well
return no results.
An autocomplete function has to be really fast, so searching for all
possible combinations of the terms wouldn't be a good solution.
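
To make the limitation concrete, here is a minimal sketch of per-term
completion. It does not use the Lucene API: a sorted in-memory set stands
in for the field's term dictionary, and the class and method names
(PerTermCompleter, addTerm, completeTerm) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical stand-in for enumerating the terms of one field in sorted order. */
final class PerTermCompleter {
    // Assumption: all terms of the field fit in memory as a sorted set.
    private final SortedSet<String> terms = new TreeSet<>();

    void addTerm(String term) {
        terms.add(term.toLowerCase());
    }

    /** Walks the sorted terms starting at the prefix and stops at the first non-match. */
    List<String> completeTerm(String prefix, int max) {
        String p = prefix.toLowerCase();
        List<String> out = new ArrayList<>();
        for (String t : terms.tailSet(p)) {
            if (!t.startsWith(p) || out.size() >= max) {
                break;
            }
            out.add(t);
        }
        return out;
    }
}
```

Each word of the input is completed on its own, so nothing here checks
whether two suggested terms ever occur in the same document.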

So my strategy has hit a dead end.

Does anybody know a better way?

I am not sure whether we get enough queries to build and search an
index based on the user queries.

The only thing I have found on the list so far concerning this subject
is http://issues.apache.org/jira/browse/LUCENE-625, but I'm not sure
whether it does what I want.

tia,
martin



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: autocomplete with multiple terms

Karl Wettin

On 22 Feb 2007, at 10:09, Martin Braun wrote:

> the only thing I have found in the list before concerning this subject
> is http://issues.apache.org/jira/browse/LUCENE-625, but I'm not sure if
> it does the things I want.


> I am not sure if we get enough queries for a search over an index base
> on the user-queries.

If the content of your corpus is static enough, then time is the
friend that will enable you to gather enough user queries to build the
suggestion data set.

Otherwise you have to produce simulated user queries by reducing your
data set to its most common information. Markov chains, or the top n
paths of terms found with Dijkstra or similar, could be an easy way out.
You can also start looking at the documents people choose to inspect,
and use these as the basis for phrase training.
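
As a rough illustration of the Markov chain idea, here is a minimal
sketch of a first-order (bigram) model: it counts which term follows
which in the training text and suggests the most frequent followers of
the previous word. The class BigramSuggester and its methods are
hypothetical, and the training text is assumed to be short strings such
as titles, sentences or logged queries.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: first-order Markov (bigram) model over terms. */
final class BigramSuggester {
    // previous term -> (next term -> how often it followed)
    private final Map<String, Map<String, Integer>> followers = new HashMap<>();

    /** Counts adjacent term pairs in one piece of text. */
    void train(String text) {
        String[] terms = text.toLowerCase().trim().split("\\s+");
        for (int i = 0; i + 1 < terms.length; i++) {
            followers.computeIfAbsent(terms[i], k -> new HashMap<>())
                     .merge(terms[i + 1], 1, Integer::sum);
        }
    }

    /** Up to max terms seen after previousTerm that start with prefix, most frequent first. */
    List<String> suggest(String previousTerm, String prefix, int max) {
        Map<String, Integer> next = followers.getOrDefault(
                previousTerm.toLowerCase(), Collections.<String, Integer>emptyMap());
        String p = prefix.toLowerCase();
        List<Map.Entry<String, Integer>> hits = new ArrayList<>();
        for (Map.Entry<String, Integer> e : next.entrySet()) {
            if (e.getKey().startsWith(p)) {
                hits.add(e);
            }
        }
        // Higher count first; ties broken alphabetically for stable output.
        hits.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> out = new ArrayList<>();
        for (int i = 0; i < hits.size() && i < max; i++) {
            out.add(hits.get(i).getKey());
        }
        return out;
    }
}
```

Because only pairs that actually occurred in the training text are
stored, a suggested second word is one that has been seen directly after
the first. That is a weaker guarantee than a matching document, but it
is cheap to look up.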

I think you will get further by approaching this from a behavioral
psychology angle rather than as a corpus access problem. Also,
navigating a reduced data set (such as the trie in LUCENE-625, which is
much smaller than the corpus it suggests for) will save you a lot of
system resources.
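
For what navigating a reduced data set could look like, here is a minimal
sketch of a character trie over whole phrases with occurrence counts. It
is not the LUCENE-625 code; PhraseTrie and its methods are hypothetical
names, and the phrases are assumed to come from logged or simulated
queries.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical phrase trie: completes the whole typed string, not single terms. */
final class PhraseTrie {
    private static final class Node {
        final Map<Character, Node> children = new TreeMap<>();
        int count; // how many times a stored phrase ended at this node
    }

    private final Node root = new Node();

    /** Stores one phrase (for example a logged query) and bumps its count. */
    void add(String phrase) {
        Node node = root;
        for (char c : normalize(phrase).toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.count++;
    }

    /** Up to max stored phrases starting with the typed text, most frequent first. */
    List<String> complete(String typed, int max) {
        String prefix = normalize(typed);
        Node node = root;
        for (char c : prefix.toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return new ArrayList<>();
            }
        }
        List<Map.Entry<String, Integer>> found = new ArrayList<>();
        collect(node, new StringBuilder(prefix), found);
        // Higher count first; ties broken alphabetically.
        found.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> out = new ArrayList<>();
        for (int i = 0; i < found.size() && i < max; i++) {
            out.add(found.get(i).getKey());
        }
        return out;
    }

    // Depth-first walk below the prefix node, recording every stored phrase.
    private static void collect(Node node, StringBuilder path,
                                List<Map.Entry<String, Integer>> out) {
        if (node.count > 0) {
            out.add(new AbstractMap.SimpleEntry<>(path.toString(), node.count));
        }
        for (Map.Entry<Character, Node> e : node.children.entrySet()) {
            path.append(e.getKey().charValue());
            collect(e.getValue(), path, out);
            path.setLength(path.length() - 1);
        }
    }

    // Lower-cases, trims, and collapses runs of whitespace to one space.
    private static String normalize(String s) {
        return s.toLowerCase().trim().replaceAll("\\s+", " ");
    }
}
```

Every suggestion is a phrase that was stored as a whole, so a
multi-word completion cannot be a combination nobody ever entered, and
the lookup touches only the trie rather than the full index.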

Hope this helps some.

--
karl