Smart Indexing for Better performance and functionality ??

RaviWhy
Hi,
I have the following use case. I have a working solution, but its performance suffers, and I am looking for smarter ways of doing this.
Use case:
Incoming data has two fields with values like 'WAL MART STORES INC' and 'wal-mart-stores-inc'.
Users can search the data as 'walmart', 'wal mart', or 'wal-mart', and also partially on any part of the name from the start of a word, like 'wal', 'walm', 'wal m', etc. I got this working by using two indexes: one as a text field for the first field (the 'wal mart' column), and one as a sub-word field for 'wal-mart-stores' (with the WordDelimiterFilterFactory filter).

Is there a smarter way of doing this, or are there other techniques to boost performance? I need this for a high-traffic application where the response-time requirement is around 50 milliseconds.
I have some control over modifying the incoming data, and the data set is around 100K records.

Can someone suggest better ways of implementing this? I can provide more information on the tokens and filters I am using.

Thanks
Ravi

Re: Smart Indexing for Better performance and functionality ??

Otis Gospodnetic-2
Yerraguntla,

Which approaches have you tried so far?
So you want a query for "walmart" to match a document that, in its original input form, contains "wal mart"?
It sounds like you may want to try the n-gram approach with one of the NGram analyzers/factories.
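
To illustrate the idea (this is a plain-Python sketch, not Solr's n-gram implementation): the names `edge_ngrams`, `build_index`, `search` and the `MAX_GRAM` cap are all made up for the example. It assumes that spaces and hyphens can be dropped at both index and query time, and that partial matches only need to start at a word boundary, which is how I read your use case.

```python
import re
from collections import defaultdict

MAX_GRAM = 20  # assumed cap on gram length; tune for your data


def words(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def edge_ngrams(name, max_gram=MAX_GRAM):
    """Return every prefix of the name, starting at each word boundary.

    Words are concatenated without separators, so 'WAL MART' and
    'wal-mart' both produce 'w', 'wa', 'wal', 'walm', ... 'walmart'.
    """
    toks = words(name)
    grams = set()
    for i in range(len(toks)):
        joined = "".join(toks[i:])[:max_gram]
        for n in range(1, len(joined) + 1):
            grams.add(joined[:n])
    return grams


def build_index(names):
    """Map each gram to the set of record ids that contain it."""
    index = defaultdict(set)
    for doc_id, name in enumerate(names):
        for gram in edge_ngrams(name):
            index[gram].add(doc_id)
    return index


def search(index, query):
    """Normalize the query the same way, then do one exact lookup."""
    key = "".join(words(query))[:MAX_GRAM]
    return index.get(key, set())


if __name__ == "__main__":
    names = ["WAL MART STORES INC", "wal-mart-stores-inc", "TARGET CORP"]
    index = build_index(names)
    for q in ["walmart", "wal mart", "wal-mart", "wal", "walm", "wal m"]:
        print(q, sorted(search(index, q)))  # each prints [0, 1]
```

The trade-off is a larger index in exchange for a single term lookup per query, with no wildcard or prefix expansion at search time. With around 100K short names that extra size should be manageable, but measure it on your own data.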

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: Yerraguntla <[hidden email]>
To: [hidden email]
Sent: Tuesday, March 18, 2008 10:43:07 AM
Subject: Smart Indexing for Better performance and functionality ??


--
View this message in context: http://www.nabble.com/Smart-Indexing-for-Better-performance-and-functionality----tp16121987p16121987.html
Sent from the Solr - User mailing list archive at Nabble.com.