Leading Wildcard Search

harish.agarwal
Hi,

I'm curious about the performance issues around leading wildcard search - is there any way to get around it?  Could someone explain to me the nature of the issue?

Thanks!
Harish

Re: Leading Wildcard Search

Mark Miller-3
smock wrote:
> Hi,
>
> I'm curious about the performance issues around leading wildcard search - is
> there any way to get around it?  Could someone explain to me the nature of
> the issue?
>
> Thanks!
> Harish
>  
A Lucene/Solr index is much like the index in the back of a book: the
terms are kept in sorted order. Imagine I ask you to look up luce* in
the index of a book - you would first jump to l, then lu, etc. until you
found the entries for terms that start with luce. Now imagine I ask you
to look up terms matching *ene. You can't skip to any section now. The
matching terms could start with any letter, and any letter can come
next, and so on. You have to scan the whole index, right?
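
To make the book-index analogy concrete, here is a minimal Python sketch. It is only a toy: a sorted list stands in for the term dictionary, and the names (TERMS, prefix_lookup, suffix_scan) are made up for illustration. It is not how Lucene stores or searches terms internally.

```python
from bisect import bisect_left

# Toy stand-in for a sorted term dictionary (hypothetical data).
TERMS = sorted(["apache", "index", "lucene", "lucent", "lucid", "scene", "solr"])


def prefix_lookup(terms, prefix):
    """Trailing wildcard (prefix*): seek to the first candidate, stop at the first miss."""
    matches = []
    i = bisect_left(terms, prefix)  # binary search, like jumping to "l", then "lu", ...
    while i < len(terms) and terms[i].startswith(prefix):
        matches.append(terms[i])
        i += 1
    return matches


def suffix_scan(terms, suffix):
    """Leading wildcard (*suffix): there is nowhere to seek to, so test every term."""
    return [t for t in terms if t.endswith(suffix)]


print(prefix_lookup(TERMS, "luce"))  # ['lucene', 'lucent']
print(suffix_scan(TERMS, "ene"))     # ['lucene', 'scene']
```

The prefix lookup only touches the terms around the match, while the suffix scan has to visit every term, which is the cost that grows with the size of the index.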

Workarounds? Imagine making a second index (or adding another field)
with all of the terms spelled backwards. A leading wildcard like *ene
then becomes a trailing wildcard, ene*, against the reversed terms.
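
A rough Python sketch of the reversed-terms idea, under the same toy assumption that a sorted list stands in for the index. The function names are hypothetical, and a real setup would reverse the terms at index time in an analyzer rather than by hand.

```python
from bisect import bisect_left


def build_reversed_terms(terms):
    """Build the 'second index': every term spelled backwards, kept sorted."""
    return sorted(t[::-1] for t in terms)


def leading_wildcard_lookup(reversed_terms, suffix):
    """Answer *suffix by running a prefix lookup for the reversed suffix."""
    prefix = suffix[::-1]  # "*ene" on the original terms becomes "ene*" here
    matches = []
    i = bisect_left(reversed_terms, prefix)
    while i < len(reversed_terms) and reversed_terms[i].startswith(prefix):
        matches.append(reversed_terms[i][::-1])  # flip back to the original spelling
        i += 1
    return sorted(matches)


REVERSED = build_reversed_terms(["apache", "index", "lucene", "scene", "solr"])
print(leading_wildcard_lookup(REVERSED, "ene"))  # ['lucene', 'scene']
```

The trade-off is extra index size and indexing work in exchange for turning the full scan into a seek.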

- Mark

Re: Leading Wildcard Search

harish.agarwal
Hi Mark,
Thanks! That clears things up quite a bit. Are there plans to incorporate a Solr 'wildcard index' that contains infix terms, or alternatively a backwards index, to get around this problem? I'll plan on using the workaround in the meantime.

-Harish

Re: Leading Wildcard Search

Koji Sekiguchi
Mark,

How about introducing ReverseStringFilter into Lucene to solve this kind
of problem? :)

https://issues.apache.org/jira/browse/LUCENE-1398

Thank you,

Koji


Re: Leading Wildcard Search

Mark Miller-3
Nice, Koji, I hadn't seen that. I'll take some time to look more closely
at the patch.

I'm going to take a look at your new Lucene Highlighter code when I get
some time too. Sounds like good stuff.

- Mark

Re: Leading Wildcard Search

Koji Sekiguchi
Cool! Thanks,

Koji

