query performance with leading *

G.Long
Hi,

Is there a way to improve query performance when using a leading * as a
wildcard on a path property?

I have hundreds of queries to run on a Lucene index (~250 MB). Executing
those queries without the leading * is about 5x faster than with the
leading *. My problem is that I sometimes need to use the leading *.

Most of the queries have the full path as a parameter, but some of them
have only part of it.

The queries look like:
+projet:CCOM +path:*/folder5/folder6/folder_ab/

I'm using Lucene 3.1.0.

Regards,
Gary


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

RE: query performance with leading *

Austin, Carl
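You could index the value both forwards and in reverse, for example:

123456 and 654321

You can then rewrite a query for *56 as 65* against the reversed value.
A trailing wildcard is a prefix query, which is much cheaper to evaluate
than a leading wildcard, so this will increase performance.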
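
The rewrite suggested here can be sketched in plain Java. This is only an illustration of the string transformation, not Lucene code: the class and method names are hypothetical, and it assumes the reversed value is indexed in a separate field that the rewritten pattern is run against.

```java
// Plain-Java sketch of the "index reversed, query reversed" idea.
// ReversedWildcardSketch and its methods are hypothetical names,
// not part of Lucene.
final class ReversedWildcardSketch {

    /** Reverses a term, e.g. before indexing it into a "reversed" field. */
    static String reverse(String term) {
        return new StringBuilder(term).reverse().toString();
    }

    /**
     * Turns a pattern whose only wildcard is a single leading '*' into
     * the equivalent trailing-'*' pattern over reversed terms.
     * Any other pattern is returned unchanged.
     */
    static String toReversedPrefixPattern(String pattern) {
        boolean leadingStarOnly = pattern.startsWith("*")
                && pattern.indexOf('*', 1) < 0
                && pattern.indexOf('?') < 0;
        if (!leadingStarOnly) {
            return pattern;
        }
        return reverse(pattern.substring(1)) + "*";
    }

    public static void main(String[] args) {
        System.out.println(reverse("123456"));              // 654321
        System.out.println(toReversedPrefixPattern("*56")); // 65*
        System.out.println(toReversedPrefixPattern("*/folder5/folder6/folder_ab/"));
    }
}
```

The cost of this approach is a second copy of each value in the index, in exchange for turning the leading wildcard into an ordinary prefix lookup.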

Re: query performance with leading *

Robert Muir
In reply to this post by G.Long
I think you can solve this with the tokenizers in the
org.apache.lucene.analysis.path package (in lucene-analyzers.jar).

In your case, it looks like ReversePathHierarchyTokenizer might be what
you want, though you will need to upgrade to at least 3.2 to get it.
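
To illustrate why a tokenizer that emits path suffixes helps with this kind of query, here is a plain-Java approximation of the idea. It does not use Lucene; the class and method names are hypothetical, the "/root" parent folder is made up for the example, and the real tokenizer's delimiter handling and options should be checked against its documentation.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java approximation of suffix-style path tokens.
// PathSuffixSketch is a hypothetical name, not a Lucene class.
final class PathSuffixSketch {

    /**
     * Returns the whole path followed by every suffix that starts
     * right after a delimiter. A trailing delimiter does not produce
     * an empty token.
     */
    static List<String> suffixTokens(String path, char delimiter) {
        List<String> tokens = new ArrayList<>();
        tokens.add(path);
        for (int i = 0; i < path.length() - 1; i++) {
            if (path.charAt(i) == delimiter) {
                tokens.add(path.substring(i + 1));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "/root" is an assumed parent folder, used only for illustration.
        for (String token : suffixTokens("/root/folder5/folder6/folder_ab/", '/')) {
            System.out.println(token);
        }
    }
}
```

If tokens like these are indexed for the path field, a partial path such as folder5/folder6/folder_ab/ can be matched as an exact term, so the leading * is no longer needed.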

On Mon, Feb 13, 2012 at 11:38 AM, G.Long <[hidden email]> wrote:

> Hi,
>
> Is there a way to improve query performance when using a leading * as a
> wildcard on a path property?

--
lucidimagination.com

Re: query performance with leading *

G.Long
Thank you for the tips.

Is there an analyzer which uses this tokenizer? If not, do you know of any
tutorial that explains how to implement a custom analyzer? I didn't find
any.

Regards.

On 13/02/2012 17:46, Robert Muir wrote:

> I think you can solve this with the tokenizers in the
> org.apache.lucene.analysis.path package (in lucene-analyzers.jar)
>
> In your case, looks like ReversePathHierarchyTokenizer might be what
> you want, though you will need to upgrade to at least 3.2 to get it.