Quantcast

Searching for a strict prefix

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Searching for a strict prefix

Shlomy Reinstein
Hi,

I have a Lucene index that contains source code tags (a tag can be any
named source code element - function, class, variable). Each document
contains a field with the tag name and some additional information.
I'd like to be able to perform strict prefix queries. E.g. if I have a
tag named "updateChildren", I'd like to be able to write a query like
"updateC*" and get the list of tags that have this prefix
(updateChildren being one of them).

Is there a way to do this? Please suggest how I should store the field
for this purpose - which analyzer to use, whether to keep it stored,
etc.

Thanks,
Shlomy

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Searching for a strict prefix

Ian Lea
StandardAnalyzer should work fine, mark the field as indexed, no need
to store it unless you want to retrieve it for display.

Query via QueryParser using "tagname: updateC*" or programatically via
PrefixQuery.


Although I'm not sure exactly what you mean by "strict prefix".  If
you mean that the prefix should match exactly on case, punctuation
etc. maybe use KeywordAnalyzer instead.



--
Ian.


On Sun, May 23, 2010 at 3:03 PM, Shlomy Reinstein <[hidden email]> wrote:

> Hi,
>
> I have a Lucene index that contains source code tags (a tag can be any
> named source code element - function, class, variable). Each document
> contains a field with the tag name and some additional information.
> I'd like to be able to perform strict prefix queries. E.g. if I have a
> tag named "updateChildren", I'd like to be able to write a query like
> "updateC*" and get the list of tags that have this prefix
> (updateChildren being one of them).
>
> Is there a way to do this? Please suggest how I should store the field
> for this purpose - which analyzer to use, whether to keep it stored,
> etc.
>
> Thanks,
> Shlomy
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Searching for a strict prefix

Shlomy Reinstein
Hi,

Thanks. By "strict prefix", I meant a prefix of the name
(case-insensitive). What you suggest ("tagname: updateC*") was the
first thing I tried, but it happens to work only partially. In my
case, I have a lot of names beginning with "m_sz<something>", e.g.
"m_szComment", "m_szName". Trying a query like "tagname: m_szC*" will
not find the "m_szComment" value. I just wrote the wrong example here.

Shlomy

On Mon, May 24, 2010 at 11:32 AM, Ian Lea <[hidden email]> wrote:

> StandardAnalyzer should work fine, mark the field as indexed, no need
> to store it unless you want to retrieve it for display.
>
> Query via QueryParser using "tagname: updateC*" or programatically via
> PrefixQuery.
>
>
> Although I'm not sure exactly what you mean by "strict prefix".  If
> you mean that the prefix should match exactly on case, punctuation
> etc. maybe use KeywordAnalyzer instead.
>
>
>
> --
> Ian.
>
>
> On Sun, May 23, 2010 at 3:03 PM, Shlomy Reinstein <[hidden email]> wrote:
>> Hi,
>>
>> I have a Lucene index that contains source code tags (a tag can be any
>> named source code element - function, class, variable). Each document
>> contains a field with the tag name and some additional information.
>> I'd like to be able to perform strict prefix queries. E.g. if I have a
>> tag named "updateChildren", I'd like to be able to write a query like
>> "updateC*" and get the list of tags that have this prefix
>> (updateChildren being one of them).
>>
>> Is there a way to do this? Please suggest how I should store the field
>> for this purpose - which analyzer to use, whether to keep it stored,
>> etc.
>>
>> Thanks,
>> Shlomy
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [hidden email]
>> For additional commands, e-mail: [hidden email]
>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Searching for a strict prefix

Ian Lea
I bet it's that underscore in m_sz.  Different analyzers do different
things with different punctuation characters.  I can never remember
which does exactly what - it'll be in the javadocs or Lucene In Action
or somewhere on the web.

You can check what exactly has been indexed by using Luke - always a good idea.

In this case you probably want an analyzer that just downcases the tag
names.  Easy enough to build your own if there isn't an existing one
that meets your needs.


--
Ian.


On Mon, May 24, 2010 at 1:14 PM, Shlomy Reinstein <[hidden email]> wrote:

> Hi,
>
> Thanks. By "strict prefix", I meant a prefix of the name
> (case-insensitive). What you suggest ("tagname: updateC*") was the
> first thing I tried, but it happens to work only partially. In my
> case, I have a lot of names beginning with "m_sz<something>", e.g.
> "m_szComment", "m_szName". Trying a query like "tagname: m_szC*" will
> not find the "m_szComment" value. I just wrote the wrong example here.
>
> Shlomy
>
> On Mon, May 24, 2010 at 11:32 AM, Ian Lea <[hidden email]> wrote:
>> StandardAnalyzer should work fine, mark the field as indexed, no need
>> to store it unless you want to retrieve it for display.
>>
>> Query via QueryParser using "tagname: updateC*" or programatically via
>> PrefixQuery.
>>
>>
>> Although I'm not sure exactly what you mean by "strict prefix".  If
>> you mean that the prefix should match exactly on case, punctuation
>> etc. maybe use KeywordAnalyzer instead.
>>
>>
>>
>> --
>> Ian.
>>
>>
>> On Sun, May 23, 2010 at 3:03 PM, Shlomy Reinstein <[hidden email]> wrote:
>>> Hi,
>>>
>>> I have a Lucene index that contains source code tags (a tag can be any
>>> named source code element - function, class, variable). Each document
>>> contains a field with the tag name and some additional information.
>>> I'd like to be able to perform strict prefix queries. E.g. if I have a
>>> tag named "updateChildren", I'd like to be able to write a query like
>>> "updateC*" and get the list of tags that have this prefix
>>> (updateChildren being one of them).
>>>
>>> Is there a way to do this? Please suggest how I should store the field
>>> for this purpose - which analyzer to use, whether to keep it stored,
>>> etc.
>>>
>>> Thanks,
>>> Shlomy
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [hidden email]
>>> For additional commands, e-mail: [hidden email]
>>>
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [hidden email]
>> For additional commands, e-mail: [hidden email]
>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Searching for a strict prefix

Shlomy Reinstein
Hi,

Thanks for the help.
BTW, if anyone is interested: I tried the same with KeywordAnalyzer -
added a field with value "m_szName", and tried to find it using
"FieldName:m_szN*" but failed. Someone in the Lucene IRC channel
showed me why - QueryParser, by default, lowercases all expanded terms
(e.g. those with '*' in them), so it actually looked for "m_szn*".

Thanks again,
Shlomy

On Mon, May 24, 2010 at 4:37 PM, Ian Lea <[hidden email]> wrote:

> I bet it's that underscore in m_sz.  Different analyzers do different
> things with different punctuation characters.  I can never remember
> which does exactly what - it'll be in the javadocs or Lucene In Action
> or somewhere on the web.
>
> You can check what exactly has been indexed by using Luke - always a good idea.
>
> In this case you probably want an analyzer that just downcases the tag
> names.  Easy enough to build your own if there isn't an existing one
> that meets your needs.
>
>
> --
> Ian.
>
>
> On Mon, May 24, 2010 at 1:14 PM, Shlomy Reinstein <[hidden email]> wrote:
>> Hi,
>>
>> Thanks. By "strict prefix", I meant a prefix of the name
>> (case-insensitive). What you suggest ("tagname: updateC*") was the
>> first thing I tried, but it happens to work only partially. In my
>> case, I have a lot of names beginning with "m_sz<something>", e.g.
>> "m_szComment", "m_szName". Trying a query like "tagname: m_szC*" will
>> not find the "m_szComment" value. I just wrote the wrong example here.
>>
>> Shlomy
>>
>> On Mon, May 24, 2010 at 11:32 AM, Ian Lea <[hidden email]> wrote:
>>> StandardAnalyzer should work fine, mark the field as indexed, no need
>>> to store it unless you want to retrieve it for display.
>>>
>>> Query via QueryParser using "tagname: updateC*" or programatically via
>>> PrefixQuery.
>>>
>>>
>>> Although I'm not sure exactly what you mean by "strict prefix".  If
>>> you mean that the prefix should match exactly on case, punctuation
>>> etc. maybe use KeywordAnalyzer instead.
>>>
>>>
>>>
>>> --
>>> Ian.
>>>
>>>
>>> On Sun, May 23, 2010 at 3:03 PM, Shlomy Reinstein <[hidden email]> wrote:
>>>> Hi,
>>>>
>>>> I have a Lucene index that contains source code tags (a tag can be any
>>>> named source code element - function, class, variable). Each document
>>>> contains a field with the tag name and some additional information.
>>>> I'd like to be able to perform strict prefix queries. E.g. if I have a
>>>> tag named "updateChildren", I'd like to be able to write a query like
>>>> "updateC*" and get the list of tags that have this prefix
>>>> (updateChildren being one of them).
>>>>
>>>> Is there a way to do this? Please suggest how I should store the field
>>>> for this purpose - which analyzer to use, whether to keep it stored,
>>>> etc.
>>>>
>>>> Thanks,
>>>> Shlomy
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: [hidden email]
>>>> For additional commands, e-mail: [hidden email]
>>>>
>>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [hidden email]
>>> For additional commands, e-mail: [hidden email]
>>>
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [hidden email]
>> For additional commands, e-mail: [hidden email]
>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Loading...