How to find "function()" - ?

Dmitry Goldenberg
Hi,
 
I'm trying to figure out a way to locate tokens which include special characters.  The actual text in the file being indexed is something like "function() { statement1; statement2; }"
 
The query I'm using is "function\()" since I want to locate precisely "function()". The query succeeds, but what it finds is actually "function", not "function()".  If I run the same query against "function { statement1, statement2 }", it still succeeds and I get "function" in best fragments.
 
How can I enforce () to be included?
 
Thanks,
- Dmitry

Re: How to find "function()" - ?

Michael D. Curtin
Dmitry Goldenberg wrote:

> Hi,
>  
> I'm trying to figure out a way to locate tokens which include special characters.  The actual text in the file being indexed is something like "function() { statement1; statement2; }"
>  
> The query I'm using is "function\()" since I want to locate precisely "function()". The query succeeds, but what it finds is actually "function", not "function()".  If I run the same query against "function { statement1, statement2 }", it still succeeds and I get "function" in best fragments.
>  
> How can I enforce () to be included?

I think you're going to have to write your own Analyzer subclass that
keeps special characters in the terms.  Then, use that Analyzer during
indexing.  The included Analyzers drop parentheses and the like.

If you're using Lucene's QueryParser, then use your new Analyzer there,
too, and escape things like parentheses in the query text you submit to
parse().
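
To make the tokenization rule concrete, here is a minimal sketch in plain Java.  It does NOT use Lucene's Analyzer API; the class and method names are hypothetical, and the choice of which characters count as part of a term is an assumption for illustration.  It only shows the behavior a custom Analyzer would need: parentheses stay attached to the term instead of being dropped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch; not part of Lucene.
public class ParenKeepingTokenizer {

    // Assumption: letters, digits, underscores and parentheses belong to a term.
    static boolean isTermChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_' || c == '(' || c == ')';
    }

    // Splits text into lowercased terms, breaking on every non-term character.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isTermChar(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // A separator ends the term collected so far.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [function(), statement1, statement2]
        System.out.println(tokenize("function() { statement1; statement2; }"));
    }
}
```

With a rule like this, "function()" and "function" become different terms, so the same rule has to be applied to the query text as well.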

I think there's a discussion of custom Analyzers in the Lucene book, but
I don't know where.  Maybe somebody else on this list knows???

Good luck!

--MDC

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


RE: How to find "function()" - ?

Dmitry Goldenberg
Michael,
 
Yes, you're describing pretty much what I was thinking of but --
 
a) If I index "function()" as "function()" rather than "function", does that mean a search for "function" won't find it? The problem is that in some cases the user will want to find function(), and in other cases just function. Can I accommodate both?
 
b) I understand about QueryParser.escape at search time. At indexing time, though, do I still need to escape the indexed values (e.g. keyword values) and store them in escaped form, e.g. function\(), or is function() OK?
 
Thanks,
- Dmitry

________________________________

From: Michael D. Curtin [mailto:[hidden email]]
Sent: Fri 1/27/2006 2:14 PM
To: [hidden email]
Subject: Re: How to find "function()" - ?




Re: How to find "function()" - ?

Michael D. Curtin
Dmitry Goldenberg wrote:

> a) If I index "function()" as "function()" rather than "function", does that mean a search for "function" won't find it? The problem is that in some cases the user will want to find function(), and in other cases just function. Can I accommodate both?

The term "function" is different from the term "function()", so a
literal search for one won't find the other.  Your Analyzer could emit
two tokens for the input "function()":  "function" and "function()", at
the same position (increment 0) if that's what you want.
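
As a rough illustration of the two-tokens-at-one-position idea, here is a sketch in plain Java.  It is not Lucene code: the Token class, its positionIncrement field, and the expand() helper are hypothetical names that only mirror the concept of a position increment.  The "ends with ()" test is also just an assumption about what counts as a call-style term.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch; not part of Lucene.
public class DualTokenSketch {

    // A term plus its distance from the previous token's position.
    static final class Token {
        final String term;
        final int positionIncrement;

        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    // Emits each term at a new position (increment 1). For a term ending in "()",
    // also emits the bare name with increment 0, i.e. at the same position.
    static List<Token> expand(List<String> terms) {
        List<Token> out = new ArrayList<>();
        for (String term : terms) {
            out.add(new Token(term, 1));
            if (term.length() > 2 && term.endsWith("()")) {
                out.add(new Token(term.substring(0, term.length() - 2), 0));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (Token t : expand(List.of("function()", "statement1"))) {
            System.out.println(t.term + " (increment " + t.positionIncrement + ")");
        }
    }
}
```

Because both variants share a position, a search for either "function" or "function()" can match the same spot in the text.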

> b) I understand about QueryParser.escape at search time. At indexing time, though, do I still need to escape the indexed values (e.g. keyword values) and store them in escaped form, e.g. function\(), or is function() OK?

Don't escape them at index time, only at search time.
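
A small plain-Java sketch of why that is: escaping only protects characters from the query syntax, so it belongs on the query string and not on the stored value.  The escape() helper below is a hypothetical stand-in for QueryParser.escape, and the list of special characters is an assumption for illustration; check the query parser documentation for the real set.

```java
// Hypothetical stand-alone sketch; not part of Lucene.
public class EscapeSketch {

    // Assumed set of query-syntax characters, for illustration only.
    static final String SPECIAL = "\\+-!():^[]\"{}~*?";

    // Prefixes every special character with a backslash.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String indexedValue = "function()";      // stored as-is at index time
        String queryText = escape(indexedValue); // escaped only for the parser
        // prints function\(\)
        System.out.println(queryText);
    }
}
```

Note that both parentheses get a backslash here, whereas the query "function\()" from the original post escapes only the opening one.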

Good luck!

--MDC
