Escaping escape char

Escaping escape char

WATHELET Thomas-2
Hi,
 
I have an index with a field 'content' (tokenized, stored, indexed),
using Lucene 1.9.1.
I tried to search for this text as an exact phrase: "european parliament
resolution on the Commission report on the regional meetings arranged by
the Commission in 1998-1999 on the common fisheries policy after 2002"
I use this code to parse the query:
public void addQuery(String field, String value, Occur occ) {
    QueryParser queryparser = new QueryParser(field, new SimpleAnalyzer());
    Query query = null;
    try {
        // Escape the text inside a quoted phrase, then restore the quotes.
        if (value.length() >= 2 && value.charAt(0) == '"'
                && value.charAt(value.length() - 1) == '"') {
            value = QueryParser.escape(value.substring(1, value.length() - 1));
            value = '"' + value + '"';
        } else {
            value = QueryParser.escape(value);
        }
        // PhraseQuery phraseQuery = new PhraseQuery();
        // phraseQuery.add(new Term(field, value));
        query = queryparser.parse(value);
    } catch (ParseException e) {
        e.printStackTrace();
    }
    if (query != null) { // skip the clause if parsing failed
        combinedQueries.add(query, occ);
    }
}
 
When I print the Query, I get this string:

+doccontent:"european parliament resolution on the commission report on
the regional meetings arranged by the commission in on the common
fisheries policy after"

Why has 1998-1999 been removed by the parser?

FuzzyQuery in SpanQuery

Mark Miller-3
Anyone know of a way to get a fuzzy query into a spanquery?

- Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: FuzzyQuery in SpanQuery

mark harwood
Something like this?

Query expandedQuery = fuzzyQuery.rewrite(reader);
HashSet termsSet = new HashSet();
expandedQuery.extractTerms(termsSet);
ArrayList termsList = new ArrayList();
for (Iterator iter = termsSet.iterator(); iter.hasNext();) {
    Term term = (Term) iter.next();
    SpanTermQuery stq = new SpanTermQuery(term);
    termsList.add(stq);
}
SpanOrQuery soq = new SpanOrQuery(
    (SpanQuery[]) termsList.toArray(new SpanQuery[termsList.size()]));

I imagine this general approach should also work for other multi-term queries, e.g. wildcards.


----- Original Message ----
From: Mark Miller <[hidden email]>
To: [hidden email]
Sent: Thursday, 31 August, 2006 11:55:13 AM
Subject: FuzzyQuery in SpanQuery

Anyone know of a way to get a fuzzy query into a spanquery?

- Mark

Re: FuzzyQuery in SpanQuery

Karl Wettin
In reply to this post by Mark Miller-3
On Thu, 2006-08-31 at 06:55 -0400, Mark Miller wrote:
> Anyone know of a way to get a fuzzy query into a spanquery?

http://issues.apache.org/jira/browse/LUCENE-522


Re: Escaping escape char

Erik Hatcher
In reply to this post by WATHELET Thomas-2

On Aug 31, 2006, at 5:43 AM, WATHELET Thomas wrote:

> Hi,
>         QueryParser queryparser = new QueryParser(field, new
> SimpleAnalyzer());
> +doccontent:"european parliament resolution on the commission  
> report on
> the regional meetings arranged by the commission in on the common
> fisheries policy after"
>
> Why has 1998-1999 been removed by the parser?

SimpleAnalyzer only passes through alphabetic characters, not
numerics or special characters, so "1998-1999" and "2002" are
dropped when the query text is analyzed.

        Erik
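
To see the effect described above without a Lucene install, here is a minimal plain-Java sketch that mimics a letters-only, lowercasing tokenizer. It is a stand-in written for illustration, not Lucene's SimpleAnalyzer itself; the class and method names (LetterTokenSketch, tokenize) are made up for this note.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: mimics "keep runs of letters, lowercase them".
// This is NOT Lucene code; names here are hypothetical.
class LetterTokenSketch {

    // Splits the text at every non-letter character and lowercases each token.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // A digit, hyphen, or space ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Tail of the phrase from the original question.
        System.out.println(tokenize(
            "the Commission in 1998-1999 on the common fisheries policy after 2002"));
        // prints [the, commission, in, on, the, common, fisheries, policy, after]
    }
}
```

The digits never form a token, which matches the printed query in the question. Keeping them would presumably require an analyzer that preserves digits, applied consistently at both index time and query time.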



Re: FuzzyQuery in SpanQuery

Mark Miller-3
In reply to this post by Karl Wettin
karl wettin wrote:

> On Thu, 2006-08-31 at 06:55 -0400, Mark Miller wrote:
>  
>> Anyone know of a way to get a fuzzy query into a spanquery?
>>    
>
> http://issues.apache.org/jira/browse/LUCENE-522
>
Great. Very sweet, Karl.

Question:
Have you run into problems with it failing on toString() with a
NullPointerException?

- Mark

Re: FuzzyQuery in SpanQuery

Mark Miller-3
In reply to this post by Karl Wettin
karl wettin wrote:

> On Thu, 2006-08-31 at 06:55 -0400, Mark Miller wrote:
>  
>> Anyone know of a way to get a fuzzy query into a spanquery?
>>    
>
> http://issues.apache.org/jira/browse/LUCENE-522
>
I found this:
Also, it throws a NullPointerException if you call toString() before
rewriting the query. Perhaps that's the way it is? I didn't check it out;
I'm just reporting it before I forget about it.

When is a query rewritten? I build my query and then before using it, I
would like to print it out to double check it. Not possible? Does the
rewrite happen inside search?

- Mark

Re: FuzzyQuery in SpanQuery

Karl Wettin
On Thu, 2006-08-31 at 14:27 -0400, Mark Miller wrote:

> When is a query rewritten? I build my query and then before using it, I
> would like to print it out to double check it. Not possible? Does the
> rewrite happen inside search?

Right, you can't do a toString prior to rewriting it. The problem is of
course that the rewritten query contains lots of possible choices for
the fuzzy term, extracted from the IndexReader passed when rewriting it.

If you really, really want to inspect the rewritten query, create a new
instance and pass it the IndexReader. This will be slow, though. In fact,
fuzzy span is quite slow by itself.

By the way, what do you plan to use it for? I use it as a very crude
text mining classifier, never exposed to the end user.


Re: FuzzyQuery in SpanQuery

Mark Miller-3
karl wettin wrote:

> On Thu, 2006-08-31 at 14:27 -0400, Mark Miller wrote:
>
>  
>> When is a query rewritten? I build my query and then before using it, I
>> would like to print it out to double check it. Not possible? Does the
>> rewrite happen inside search?
>>    
>
> Right, you can't do a toString prior to rewriting it. The problem is of
> course that the rewritten query contains lots of possible choices for
> the fuzzy term, extracted from the IndexReader passed when rewriting it.
>
> If you really, really want to inspect the rewritten query, create a new
> instance and pass it the IndexReader. This will be slow, though. In fact,
> fuzzy span is quite slow by itself.
>
> By the way, what do you plan to use it for? I use it as a very crude
> text mining classifier, never exposed to the end user.
>
I want to use it for my query parser so you can do a fuzzy search inside
of a proximity search. Is it any slower than a standard fuzzy query?

- Mark

Re: FuzzyQuery in SpanQuery

Karl Wettin
On Thu, 2006-08-31 at 17:17 -0400, Mark Miller wrote:
>
> I want to use it for my query parser so you can do a fuzzy search
> inside of a proximity search. Is it any slower than a standard fuzzy
> query?

I find it to be extremely slow. All terms in the index need to be
enumerated (or a subset if a prefix length is provided). But try it out.
You are more than welcome to report the speed here or in the JIRA issue.
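
The cost described above can be sketched in plain Java. In this illustration a sorted in-memory set stands in for the index's term dictionary, and a fixed edit-distance limit (maxEdits) stands in for Lucene's similarity threshold; both are simplifying assumptions, and every name below is hypothetical rather than part of the Lucene API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Illustration only: shows why a prefix length shrinks the work of fuzzy expansion.
class FuzzyEnumerationSketch {

    // Classic dynamic-programming Levenshtein distance using two rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Terms that must be compared: all of them when prefixLength is 0,
    // otherwise only those sharing the target's first prefixLength characters.
    static List<String> candidates(SortedSet<String> terms, String target, int prefixLength) {
        String prefix = target.substring(0, Math.min(prefixLength, target.length()));
        List<String> result = new ArrayList<>();
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // sorted order: no later term can share the prefix
            }
            result.add(term);
        }
        return result;
    }

    // Keeps the candidates within maxEdits of the target.
    static List<String> expand(List<String> candidates, String target, int maxEdits) {
        List<String> matches = new ArrayList<>();
        for (String term : candidates) {
            if (editDistance(term, target) <= maxEdits) {
                matches.add(term);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        SortedSet<String> terms = new TreeSet<>(Arrays.asList(
            "finish", "fisher", "fisheries", "fishery", "parliament", "police", "policy"));
        for (int prefixLength : new int[] {0, 3}) {
            List<String> visited = candidates(terms, "fishery", prefixLength);
            System.out.println("prefix=" + prefixLength + " compared=" + visited.size()
                + " matches=" + expand(visited, "fishery", 1));
        }
    }
}
```

With no prefix, every term is compared against the target; with a prefix of three characters, only the terms starting with "fis" are. On a real index the same trade-off applies at a much larger scale, at the price of missing matches that differ within the prefix.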


Re: FuzzyQuery in SpanQuery

Mark Miller-3
karl wettin wrote:

> On Thu, 2006-08-31 at 17:17 -0400, Mark Miller wrote:
>  
>> I want to use it for my query parser so you can do a fuzzy search
>> inside of a proximity search. Is it any slower than a standard fuzzy
>> query?
>>    
>
> I find it to be extremely slow. All terms in the index need to be
> enumerated (or a subset if a prefix length is provided). But try it out.
> You are more than welcome to report the speed here or in the JIRA issue.
>
Bad news for me. Any hope of a speedier fuzzy span?

- Mark

Re: FuzzyQuery in SpanQuery

Karl Wettin
On Thu, 2006-08-31 at 17:33 -0400, Mark Miller wrote:
>
> Bad news for me. Any hope of a speedier fuzzy span?

Using a spell checker comes to mind.

A speedier index is another way to go. RAMDirectory is n times faster
than FSDirectory, and the index from issue 550 is 5x faster than
RAMDirectory if you only look at fuzziness.



Re: FuzzyQuery in SpanQuery

Karl Wettin
In reply to this post by Mark Miller-3
On Thu, 2006-08-31 at 17:33 -0400, Mark Miller wrote:
> Bad news for me. Any hope of a speedier fuzzy span?

I just came to think of something. Bob Carpenter posted some optimized
fuzzy code on the dev list some time ago. According to my measurements
it was something like 15-25% faster. I don't know if it was committed.

