Understanding Query Parser Behavior

Understanding Query Parser Behavior

Peru Redmi
Hello all,

Could someone explain the QueryParser behavior in the following cases?

1. While indexing:

    Document doc = new Document();
    doc.add(new Field("Field", "http://www.google.com", Field.Store.YES, Field.Index.ANALYZED));

The index has two terms: http and www.google.com

2. While searching:

    Analyzer anal = new ClassicAnalyzer(Version.LUCENE_30, new StringReader(""));
    QueryParser parser = new MultiFieldQueryParser(Version.LUCENE_30, new String[]{"Field"}, anal);
    Query query = parser.parse("http://www.google.com");

Now the query has three terms: (Field:http) (Field://) (Field:www.google.com)

i) Why do I get three terms when parsing but two terms when indexing (using the same ClassicAnalyzer in both cases)?

ii) Is this the expected behavior of ClassicAnalyzer(Version.LUCENE_30) in the parser?

iii) What should be done to avoid the query part (Field://)?

Thanks,
Peru.

Re: Understanding Query Parser Behavior

wmartinusa
No
------ Original message ------
From: Peru Redmi
Date: Mon, Nov 21, 2016 10:44 AM
To: [hidden email]
Subject: Understanding Query Parser Behavior

Re: Understanding Query Parser Behavior

Peru Redmi
Hello,
Could you elaborate on your "No"?

On Mon, Nov 21, 2016 at 11:16 PM, [hidden email] <[hidden email]> wrote:

> No

Re: Understanding Query Parser Behavior

Peru Redmi
Could someone elaborate on this?

On Tue, Nov 22, 2016 at 11:41 AM, Peru Redmi <[hidden email]> wrote:

> Hello,
> Can you help me out on your "No" .

Re: Understanding Query Parser Behavior

Michael McCandless-2
Hi,

You should double check which analyzer you are using during indexing.

The same analyzer on the same string should produce the same tokens.

Mike McCandless

http://blog.mikemccandless.com


On Wed, Nov 23, 2016 at 9:38 PM, Peru Redmi <[hidden email]> wrote:

> Could someone elaborate this.

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: Understanding Query Parser Behavior

Peru Redmi
Hello Mike,

Here is how I analyze my text using QueryParser (with ClassicAnalyzer)
and with the plain ClassicAnalyzer. When I check the same query in Luke,
"//" shows up as a RegexQuery.

Here is my code snippet:

    String value = "http\\://www.google.com";
    Analyzer anal = new ClassicAnalyzer(Version.LUCENE_30, new StringReader(""));

    // Parse the string with QueryParser
    QueryParser parser = new QueryParser(Version.LUCENE_30, "name", anal);
    Query query = parser.parse(value);
    System.out.println(" output terms from query parser ::" + query);

    // Run the same string through the analyzer directly
    List<String> list = new ArrayList<String>();
    TokenStream stream = anal.tokenStream("name", new StringReader(value));
    stream.reset();
    while (stream.incrementToken())
    {
        list.add(stream.getAttribute(CharTermAttribute.class).toString());
    }
    stream.end();
    stream.close();
    System.out.println(" output terms from analyzer " + list);

output:

output terms from query parser ::name:http name:// name:www.google.com
output terms from analyzer [http, www.google.com]

On Thu, Nov 24, 2016 at 5:10 PM, Michael McCandless <
[hidden email]> wrote:

> Hi,
>
> You should double check which analyzer you are using during indexing.
>
> The same analyzer on the same string should produce the same tokens.
>
> Mike McCandless
>
> http://blog.mikemccandless.com

Re: Understanding Query Parser Behavior

Peru Redmi
Any help on this would be greatly appreciated.

Thanks.

On Thu, Nov 24, 2016 at 8:14 PM, Peru Redmi <[hidden email]> wrote:

>
> Hello Mike,
>
> Here is, how i analyze my text using QueryParser ( with ClassicAnalyzer)
> and plain ClassicAnalyzer. On checking the same in luke, i get "//"
> as RegexQuery.

Re: Understanding Query Parser Behavior

Peru Redmi
Hello,

It would be great if someone could help with this.
Note: I am using Lucene version 4.10.4.

On Mon, Nov 28, 2016 at 5:37 PM, Peru Redmi <[hidden email]> wrote:

> Any help on this would be greatly appreciated.
>
> Thanks.

Re: Understanding Query Parser Behavior

Michael McCandless-2
Can you try escaping the / character for the query parser? E.g., pass
this string instead:

    String value = "http\\:\\/\\/www.google.com";

Mike McCandless

http://blog.mikemccandless.com
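
As a side note to the suggestion above: rather than hand-writing the backslashes, the escaping can be done programmatically. Lucene's classic query parser provides a static QueryParser.escape(String) helper for this. The class below is only a dependency-free stand-in to show the idea; the name QueryEscapeSketch and its list of special characters are assumptions made for illustration, not Lucene code.

```java
// Hypothetical stand-in for illustration only; in a real project,
// prefer the escape helper that ships with the Lucene query parser.
final class QueryEscapeSketch {

    // Characters this sketch treats as query syntax (assumed list).
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    // Prefix every special character with a backslash so a query
    // parser would read it as literal text instead of syntax.
    static String escape(String raw) {
        StringBuilder sb = new StringBuilder(raw.length() * 2);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints: http\:\/\/www.google.com
        System.out.println(escape("http://www.google.com"));
    }
}
```

The escaped result is the same text as the string literal suggested above once Java's own backslash escaping is taken into account. Passing it to parser.parse(...) should leave the "//" inside the term text instead of letting the parser read it as a separate clause.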


On Tue, Nov 29, 2016 at 11:38 AM, Peru Redmi <[hidden email]> wrote:

> Hello ,
>
> It would be great , if someone could help on this.
> *Note : I am using Lucene 4.10.4 version*