Not able to retrieve hits for a phrase

Vishal Bathija
Hi,
I am not able to retrieve the number of hits for a particular phrase.
The code below returns hits only for certain phrases. The code
snippet that I use is

rd = IndexReader.open("C:\\Documents and Settings\\Owner\\My Documents\\Thesis\\luceneTest\\index");
PhraseQuery query = new PhraseQuery();
searcher = new IndexSearcher(rd);
Term[] phrTerm = new Term[phraseTerms.length];
for (int u = 0; u < phraseTerms.length; u++) {
    phrTerm[u] = new Term("contents", phraseTerms[u]);
    query.add(phrTerm[u]);
}

System.out.println("Query" + query.toString());
Hits hits = searcher.search(query);
System.out.println("Number of hits :" + hits.length());

The number of hits is 0 for some phrases even though the phrase is
present in some of the documents.

The code retrieves hits for certain phrases such as "avoids deadlock",
but it does not work for a phrase such as "Prevents Data Loss".

I am not sure what the problem could be, as none of these phrases
contain any special characters. Do I need to use a different type of
query?


Regards
Vishal
--
Vishal Bathija
Graduate Student
Department of Computer Science & Systems Analysis
Miami University
Oxford,Ohio
Phone: (513)-461-9239

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: Not able to retrieve hits for a phrase

Erik Hatcher
This could be related to the analyzer you used during indexing.  Be  
aware that matches are *exact* including case.

        Erik

On Apr 17, 2006, at 1:34 AM, Vishal Bathija wrote:

> Number of hits is 0 for some phrases even though the phrase is present
> in some of the documents.

Re: Not able to retrieve hits for a phrase

Vishal Bathija
I currently use
writer = new IndexWriter("index", new StandardAnalyzer(), true);

Should I use a different analyzer? Yes, I am aware that matches are
case sensitive.

Regards
Vishal

On 4/17/06, Erik Hatcher <[hidden email]> wrote:

> This could be related to the analyzer you used during indexing.  Be
> aware that matches are *exact* including case.
>
>        Erik

Re: Not able to retrieve hits for a phrase

Erik Hatcher
Are the terms you're adding to PhraseQuery lowercased?  If not, then  
that is most likely the issue.

        Erik


On Apr 17, 2006, at 9:39 AM, Vishal Bathija wrote:

> I currently use
> writer = new IndexWriter("index", new StandardAnalyzer(),true);
>
> Should I use any other analyzer. Yes I am aware that the matches are
> case sensitive.
>
> Regards
> Vishal

Re: Not able to retrieve hits for a phrase

Vishal Bathija
Hi Erik,
Thanks, that seems to have solved the problem. Could you please
elaborate on the kind of input PhraseQuery takes? Am I supposed to
add only lowercased terms to PhraseQuery? Is it possible to search for
a phrase without case sensitivity?

Regards
Vishal

On 4/17/06, Erik Hatcher <[hidden email]> wrote:

> Are the terms you're adding to PhraseQuery lowercased?  If not, then
> that is most likely the issue.
>
>        Erik

Re: Not able to retrieve hits for a phrase

Erik Hatcher
PhraseQuery needs terms that match what got indexed, simple as that.  
QueryParser does this for you by using the specified analyzer on the  
"phrase text" within double quotes and creating a PhraseQuery out of  
the tokens.  When you're creating a PhraseQuery directly with the  
API, you need to be aware of how things are indexed in order to  
ensure that any normalization, such as lowercasing, that occurs  
during indexing also occurs on the text you're searching with.
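
As a rough illustration of that normalization step, here is a minimal
sketch in plain Java with no Lucene dependency. The class
PhraseTermNormalizer and its normalize method are hypothetical names
made up for this example. It only approximates an analyzer by splitting
on non-alphanumeric characters and lowercasing; the real
StandardAnalyzer applies its own tokenization rules, so treat this as a
sketch of the idea rather than a replacement for running the actual
analyzer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class PhraseTermNormalizer {

    // Hypothetical helper: splits the phrase on anything that is not a
    // letter or digit and lowercases each piece. This is only a rough
    // stand-in for what an index-time analyzer would do.
    public static List<String> normalize(String phrase) {
        List<String> terms = new ArrayList<>();
        for (String piece : phrase.split("[^\\p{L}\\p{N}]+")) {
            if (!piece.isEmpty()) {
                terms.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // prints [prevents, data, loss]
        System.out.println(normalize("Prevents Data Loss"));
    }
}
```

Each normalized string would then be wrapped in a Term for the
"contents" field and added to the PhraseQuery, in place of the raw
phraseTerms values from the original snippet.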

Most often, case-insensitive search is achieved by lowercasing the
text during indexing and again during searching. StandardAnalyzer
lowercases, as do almost all analyzers you'll find in the core
(except WhitespaceAnalyzer).
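
To show why "avoids deadlock" matched while "Prevents Data Loss" did
not, here is a toy positional index in plain Java. ToyPhraseIndex and
its methods are hypothetical and are not how Lucene is implemented; the
sketch assumes whitespace tokenization plus lowercasing at index time,
and compares looking up raw terms against looking up terms that went
through the same normalization.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class ToyPhraseIndex {

    // term -> positions at which the term occurs in the indexed text
    private final Map<String, List<Integer>> postings = new HashMap<>();

    // Assumed "analyzer": lowercase, then split on whitespace.
    private static String[] analyze(String text) {
        return text.toLowerCase(Locale.ROOT).trim().split("\\s+");
    }

    public ToyPhraseIndex(String text) {
        String[] tokens = analyze(text);
        for (int pos = 0; pos < tokens.length; pos++) {
            postings.computeIfAbsent(tokens[pos], k -> new ArrayList<>()).add(pos);
        }
    }

    // Looks the terms up exactly as given, the way a hand-built
    // phrase query would: no lowercasing, no other normalization.
    public boolean matchesRawTerms(String... terms) {
        if (terms.length == 0) {
            return false;
        }
        List<Integer> starts = postings.get(terms[0]);
        if (starts == null) {
            return false;
        }
        for (int start : starts) {
            boolean ok = true;
            for (int i = 1; i < terms.length && ok; i++) {
                List<Integer> positions = postings.get(terms[i]);
                ok = positions != null && positions.contains(start + i);
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    // Runs the phrase through the same analysis used at index time.
    public boolean matchesPhrase(String phrase) {
        return matchesRawTerms(analyze(phrase));
    }

    public static void main(String[] args) {
        ToyPhraseIndex index =
            new ToyPhraseIndex("This design Prevents Data Loss and avoids deadlock");
        System.out.println(index.matchesRawTerms("avoids", "deadlock"));        // true
        System.out.println(index.matchesRawTerms("Prevents", "Data", "Loss"));  // false
        System.out.println(index.matchesPhrase("Prevents Data Loss"));          // true
    }
}
```

The raw lookup for "Prevents" fails because only "prevents" exists in
the index, whereas "avoids" and "deadlock" were already lowercase and
so happened to match.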

        Erik


On Apr 17, 2006, at 11:33 AM, Vishal Bathija wrote:

> Hi Erik,
> Thanks, that seemed to have solved the problem. Can you please
> elaborate on the kind of input PhraseQuery takes in. Am I supposed to
> add only lowercased terms to PhraseQuery. Is it possible to search for
> a phrase that is not case sensitive?
>
> Regards
> Vishal