Boosting Search Results


Boosting Search Results

bourne71
Hi, new here.

I recently started using Lucene and have encountered a problem. I crawl and index a number of documents.
When I perform a search for, let's say, "tall fat", by rights the results that match all the keywords should be at the top and displayed first.

But in my search results, some documents that match only one of the keywords, such as 'tall', are displayed first. Why is that? What have I done wrong?

Can anyone advise me on this? Thanks.

Re: Boosting Search Results

iorixxx

> When I perform a search, let's say "tall fat", by rights the
> results that match all the keywords should be on top and displayed first.

The answer to your question is at the end of this thread:

http://www.nabble.com/Generating-Query-for-Multiple-Clauses-in-a-Single-Field-td24694748.html


     

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: Boosting Search Results

Ian Lea
In reply to this post by bourne71
Hi


It's not quite that simple.  Other things being equal, results that
match all keywords are likely to come first but there are other
factors such as term frequency and the length of the document.

Searcher.explain() will give you the gory details.  Luke will let you
see what is in your index.  A google for relevancy or scoring will
give you more info.  DefaultSimilarity is the default scoring
implementation.
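
To make the point about term frequency and document length concrete, here is a rough sketch in plain Java of the kind of arithmetic a classic TF-IDF scorer does. It is not Lucene code: the class ScoreSketch and its methods are made up for illustration, the formula is a simplification (query normalisation, boosts and the lossy encoding of length norms are left out), and idf is assumed to be 1.0 for both terms.

```java
public class ScoreSketch {

    // Simplified per-term contribution:
    // sqrt(term frequency) * idf^2 * 1/sqrt(field length).
    public static double termScore(int freq, double idf, int fieldLength) {
        if (freq == 0) {
            return 0.0;
        }
        return Math.sqrt(freq) * idf * idf / Math.sqrt(fieldLength);
    }

    // Score for a two-term query: the sum of the term scores, scaled by the
    // fraction of query terms the document matched (the "coord" factor).
    public static double score(int freq1, int freq2, double idf, int fieldLength) {
        int matched = (freq1 > 0 ? 1 : 0) + (freq2 > 0 ? 1 : 0);
        double coord = matched / 2.0;
        return coord * (termScore(freq1, idf, fieldLength)
                      + termScore(freq2, idf, fieldLength));
    }

    public static void main(String[] args) {
        // Hypothetical documents:
        // A is short (9 terms) and only mentions "tall", 9 times.
        // B is long (100 terms) and mentions "tall" and "fat" once each.
        double a = score(9, 0, 1.0, 9);
        double b = score(1, 1, 1.0, 100);
        System.out.println("A (one keyword, short doc):  " + a);
        System.out.println("B (both keywords, long doc): " + b);
    }
}
```

With these made-up numbers A comes out ahead of B (0.5 against 0.2), even though only B contains both words. That is the kind of result Searcher.explain() would let you trace on a real index.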


--
Ian.


On Fri, Jul 31, 2009 at 11:01 AM, bourne71<[hidden email]> wrote:

>
> Hi, new here.

Re: Boosting Search Results

prashant ullegaddi
In reply to this post by bourne71
It might be because there are hardly any documents containing both
words.
Try an exact phrase search: "\"tall fat\""


Re: Boosting Search Results

bourne71
Thanks for all the replies. They helped me understand the problem better, but is it possible to create a query that gives an additional boost to a result if and only if both words are found in it? That would make sure such results appear higher up the list.

Can this type of query be created?

Re: Boosting Search Results

henok sahilu
Hello there,
I would like to know more about boosting search results.
Thanks



Re: Boosting Search Results

Ian Lea
In reply to this post by bourne71
You could write your own Similarity, extending DefaultSimilarity and
overriding whichever methods will help you achieve your aims.

Or how about running 2 searches, the first with both words required
(+word1 +word2) and then a second search where they aren't both
required (word1 word2).  Then merge/dedup the two lists of hits,
keeping the ones from the first search at the top.
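
Another possibility, offered only as a sketch: the query parser syntax lets you boost a parenthesised group, so a single query string can combine the optional words with a boosted clause that requires all of them, for example tall fat (+tall +fat)^10. Documents containing both words pick up the extra clause's score, which pushes them up the list but does not strictly guarantee they come first. The helper below is hypothetical (BoostedQueryBuilder and buildQuery are not part of Lucene); it only builds the string you would hand to QueryParser.parse(). It assumes a non-empty list of plain words with no characters that need escaping, and the boost of 10 is arbitrary.

```java
import java.util.Arrays;
import java.util.List;

public class BoostedQueryBuilder {

    // Builds e.g. "tall fat (+tall +fat)^10" from ["tall", "fat"] and boost 10.
    public static String buildQuery(List<String> terms, int boost) {
        StringBuilder optional = new StringBuilder();
        StringBuilder required = new StringBuilder();
        for (String term : terms) {
            if (optional.length() > 0) {
                optional.append(' ');
                required.append(' ');
            }
            optional.append(term);               // word may match
            required.append('+').append(term);   // word must match
        }
        return optional + " (" + required + ")^" + boost;
    }

    public static void main(String[] args) {
        // prints: tall fat (+tall +fat)^10
        System.out.println(buildQuery(Arrays.asList("tall", "fat"), 10));
    }
}
```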


--
Ian.

On Mon, Aug 3, 2009 at 4:14 AM, bourne71<[hidden email]> wrote:

>
> Thanks for all the replies. They helped me understand the problem better, but is it
> possible to create a query that gives an additional boost to a result if
> and only if both words are found in it? That would make sure such results
> appear higher up the list.
>
> Can this type of query be created?

Re: Boosting Search Results

bourne71
Hey, thanks for the suggestion.
I thought of performing two searches as well. Unfortunately, I don't know how to perform a search on the results returned by the first search. Could you guide me a little? I tried to look around for information but found none.

Thanks

Re: Boosting Search Results

Ian Lea
Sorry, I'm not clear what you don't know how to do.


To spell out the double search suggestion a bit more:

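import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.queryParser.QueryParser;  // package name as of Lucene 2.x
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

QueryParser qp = new QueryParser(...);

// First search: both words required
Query q1 = qp.parse("+word1 +word2");
TopDocs td1 = searcher.search(q1, ...);

// Second search: either word may match
Query q2 = qp.parse("word1 word2");
TopDocs td2 = searcher.search(q2, ...);

ScoreDoc[] sd1 = td1.scoreDocs;
ScoreDoc[] sd2 = td2.scoreDocs;

// Grab all docids from first search
List<Integer> docidl = new ArrayList<Integer>();
for (int i1 = 0; i1 < sd1.length; i1++) {
  docidl.add(sd1[i1].doc);
}

// Add any docids from second search that are not already on the list
for (int i2 = 0; i2 < sd2.length; i2++) {
  int docid = sd2[i2].doc;
  if (!docidl.contains(docid)) {
    docidl.add(docid);
  }
}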

(code just a suggestion, off the top of my head, may not work, may be
full of bugs, there will be other maybe better ways to do it).
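
The merge step on its own can be tried without an index. Here is a small sketch in plain Java, assuming the doc ids from the two searches have already been pulled out into int arrays. MergeHits and merge are made-up names, and a LinkedHashSet is swapped in for the List.contains() check above because it keeps insertion order and drops duplicates in one go.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class MergeHits {

    // Returns the ids from 'first' in their original order, followed by any
    // ids from 'second' that did not already appear.
    public static List<Integer> merge(int[] first, int[] second) {
        Set<Integer> merged = new LinkedHashSet<Integer>();
        for (int id : first) {
            merged.add(id);
        }
        for (int id : second) {
            merged.add(id);  // no effect if the id is already present
        }
        return new ArrayList<Integer>(merged);
    }

    public static void main(String[] args) {
        int[] bothWords = {7, 3};         // hypothetical hits for "+word1 +word2"
        int[] eitherWord = {5, 7, 9, 3};  // hypothetical hits for "word1 word2"
        System.out.println(merge(bothWords, eitherWord));  // prints [7, 3, 5, 9]
    }
}
```

The resulting list is what you would walk through to display results, looking up each document by its id.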

If that doesn't help, perhaps you could rephrase the question.


--
Ian.


On Mon, Aug 3, 2009 at 10:51 AM, bourne71<[hidden email]> wrote:

>
> Hey, thanks for the suggestion.

Re: Boosting Search Results

bourne71
Sorry... I mean the double-searching part. That is the part I don't understand how to do, since after retrieving the first set of results, I am not sure how to search them again.
