distance between words

distance between words

luti
Hi,

I have a suggestion for improving Nutch search results.
The "big" search engines (like Google) measure the distance between the
query words.
E.g.:
query string: lucene in action
When you search for this in Google, it boosts the documents in which
the words "lucene in action" appear in the same sequence.

I think this is possible in Nutch/Lucene (e.g. if your search string is:
"lucene in action"), but Nutch does not do it.
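
A minimal sketch of the idea in plain Python, not Nutch or Lucene code. It assumes a simple tokenizer and a hypothetical boost formula (number of query terms divided by the shortest in-order window that contains them); the names `tokenize`, `min_ordered_span` and `proximity_boost` are invented for illustration only.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def min_ordered_span(doc_tokens, query_terms):
    """Return the length (in tokens) of the shortest window that contains
    all query terms in query order, or None if there is no such window."""
    best = None
    for start, tok in enumerate(doc_tokens):
        if tok != query_terms[0]:
            continue
        pos = start
        matched = True
        for term in query_terms[1:]:
            # Take the earliest later occurrence of each following term;
            # this gives the shortest window for this start position.
            try:
                pos = doc_tokens.index(term, pos + 1)
            except ValueError:
                matched = False
                break
        if matched:
            span = pos - start + 1
            if best is None or span < best:
                best = span
    return best

def proximity_boost(doc, query):
    """Hypothetical boost: 1.0 for an exact in-order phrase, smaller as the
    terms spread out, 0.0 if the terms do not all occur in query order."""
    terms = tokenize(query)
    if not terms:
        return 0.0
    span = min_ordered_span(tokenize(doc), terms)
    if span is None:
        return 0.0
    return len(terms) / span

if __name__ == "__main__":
    print(proximity_boost("Lucene in Action is a book", "lucene in action"))
    print(proximity_boost("Lucene is used in many projects that need action",
                          "lucene in action"))
```

With this formula a document that contains the words as one phrase gets the full boost, while a document in which the same words are scattered gets only a fraction of it.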

Any ideas on how to do this?


Regards,
       Ferenc

Re: distance between words

luti
YourSoft wrote:

> Hi,
>
> I have a suggestion for improving Nutch search results.
> The "big" search engines (like Google) measure the distance between
> the query words.
> E.g.:
> query string: lucene in action
> When you search for this in Google, it boosts the documents in which
> the words "lucene in action" appear in the same sequence.
>
> I think this is possible in Nutch/Lucene (e.g. if your search string is:
> "lucene in action"), but Nutch does not do it.
>
> Any ideas on how to do this?
>
>
> Regards,
>       Ferenc
>
>
I'm sorry, something was missing from my previous mail:
When searching for the keywords, some other things should also affect the boost:
- How many times the full query ("Lucene in action") is found in the document.
(Relate the full-query count to the total length of the document: if it is
more than 10-20%, that is BAD.)
- How many times the individual query words ("lucene", "in", "action") are
found in the document.
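
The two counts above could be computed roughly as follows. This is only a sketch in plain Python, not Nutch code. The exact ratio meant above is not fully specified, so the `phrase_density` value (tokens covered by full-query matches divided by document length) and the `max_density` threshold of 0.2 are assumptions, and all function names are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def phrase_count(doc_tokens, phrase_tokens):
    """Count positions where the full phrase occurs as consecutive tokens."""
    n = len(phrase_tokens)
    if n == 0:
        return 0
    return sum(
        1
        for i in range(len(doc_tokens) - n + 1)
        if doc_tokens[i:i + n] == phrase_tokens
    )

def query_statistics(doc, query, max_density=0.2):
    """Collect the full-phrase count, the per-word counts, and an assumed
    density measure that flags documents stuffed with the query."""
    doc_tokens = tokenize(doc)
    terms = tokenize(query)
    phrases = phrase_count(doc_tokens, terms)
    counts = Counter(doc_tokens)
    # Approximate share of the document's tokens that belong to full-phrase
    # matches (overlapping matches are each counted in full).
    density = phrases * len(terms) / len(doc_tokens) if doc_tokens else 0.0
    return {
        "phrase_count": phrases,
        "term_counts": {t: counts[t] for t in terms},
        "phrase_density": density,
        "looks_stuffed": density > max_density,
    }

if __name__ == "__main__":
    print(query_statistics("Lucene in Action is a book about Lucene",
                           "lucene in action"))
```

A ranking function could raise the score with `phrase_count` and `term_counts`, and lower it when `looks_stuffed` is true.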


mozdex

luti
Dear List!

I don't know who maintains mozdex.com, but the server has not been
returning search results since Saturday.

Regards,
    Ferenc

Re: distance between words

luti
Sorry for my bad English.
OK, I see that I explained my suggestion very poorly.

Please try the following:
search MSN and Google for:
Freddie i want to ride my bicycle

I think it is unambiguous what I would like to see in the results.
MSN returns 21,958 hits, and the good result is in 4th position
(4th of 21,958).
Google returns 308,000 hits, and the first hit is the full
text of the song (1st of 308,000).

I think Google's results are better than MSN's in this case: Google has
the larger dataset and still returns the better result.
I think the Nutch results are bad in most cases.

I found in 'explain.jsp' that the result is also scored by the full phrase
("Freddie i want to ride my bicycle").
But in this situation that is bad, because "Freddie" is not near "i
want...".
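
One way to handle a query like this is to give credit for the longest part of the query that appears as a phrase, instead of requiring the whole query as one phrase. The sketch below is plain Python under that assumption; it is not what 'explain.jsp' or Nutch computes, and the function names are hypothetical.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def longest_shared_run(doc_tokens, query_terms):
    """Length of the longest run of consecutive query terms that also occurs
    as consecutive tokens in the document (longest common substring,
    computed over tokens with dynamic programming)."""
    best = 0
    prev = [0] * (len(query_terms) + 1)
    for d in doc_tokens:
        cur = [0] * (len(query_terms) + 1)
        for j, q in enumerate(query_terms, start=1):
            if d == q:
                # Extend the run that ended at the previous token pair.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

def partial_phrase_score(doc, query):
    """Fraction of the query covered by its longest phrase match."""
    terms = tokenize(query)
    if not terms:
        return 0.0
    return longest_shared_run(tokenize(doc), terms) / len(terms)

if __name__ == "__main__":
    query = "Freddie i want to ride my bicycle"
    doc = "Freddie Mercury sings: I want to ride my bicycle"
    print(partial_phrase_score(doc, query))
```

In the example document the full seven-word phrase never occurs, but the six words "i want to ride my bicycle" do, so the document still scores 6/7 instead of being treated as a non-match for the phrase.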

Best Regards,
    Ferenc