Categorizing search results

kazam
Hi,

I am using Nutch 0.8.1 to do site-wide searches. I want certain results to
be boosted more than others, so I have added custom index terms and
boosted them.
However, I now need to group results into categories so that
interesting categories are not buried deep in the result list.
Has anyone tried to categorize search results? For example, out of 100
results, 20 appear in category 1, 50 appear in category 2, and all others
appear in a third category.

Thanks, Kenan.

Re: Categorizing search results

Otis Gospodnetic-2-2
Kenan,

Have you considered using Carrot2?  I think Nutch includes a plugin for it already.  Or, if your categories are predefined, you could index with Solr (if you were to use Nutch 1.0) and use Solr's faceting capabilities.
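
For the predefined-categories case, a faceted Solr request might look roughly like the sketch below. It only builds the request URL and does not contact a server. The core name `nutch`, the host and port, and the field name `category` are assumptions for illustration; `facet`, `facet.field` and `facet.mincount` are standard Solr request parameters.

```python
from urllib.parse import urlencode

# Assumed endpoint: a local Solr instance with a core called "nutch".
# Adjust to wherever your Solr select handler actually lives.
SOLR_SELECT_URL = "http://localhost:8983/solr/nutch/select"


def build_facet_query(keywords, facet_field="category"):
    """Build a Solr select URL that returns hits plus per-category counts.

    `facet_field` is a hypothetical field name; it must be an indexed
    field in your own schema.
    """
    params = [
        ("q", keywords),              # the user's keyword query
        ("rows", "10"),               # number of hits to return
        ("facet", "true"),            # turn faceting on
        ("facet.field", facet_field), # count hits per value of this field
        ("facet.mincount", "1"),      # skip categories with no hits
        ("wt", "json"),               # ask for a JSON response
    ]
    return SOLR_SELECT_URL + "?" + urlencode(params)


url = build_facet_query("hadoop tutorial")
print(url)
```

The facet counts in the response give the "20 in category 1, 50 in category 2" style breakdown asked about in the original question.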

Otis

----- Original Message ----

> From: Kenan Azam <[hidden email]>
> To: nutch-user <[hidden email]>
> Sent: Tuesday, August 4, 2009 4:49:53 PM
> Subject: Categorizing search results

Re: Categorizing search results

Dennis Kubes-2
In reply to this post by kazam
Visvo.com was originally a categorized, web-wide search.  In hindsight I
don't think our approach was the best way to proceed, but here is
what we did.

1) We had a MapReduce job that was run to place urls in a given category.
  The actual function for determining a category is arbitrary.  We
started with Bayesian methods based on noun phrases matched to hand-built
categories, but it could be any function you want as long as it
maps url -> 1+ categories.  Our function returned a float for each
category; the highest-scoring category wins.

2) The job was set up so that the function would pick the best category
out of a level, then rerun on that category's children.  The function
returned a float value.  If a child's value was higher than its parent's,
the job would continue checking children at the next level, and so on.
The idea behind this was to find the best category in a tree of categories.

3) If a url was in a category, it was considered to be in all of its
parent categories.  So let's say a url is in the following category:

/one/two/three/four

It is also considered to be in

/one/two/three
/one/two
/one

In the index we added a custom field called category and we would add
the category it was assigned to and all of its parent categories.
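
Steps 1 to 3 could be sketched roughly as follows. This is a toy, single-process illustration and not the Visvo MapReduce job: the category tree, the scores, and the names `best_category` and `with_ancestors` are all made up for the example, and the scoring function stands in for whatever classifier you use.

```python
# Hypothetical category tree: each path maps to its child paths.
CATEGORY_TREE = {
    "/": ["/one", "/uno"],
    "/one": ["/one/two"],
    "/one/two": ["/one/two/three"],
    "/one/two/three": [],
    "/uno": [],
}

# Made-up scores for one url; a real scorer would be a classifier.
TOY_SCORES = {"/one": 0.4, "/uno": 0.1, "/one/two": 0.6, "/one/two/three": 0.5}


def score(category):
    """Return how well the url matches a category (higher is better)."""
    return TOY_SCORES.get(category, 0.0)


def best_category(score_fn, tree, root="/"):
    """Walk down the tree while the best child beats its parent's score."""
    current, current_score = root, float("-inf")
    while tree.get(current):
        child = max(tree[current], key=score_fn)
        child_score = score_fn(child)
        if child_score <= current_score:
            break  # no child improves on the parent, so stop here
        current, current_score = child, child_score
    return current


def with_ancestors(category):
    """List a category followed by all of its parent categories."""
    parts = category.strip("/").split("/")
    return ["/" + "/".join(parts[:i]) for i in range(len(parts), 0, -1)]


assigned = best_category(score, CATEGORY_TREE)
category_field_values = with_ancestors(assigned)
print(assigned, category_field_values)
```

The list returned by `with_ancestors` is what would be written into the custom category field, one value per entry.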

The UI allowed running keyword searches but also had a listing of
categories, which were links.  There was some special logic to try to
determine a relevant starting point in the category tree from the query.
It was not very successful, so most searches started at the base of the
category tree.  Clicking on a link would run a query like this:

keywords AND category=/one/two/three

This should return categorized results.  As I said, it may not be the
best approach, but it is one approach to getting categorized results.
Hope this helps.
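
To make the query semantics concrete, here is a small in-memory sketch of "keywords AND category=/one/two/three". It does not use the Nutch or Lucene APIs; the documents, urls and the `search` function are invented for illustration. The point is that the category is matched as an exact, untokenized value, and that a document filed under a deep category also matches its parents because every parent path was stored with it.

```python
# Toy documents; each carries its assigned category plus all parents.
DOCS = [
    {
        "url": "http://example.com/a",
        "text": "hadoop cluster setup",
        "category": ["/one/two/three", "/one/two", "/one"],
    },
    {
        "url": "http://example.com/b",
        "text": "hadoop streaming tips",
        "category": ["/one/two", "/one"],
    },
    {
        "url": "http://example.com/c",
        "text": "hadoop on windows",
        "category": ["/uno"],
    },
]


def search(docs, keywords, category=None):
    """Return urls whose text has every keyword and, if given, the category."""
    terms = keywords.lower().split()
    hits = []
    for doc in docs:
        words = doc["text"].lower().split()
        has_terms = all(term in words for term in terms)
        # Exact match against the stored category values, no tokenizing.
        in_category = category is None or category in doc["category"]
        if has_terms and in_category:
            hits.append(doc["url"])
    return hits


print(search(DOCS, "hadoop", category="/one/two"))
print(search(DOCS, "hadoop", category="/one/two/three"))
```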

Dennis



Re: Categorizing search results

kazam
Thanks for the detailed reply. Our urls are already designed so that
they represent the category they are in. I like the idea of adding a custom
index term for the category.  Thanks again, Kenan.
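
When the url path already encodes the category, the values for the custom category term could be derived along these lines. This is only a sketch: the function name `categories_from_url` and the example url are hypothetical, and it assumes the last path segment is the page itself and every segment before it is a category level.

```python
from urllib.parse import urlparse


def categories_from_url(url):
    """Derive category paths (deepest first) from a url's directory path."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    # Assumption: the final segment names the page, not a category.
    segments = segments[:-1]
    # Emit the full category path and then each parent path.
    return ["/" + "/".join(segments[:i]) for i in range(len(segments), 0, -1)]


print(categories_from_url("http://example.com/products/cameras/dslr/model-x.html"))
```

Each returned path would be added as a value of the category term for that document at index time.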
In reply to Dennis Kubes's post of Tue, Aug 4, 2009 at 10:52 PM.