Faceted Search!


niraj tulachan
Hi all,
    I've only been using Solr for a couple of days, so I'm very new to this.  I'm trying to implement faceted search somewhat like CNET's shopper.com.  Instead of searching for items (like "camera"), I want to search for documents.  I'm planning to use Nutch to crawl the website and Solr to cluster my search results.  I tried integrating Nutch with Solr by following FooFactory.com's blog, but I could not follow a few of the steps, as I'm very new to both of them.  If any of you have implemented this, could you please give me suggestions or code snippets that I could use to achieve "faceted search"?  Any help would be appreciated.
  Thanks,
  Niraj  

       

Re: Faceted Search!

Chris Hostetter-3

: search for documents.  I'm planning to use Nutch to crawl that website
: and use Solr to cluster my search results.  I tried integrating Nutch
: with Solr following FooFactory.com's blog ......but I could not follow
: a few of the steps as I'm very new to both of them.  If any of you have
: implemented this, can you please give me suggestions or code snippets so
: that I can implement them to achieve "faceted search".  Any help would
: be appreciated.

I'm not very familiar with the Nutch/Solr hybrid stuff some people have
done, but faceting requires that you have well structured fields
containing discrete pieces of information ... ie: if you want to facet
cameras on manufacturer, megapixels, weight, and battery life, you need
separate fields for manufacturer, megapixels, weight, and battery life ...
i'm not sure that nutch is going to be able to do that for you.

extracting structured data out of webpages like that without writing
custom parser code for each website layout is a pretty weighty data
harvesting problem.
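
To make the "separate fields" point concrete, here is a minimal sketch in Python that builds a Solr XML update message with one field per discrete attribute. The record, its values, and the field names (manufacturer, megapixels, weight, battery_life) are made up for illustration and would have to match whatever is declared in your own schema.

```python
import xml.etree.ElementTree as ET


def build_add_xml(docs):
    """Build a Solr <add> message with one <field> element per attribute."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


# Hypothetical camera record: every facetable attribute gets its own field.
camera = {
    "id": "cam-001",
    "manufacturer": "ExampleCo",
    "megapixels": 7.1,
    "weight": 180,
    "battery_life": 240,
}

xml_message = build_add_xml([camera])
print(xml_message)
```

The resulting string is what would be posted to Solr's update handler; the point is only that each facet candidate lives in its own field rather than inside one block of page text.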

-Hoss


Re: Faceted Search!

niraj tulachan
Thanks, Chris, for replying to my question.  So I'm thinking about using a CMS: when somebody publishes a page in the CMS, I would generate a well-structured XML file and feed that XML to Solr to index the data.  Then I can simply do faceted search using the correct Lucene query format, right?  Do you have any other ideas or comments on my CMS approach?
  Cheers,
  Niraj


Re: Faceted Search!

Chris Hostetter-3

: Thanks Chris for replying to my question.  So I'm thinking about using a
: CMS and when somebody publishes a page in the CMS, I would generate a
: well structured XML file and feed that xml to Solr to generate the index
: on that data. Then, I can simply do faceted search using the correct
: Lucene query format, right?  Do you have any other ideas or comments on my
: CMS approach?

that sounds fine ... as long as you have well structured data and you
aren't trying to extract it from unstructured HTML.
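
As a rough illustration of what the query side could look like once the structured data is indexed, the sketch below builds a request URL using Solr's simple faceting parameters (`facet` and `facet.field`). The base URL is an assumption (the address commonly used by the Solr example setup), and the field names are hypothetical.

```python
from urllib.parse import urlencode


def facet_query_url(base_url, query, facet_fields):
    """Return a select URL that asks Solr for facet counts on the given fields."""
    params = [("q", query), ("facet", "true")]
    # facet.field may be repeated, once per field to facet on.
    params += [("facet.field", field) for field in facet_fields]
    return base_url + "?" + urlencode(params)


# Assumed base URL and hypothetical field names.
url = facet_query_url(
    "http://localhost:8983/solr/select",
    "camera",
    ["manufacturer", "megapixels"],
)
print(url)
```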



-Hoss


Re: Faceted Search!

niraj tulachan
Hi Chris,
    Thank you for the reply.  I was reading other postings about faceted search, and it seems they use the filtering capability of Lucene for that.  If that's the case, can we control the "labels" of the categories?  For example, a search for "camera" on shopper.com clusters the results by price, megapixels, manufacturer, and so on.  And if we are feeding an XML file to the Solr server for faceted search, how can we define the sub-categories?  Say, in the example above, the category "price" has sub-categories like "less than 100" and "100-200".  I'm guessing we explicitly define this in the XML feed file, but I could be very wrong.  In any case, could you please give me a short example of how to achieve that?  Thanks once again.
  Cheers,
  Niraj


Re: Faceted Search!

Chris Hostetter-3

: define the sub-categories.  let's say from the above example, the
: category "price" has different sub-categories like "less than 100"
: ,"100-200"?  I'm guessing, we explicitly define this in XML feed file, but
: I could be very wrong.  In any case, can you please give me a short
: example to achieve that implementation.  Well, thanks once again.

there's nothing "out of the box" from Solr that will do this, it's
something you would need to implement either in the client or in a custom
request handler ... Solr's "Simple Faceting" support is designed to be just
that: simple.  but the underlying methods/mechanisms of computing DocSet
intersections can be used by any custom request handler to generate
application specific results.
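
The idea of labeled sub-categories computed by set intersection can be sketched in a few lines of Python. This is not Solr's DocSet API: plain Python sets stand in for DocSets, and the prices, labels, and document ids are invented for illustration. The application owns the labels and the ranges; the counts come from intersecting the current result set with the set of documents in each bucket.

```python
# Hypothetical in-memory index: document id -> price.
PRICES = {1: 79.0, 2: 149.0, 3: 199.0, 4: 250.0, 5: 95.0}

# Labels chosen by the application, each with a half-open range [low, high).
PRICE_BUCKETS = [
    ("less than 100", 0.0, 100.0),
    ("100-200", 100.0, 200.0),
    ("200-300", 200.0, 300.0),
]


def bucket_docsets(prices, buckets):
    """Precompute the set of document ids that falls in each labeled bucket."""
    return {
        label: {doc_id for doc_id, price in prices.items() if low <= price < high}
        for label, low, high in buckets
    }


def facet_counts(result_docs, docsets):
    """Count, per label, how many matching documents are in that bucket."""
    return {label: len(result_docs & docs) for label, docs in docsets.items()}


docsets = bucket_docsets(PRICES, PRICE_BUCKETS)
matches = {1, 2, 3, 5}  # ids matched by some hypothetical user query
counts = facet_counts(matches, docsets)
print(counts)
```

Whether this logic lives in the client or in a custom request handler, the shape is the same: one precomputed set per constraint, intersected with the set of documents matching the main query.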

I've got 3 or 4 indexes that use the out of the box SimpleFacet support
Solr provides, but the major faceting we do (product based facets) all
uses custom request handlers so we can have very exact control on all of
this kind of stuff driven by our data management tools.



-Hoss


RE: Faceted Search!

maustin
Niraj: What environment are you using? SQL Server/.NET/Windows, or something
else?

-Mike

-----Original Message-----
From: Chris Hostetter [mailto:[hidden email]]
Sent: Wednesday, June 20, 2007 4:24 PM
To: [hidden email]
Subject: Re: Faceted Search!





RE: Faceted Search!

niraj tulachan
Hi Mike,
    Currently, I'm just running the demo example provided on the Solr web site on my local Windows machine.  I was mainly looking into generating an XML feed file and feeding it to the Solr server.  However, I was also looking into implementing sub-categories within the categories, if that makes sense.  For example, shopper.com has categories such as price, manufacturer, and so on, and within them there are sub-categories (price is split into <$100, 100-200, 200-300, etc.).  I don't have any constraints in terms of technology.  If I have to set up a DB server, I won't mind doing it.  Anyway, please shed some light on how you would handle this issue.  Any suggestions would be appreciated.
  Thanks,
  Niraj

RE: Faceted Search!

Chris Hostetter-3

: generating XML feed file and feeding to the Solr server.  However, I was
: also looking into implementing sub-categories within the
: categories if that makes sense.  For example, in the shopper.com we have
: the categories of price, manufacturer and so on and within them there
: are sub-categories (price is split into <$100, 100-200, 200-300 etc).
: I don't have constraints in terms of technology.  If I have to implement
: a db server I won't mind implementing it.  Anyway, please shed some light
: on how you would handle this issue.  Any suggestions would be appreciated.

the shopper.com solution is very VERY specialized and specific to the
datamodel used to manage the category metadata .... if i had to do it
over again i would do it a lot differently.

way way back there was a thread about "complex faceting" where i included
some ideas on a possible facet configuration xml syntax which could
then be parsed by a request handler, with different types of faceting
(simple query, ranges, based on terms, prefix) delegated to helper
classes.  there was also the idea of being able to group facets or make
facets depend on other facets (ie: don't show the author facet until a
value has been picked from the author_initial facet)
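
A rough sketch of that idea, in Python rather than as a Solr request handler: a facet configuration is a list of entries, each entry names a facet type, and a dispatch table hands each entry to a helper function. Everything here is hypothetical (the config keys such as `depends_on`, the helper names, the sample documents); Python dicts stand in for the XML syntax, only three of the four facet types mentioned above are shown, and the documents passed in are assumed to be already filtered by whatever the user has selected.

```python
from collections import Counter


def terms_facet(docs, spec):
    """Count documents per distinct value of the field."""
    return dict(Counter(d[spec["field"]] for d in docs))


def prefix_facet(docs, spec):
    """Count documents per leading substring of the field value."""
    return dict(Counter(d[spec["field"]][: spec["length"]] for d in docs))


def ranges_facet(docs, spec):
    """Count documents per half-open numeric range [low, high)."""
    return {
        f"{low}-{high}": sum(1 for d in docs if low <= d[spec["field"]] < high)
        for low, high in spec["ranges"]
    }


HELPERS = {"terms": terms_facet, "prefix": prefix_facet, "ranges": ranges_facet}

# Hypothetical configuration: "author" is hidden until "author_initial" is picked.
FACET_CONFIG = [
    {"name": "author_initial", "type": "prefix", "field": "author", "length": 1},
    {"name": "author", "type": "terms", "field": "author",
     "depends_on": "author_initial"},
    {"name": "price", "type": "ranges", "field": "price",
     "ranges": [(0, 100), (100, 200)]},
]


def compute_facets(docs, config, selected):
    """Run each configured facet, skipping those whose parent is not selected."""
    results = {}
    for spec in config:
        parent = spec.get("depends_on")
        if parent is not None and parent not in selected:
            continue
        results[spec["name"]] = HELPERS[spec["type"]](docs, spec)
    return results


DOCS = [
    {"author": "Adams", "price": 50},
    {"author": "Asimov", "price": 150},
    {"author": "Banks", "price": 120},
]

before = compute_facets(DOCS, FACET_CONFIG, selected={})
a_docs = [d for d in DOCS if d["author"].startswith("A")]
after = compute_facets(a_docs, FACET_CONFIG, selected={"author_initial": "A"})
print(before)
print(after)
```

Before any selection, only the author_initial and price facets are returned; once an initial has been picked, the author facet appears and is computed over the narrowed result set.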

nothing ever really came of it, but it's how i'd probably approach trying
to tackle something like the shopper.com functionality if CNET threw away
our product metadata data model and started from scratch.

http://www.nabble.com/metadata-about-result-sets--t1243321.html#a3334244



-Hoss