Solr Newbie - need a point in the right direction


Mark-2
Hi,

First time poster here - I'm not entirely sure where I need to look for this
information.

What I'm trying to do is extract some (presumably) structured information
from non-uniform data (e.g., prices from a Nutch crawl) that needs to show in
search queries, and I've come up against a wall.

I've been unable to figure out where the best place to begin is.

I had a look through the Solr wiki and did a search via Lucid's search tool,
and I'm guessing this is handled at index time through my schema? But I've
also seen DisMax being thrown around as a possible solution, and this has
confused me.

Basically, if you guys could point me in the right direction for resources
(even as much as saying, you need X, it's over there) that would be a huge
help.

Cheers

Mark

Re: Solr Newbie - need a point in the right direction

Gora Mohanty-3
On Tue, Dec 7, 2010 at 9:12 AM, Mark <[hidden email]> wrote:
[...]

> What I'm trying to do is extract some (presumably) structured information
> from non-uniform data (e.g., prices from a Nutch crawl) that needs to show in
> search queries, and I've come up against a wall.
>
> I've been unable to figure out where the best place to begin is.
>
> I had a look through the Solr wiki and did a search via Lucid's search tool,
> and I'm guessing this is handled at index time through my schema? But I've
> also seen DisMax being thrown around as a possible solution, and this has
> confused me.
>
> Basically, if you guys could point me in the right direction for resources
> (even as much as saying, you need X, it's over there) that would be a huge
> help.
[...]

Sorry, the above is a little unclear, at least to me. The basic steps in running
Solr are:
* Installing, configuring, and getting Solr running
* Indexing data, as well as updating and deleting it: the best way to do this
  depends on where your data are coming from. Since you mention Nutch: it
  already integrates with Solr, although by default in a manner that dumps
  the entire content from a crawl into a Solr field. You will probably
  need to write a custom Nutch parser plugin in order to extract a subset
  of the content. Please see http://wiki.apache.org/nutch/RunningNutchAndSolr
* Searching through Solr
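
To make the indexing and searching steps above a little more concrete, here is a minimal Java sketch that only builds the request text and sends nothing. It assumes a single-core Solr on the tutorial's default port (http://localhost:8983/solr), the classic XML update format posted to /update, and the DisMax parser selected with defType=dismax on /select. The field names (id, title, content) are placeholders that must match your own schema.xml, and setups with named cores put the core name in the path.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class SolrRequestSketch {

    // Assumption: single-core Solr on the tutorial's default port.
    static final String SOLR_BASE = "http://localhost:8983/solr";

    // Escapes the five XML special characters so values are safe inside <field>.
    static String xmlEscape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }

    // Indexing step: builds an XML <add> message for the /update handler.
    static String buildAddXml(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append("<field name=\"").append(xmlEscape(e.getKey())).append("\">")
              .append(xmlEscape(e.getValue())).append("</field>");
        }
        return sb.append("</doc></add>").toString();
    }

    // Searching step: builds a /select URL that uses the DisMax query parser.
    // qf lists the fields DisMax searches; "^2" is a boost factor on that field.
    static String buildDismaxQueryUrl(String userQuery, String qf) {
        return SOLR_BASE + "/select?defType=dismax"
                + "&q=" + URLEncoder.encode(userQuery, StandardCharsets.UTF_8)
                + "&qf=" + URLEncoder.encode(qf, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", "http://example.com/widget"); // placeholder field names
        doc.put("title", "Blue widget & case");
        System.out.println(buildAddXml(doc));
        System.out.println(buildDismaxQueryUrl("blue widget", "title^2 content"));
    }
}
```

The XML would be POSTed (as text/xml) to the /update URL and followed by a <commit/> message before the document shows up in searches. DisMax only changes how the user's query is parsed at search time; it does not extract anything from the crawled pages.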

A good way of getting started is by going through the Solr tutorial:
http://lucene.apache.org/solr/tutorial.html . The Solr Wiki is also fairly
extensive: http://wiki.apache.org/solr/FrontPage . Finally, searching
Google for "solr getting started" turns up many likely-looking links.

Regards,
Gora

Re: Solr Newbie - need a point in the right direction

Erick Erickson
In reply to this post by Mark-2
Solr is downstream of what I think you want. There's nothing in Solr
that allows you to take an arbitrary page and extract specific info
from it. I suspect the Nutch folks have dealt with this kind of question;
looking over the Nutch user's list might give some insight.

Basically, once you have the page, you extract the information to
put into your structured Solr document. "Extracting the information"
is the hard part, and there's nothing built into Solr that I know of
that helps with that...
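
As a rough illustration of that "extracting the information" step, the Java sketch below pulls dollar prices out of plain page text with a regular expression and puts them into a structured document (a Map standing in for the Solr document). The price pattern and the url/content/price field names are hypothetical. Real non-uniform pages will need many more formats, or a proper content parser, and this code would run in the crawler or a Nutch plugin rather than in Solr.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PriceExtractorSketch {

    // Hypothetical pattern: "$" then either comma-grouped digits ("1,299") or plain
    // digits ("1500"), with optional two-digit cents. Other currencies are ignored.
    private static final Pattern PRICE =
            Pattern.compile("\\$\\s?(\\d{1,3}(?:,\\d{3})+|\\d+)(?:\\.(\\d{2}))?");

    // Returns every price found in the text, normalised to a string like "1299.99".
    static List<String> extractPrices(String text) {
        List<String> prices = new ArrayList<>();
        Matcher m = PRICE.matcher(text);
        while (m.find()) {
            String whole = m.group(1).replace(",", "");
            String cents = m.group(2) != null ? m.group(2) : "00";
            prices.add(whole + "." + cents);
        }
        return prices;
    }

    // Builds the structured document destined for Solr (hypothetical field names).
    static Map<String, Object> toStructuredDoc(String url, String text) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("url", url);
        doc.put("content", text);
        doc.put("price", extractPrices(text));
        return doc;
    }

    public static void main(String[] args) {
        String page = "Blue widget, now $1,299.99 (was $1500). Shipping $5.";
        System.out.println(toStructuredDoc("http://example.com/widget", page));
    }
}
```

The comma-grouped alternative requires at least one ",ddd" group so that "$1500" is read as 1500 rather than stopping after "150". Whatever Solr field receives the prices would be declared in schema.xml, likely as a numeric type if you want range queries.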

Best
Erick

On Mon, Dec 6, 2010 at 10:42 PM, Mark <[hidden email]> wrote:

> Hi,
>
> First time poster here - I'm not entirely sure where I need to look for this
> information.
>
> What I'm trying to do is extract some (presumably) structured information
> from non-uniform data (e.g., prices from a Nutch crawl) that needs to show in
> search queries, and I've come up against a wall.
>
> I've been unable to figure out where the best place to begin is.
>
> I had a look through the Solr wiki and did a search via Lucid's search tool,
> and I'm guessing this is handled at index time through my schema? But I've
> also seen DisMax being thrown around as a possible solution, and this has
> confused me.
>
> Basically, if you guys could point me in the right direction for resources
> (even as much as saying, you need X, it's over there) that would be a huge
> help.
>
> Cheers
>
> Mark
>

Re: Solr Newbie - need a point in the right direction

webdev1977
In reply to this post by Gora Mohanty-3
In my experience, the hardest (but most flexible) part is exactly what was mentioned: processing the data. Nutch does have a really easy plugin interface that you can use, and the example plugin is a great place to start. Once you have the raw parsed text, you can do whatever you want with it. For example, I wrote a plugin to add geospatial information to my NutchDocument.

You then map the fields you added in the NutchDocument to something you want Solr to index. In my case I created a geography field where I put the lat/lon info. Then you create that same geography field in the Nutch-to-Solr mapping file as well as in your Solr schema.xml file. Then, when you run the crawl and tell it to use "solrindex", it will send the document to Solr to be indexed. Since you have your new field in the schema, Solr knows what to do with it at index time. Now you can build a user interface around what you want to do with that field.
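
The flow described above (a plugin adds a field, the mapping file names it for Solr, Solr indexes it) can be simulated in plain Java. This is a sketch under stated assumptions, not the Nutch API: the Doc class is a hypothetical stand-in for NutchDocument, geoFilter plays the role of the indexing plugin, and toSolrFields plays the role of the Nutch-to-Solr mapping file. The "lat=... lon=..." text format is invented for the example, and in this sketch fields without a mapping keep their own name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GeoPipelineSketch {

    // Hypothetical stand-in for a NutchDocument: field name -> list of values.
    static class Doc {
        final Map<String, List<String>> fields = new LinkedHashMap<>();

        void add(String name, String value) {
            fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }
    }

    // Hypothetical coordinate format in the parsed text, e.g. "lat=51.5074 lon=-0.1278".
    private static final Pattern LAT_LON =
            Pattern.compile("lat=(-?\\d+(?:\\.\\d+)?)\\s+lon=(-?\\d+(?:\\.\\d+)?)");

    // Plugin role: reads the parsed text and adds a "geography" field as "lat,lon".
    static Doc geoFilter(Doc doc, String parsedText) {
        Matcher m = LAT_LON.matcher(parsedText);
        while (m.find()) {
            doc.add("geography", m.group(1) + "," + m.group(2));
        }
        return doc;
    }

    // Mapping-file role: renames source fields to Solr field names.
    // Fields missing from the mapping keep their own name (a choice made for this sketch).
    static Map<String, List<String>> toSolrFields(Doc doc, Map<String, String> sourceToDest) {
        Map<String, List<String>> solr = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : doc.fields.entrySet()) {
            String dest = sourceToDest.getOrDefault(e.getKey(), e.getKey());
            solr.computeIfAbsent(dest, k -> new ArrayList<>()).addAll(e.getValue());
        }
        return solr;
    }

    public static void main(String[] args) {
        Doc doc = new Doc();
        doc.add("url", "http://example.com/office");
        geoFilter(doc, "Head office: lat=51.5074 lon=-0.1278");
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("url", "id");
        System.out.println(toSolrFields(doc, mapping));
    }
}
```

The resulting "geography" values still need a matching field declared in schema.xml, as described above, before Solr will index them.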


Re: Solr Newbie - need a point in the right direction

Mark-2
Thanks to everyone who responded. No wonder I was getting confused: I was
completely focusing on the wrong half of the equation.

I had a cursory look through some of the Nutch documentation available and
it is looking promising.

Thanks everyone.

Mark

On Tue, Dec 7, 2010 at 10:19 PM, webdev1977 <[hidden email]> wrote:

>
> In my experience, the hardest (but most flexible) part is exactly what was
> mentioned: processing the data. Nutch does have a really easy plugin
> interface that you can use, and the example plugin is a great place to
> start. Once you have the raw parsed text, you can do whatever you want
> with it. For example, I wrote a plugin to add geospatial information to
> my NutchDocument. You then map the fields you added in the NutchDocument
> to something you want Solr to index. In my case I created a geography
> field where I put the lat/lon info. Then you create that same geography
> field in the Nutch-to-Solr mapping file as well as in your Solr schema.xml
> file. Then, when you run the crawl and tell it to use "solrindex", it will
> send the document to Solr to be indexed. Since you have your new field in
> the schema, Solr knows what to do with it at index time. Now you can build
> a user interface around what you want to do with that field.
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-Newbie-need-a-point-in-the-right-direction-tp2031381p2033687.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>