Solr Integration/Stemming?

Solr Integration/Stemming?

Nick Tkach
First of all, a question on stemming.  We've tried applying the patches from
the main wiki ( http://wiki.apache.org/nutch/Stemming ) and that seems to
work fine for the most part.  We are seeing one kind of strange result
though.  If we index a series of pages (web crawl of 2 of our sites) and
search for "stamp" in them, we get results for pages containing "stamped"
and "stamps" as you'd expect.  However if you search for "stamped" or
"stamps" directly, then you get no results.  Does that sound like we have a
configuration issue using the stemming patches, or do we need to extend the
patches?

Second, would we be better off just working on getting Solr & Nutch working
together and taking advantage of Solr's built-in stemming?

Third, has anyone had any luck with getting Solr working with Nutch?  We
tried applying the patches from NUTCH-442
<http://issues.apache.org/jira/browse/NUTCH-442>
but get failures from Hadoop when we try to run a job.
RE: Solr Integration/Stemming?

Howie Wang

It sounds like the query parser is not stemming for you. Make sure
that the new stemming query filter is activated in the Nutch
directory under your app server. Check the nutch-*.xml files
under WEB-INF/classes to make sure that your new query filter is
included.

Howie
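
To illustrate the symptom described above, here is a minimal sketch of what
happens when terms are stemmed at index time but the query is not. It is not
Nutch code: toy_stem is a hypothetical stand-in for a real Porter stemmer,
and the page ids and text are made up.

```python
def toy_stem(word: str) -> str:
    """Strip a few common suffixes. A toy stand-in, NOT the Porter algorithm."""
    word = word.lower()
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_index(pages: dict) -> dict:
    """Map each stemmed term to the set of page ids that contain it."""
    index = {}
    for page_id, text in pages.items():
        for token in text.split():
            index.setdefault(toy_stem(token), set()).add(page_id)
    return index


def search(index: dict, query: str, stem_query: bool) -> set:
    """Look up a single-term query, optionally stemming it first."""
    term = toy_stem(query) if stem_query else query.lower()
    return index.get(term, set())


# Hypothetical sample pages.
pages = {"a": "the letter was stamped", "b": "two stamps please"}
index = build_index(pages)

# "stamp" already equals the stored stem, so it matches without query stemming.
print(search(index, "stamp", stem_query=False))
# "stamped" is never stored in the index, so the unstemmed query finds nothing.
print(search(index, "stamped", stem_query=False))
# With query-time stemming, "stamped" becomes "stamp" and matches again.
print(search(index, "stamped", stem_query=True))
```

The index only ever contains "stamp", so a query has to go through the same
stemming step before lookup, which is the job of the stemming query filter.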


> Date: Mon, 11 Feb 2008 12:19:59 -0600
> From: [hidden email]
> To: [hidden email]
> Subject: Solr Integration/Stemming?

Re: Solr Integration/Stemming?

Nick Tkach
Ah, thank you very much!  Yes, that seems to have done the trick.  I'd
made the change when I was patching my copy of nutch-trunk, but hadn't
realized the changes to nutch-default.xml there didn't get transferred
when I did an 'ant tar' to build my "distro".

Specifically, I'd forgotten to make the change described on the wiki to
my nutch-default.xml: in the value for plugin.includes, replace
"query-(basic|site|url)" with "query-(stemmer|site|url)".
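
For anyone who wants to verify a deployed copy, here is a rough sketch of a
check for that setting. It assumes the usual configuration/property/name/value
layout of the nutch-*.xml files. The function names and the abbreviated SAMPLE
text are made up for illustration; a real plugin.includes value lists many
more plugins.

```python
import re
import xml.etree.ElementTree as ET


def plugin_includes(config_xml: str):
    """Return the plugin.includes value from a config document, or None."""
    root = ET.fromstring(config_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == "plugin.includes":
            return prop.findtext("value", "").strip()
    return None


def has_stemmer_query_filter(value) -> bool:
    """True if some query-(...) group in the value lists 'stemmer'."""
    if not value:
        return False
    for group in re.findall(r"query-\(([^)]*)\)", value):
        if "stemmer" in group.split("|"):
            return True
    return False


# Abbreviated, hypothetical sample of the unpatched setting.
SAMPLE = """
<configuration>
  <property>
    <name>plugin.includes</name>
    <value>query-(basic|site|url)</value>
  </property>
</configuration>
"""

value = plugin_includes(SAMPLE)
print(value, has_stemmer_query_filter(value))

# The replacement described above, applied to the sample value.
fixed = value.replace("query-(basic|site|url)", "query-(stemmer|site|url)")
print(fixed, has_stemmer_query_filter(fixed))
```

In practice you would read each nutch-*.xml under the webapp's
WEB-INF/classes directory and pass its text to plugin_includes; the exact
path depends on your app server.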

Re: Solr Integration/Stemming?

Sathyam Y
All,

I am trying to integrate PorterStemming with Nutch and was able to
successfully follow the changes suggested at
http://wiki.apache.org/nutch/Stemming?highlight=%28stemming%29

The search results work well with stemmed words, but I am having
difficulty getting correct summaries. I am using BasicSummarizer, and
it looks like the summarizer is trying to match non-stemmed query
words against stemmed tokens from the content. Any ideas on how to
resolve this issue? Has anyone had any experience working with
summaries along with stemming?

Thanks!
Sathyam


       
stemming / summary problem

Sathyam Y
   
I am trying to integrate PorterStemming with Nutch and was able to
successfully follow the changes suggested at
http://wiki.apache.org/nutch/Stemming?highlight=%28stemming%29

The search results work well with stemmed words, but I am having
difficulty getting correct summaries. I am using BasicSummarizer, and
it looks like the summarizer is trying to match non-stemmed query
words against stemmed tokens from the content. Any ideas on how to
resolve this issue? Has anyone had any experience working with
summaries along with stemming?
   
  Thanks !
  Sathyam
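
One possible direction, sketched under assumptions: if the mismatch really is
unstemmed query words being compared against stemmed content tokens, then
stemming both sides before comparing, while keeping the original text for
display, should line them up. This is not BasicSummarizer code. toy_stem is a
hypothetical stand-in for the Porter stemmer and highlight is a made-up
helper.

```python
def toy_stem(word: str) -> str:
    """Strip a few common suffixes. A toy stand-in, NOT the Porter algorithm."""
    word = word.lower()
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def highlight(text: str, query: str) -> str:
    """Wrap tokens whose stem matches a query stem in <b>...</b>."""
    # Stem the query terms once so both sides are compared in stemmed form.
    query_stems = {toy_stem(term) for term in query.split()}
    out = []
    for token in text.split():
        # Ignore surrounding punctuation when comparing, keep it in the output.
        core = token.strip(".,;:!?\"'")
        if core and toy_stem(core) in query_stems:
            out.append(token.replace(core, "<b>" + core + "</b>", 1))
        else:
            out.append(token)
    return " ".join(out)


print(highlight("The letter was stamped twice.", "stamps"))
# The letter was <b>stamped</b> twice.
```

The comparison happens on stems, but the summary is built from the original
tokens, so the reader still sees "stamped" rather than "stamp".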


       