Nutch integration with Solr

Timeka Cobb
Hello! I've installed Nutch 1.15 and Solr 7.4 very recently. I've looked at
the section on connecting the two but am having an extremely hard time
understanding it. Can someone help me connect them? I want to crawl entire
websites and add a search engine to my site. Thank ya kindly 😊💗

Blessings,
Timeka Cobb

Re: Nutch integration with Solr

Sebastian Nagel-2
Hi Timeka,

Well, the really short answer is: Nutch sends "documents" to Solr using
the SolrJ client library. A "document" is a single web page that has been
fetched, parsed and split into indexable fields, e.g., "title", "keywords", "content".
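To make the idea of a "document" concrete, here is a minimal shell sketch that prints one web page as a JSON update for Solr. It is for illustration only: Nutch builds and sends documents itself through SolrJ. The helpers json_escape and make_solr_doc are hypothetical, and the sketch assumes the Nutch schema defines the fields id, url, title and content, and that the page URL serves as the unique id.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only: Nutch builds and sends documents
# itself through SolrJ. This prints one web page as a JSON array holding a
# single Solr document, using field names assumed to exist in the Nutch schema.

# Escape backslashes and double quotes so a value can sit inside a JSON string.
# Assumption: values contain no newlines or other control characters.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Usage: make_solr_doc URL TITLE CONTENT
make_solr_doc() {
  url=$(json_escape "$1")
  title=$(json_escape "$2")
  content=$(json_escape "$3")
  # Assumption: the page URL doubles as the unique "id" field.
  printf '[{"id":"%s","url":"%s","title":"%s","content":"%s"}]\n' \
    "$url" "$url" "$title" "$content"
}

# Manual smoke test, assuming Solr on localhost:8983 with a core named "nutch":
#   make_solr_doc "https://example.com/" "Example" "Hello world" |
#     curl -H 'Content-Type: application/json' \
#       'http://localhost:8983/solr/nutch/update?commit=true' --data-binary @-
```

A hand-made document like this is only a quick way to check that the core accepts updates. In normal operation, the documents come from Nutch's indexing step, and Solr may reject this one if your schema differs from the assumed fields.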

For further information you may look into

  https://wiki.apache.org/nutch/NutchTutorial#Step-by-Step:_Indexing_into_Apache_Solr

  https://wiki.apache.org/nutch/IndexWriters

  https://wiki.apache.org/nutch/Presentations
  https://www.slideshare.net/search/slideshow?searchfrom=header&q=nutch

For the finer details, you may need to inspect the Nutch source code directly.

Best,
Sebastian

On 10/01/2018 03:38 AM, Timeka Cobb wrote:

Re: Nutch integration with Solr

Timeka Cobb
Thank you for the answer, but I still think I'm missing things. On the wiki,
where it says to install Solr, I don't understand the directions that lead up
to creating a nutch core. How do I copy the resources, manage the schema,
etc.? The breakdown confuses me. Thank you again 😊💜

Timeka

On Mon, Oct 1, 2018, 7:12 AM Sebastian Nagel wrote:

Re: Nutch integration with Solr

Sebastian Nagel-2
Hi Timeka,

Do you mean the steps given in
  https://wiki.apache.org/nutch/NutchTutorial#Setup_Solr_for_search ?

The "nutch" core is created from a single configset directory,
   ${APACHE_SOLR_HOME}/server/solr/configsets/nutch
which must contain the correct (Nutch) schema in its conf/ subfolder.

The commands to set up Solr and copy the schema are given in the tutorial
as Unix/Linux shell commands.
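The configset steps can be wrapped in a small shell function that fails early with a readable message instead of a cryptic cp error. This is a sketch under several assumptions. The starting point is the _default configset that ships with Solr 7.x. The path to the Nutch schema.xml is passed in, because its location may differ between Nutch releases. The function name setup_nutch_configset is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: create the "nutch" configset that the Solr core is built from.
#   $1 = Solr home, i.e. the full path of the unpacked Solr directory
#   $2 = path to the schema.xml shipped with Nutch (location may vary by release)
setup_nutch_configset() {
  solr_home=$1
  nutch_schema=$2
  configsets="$solr_home/server/solr/configsets"

  if [ ! -d "$configsets/_default" ]; then
    echo "no _default configset under $configsets" >&2
    return 1
  fi
  if [ ! -f "$nutch_schema" ]; then
    echo "schema not found: $nutch_schema" >&2
    return 1
  fi
  # Refuse to run twice: cp -r into an existing dir would nest _default inside it.
  if [ -e "$configsets/nutch" ]; then
    echo "$configsets/nutch already exists" >&2
    return 1
  fi

  # 1. Copy the default configset as the basis for the nutch configset.
  cp -r "$configsets/_default" "$configsets/nutch" || return 1
  # 2. Put Nutch's schema into the conf/ subfolder.
  cp "$nutch_schema" "$configsets/nutch/conf/schema.xml" || return 1
  # 3. Remove managed-schema so Solr builds its schema from schema.xml instead.
  rm -f "$configsets/nutch/conf/managed-schema"
}

# Afterwards, start Solr and create the core from the new configset:
#   "$APACHE_SOLR_HOME/bin/solr" start
#   "$APACHE_SOLR_HOME/bin/solr" create -c nutch -d "$APACHE_SOLR_HOME/server/solr/configsets/nutch/conf/"
```

Checking the inputs first is a deliberate design choice. A missing directory or a mistyped path then produces a readable message rather than a half-created configset.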

Could you tell us what is confusing? Agreed, the description could be
more detailed.

Thanks,
Sebastian

On 10/01/2018 03:29 PM, Timeka Cobb wrote:

Re: Nutch integration with Solr

Timeka Cobb
I'm actually using Ubuntu to configure it all, so that is not the issue.

For example, the "copy resources" command: I'm already in the Solr home folder,
so would the command be cp -r solr-7.4.0 /server/../../../ in the terminal?

Where I see {APACHE_SOLR_HOME} or {APACHE_NUTCH_HOME}, am I supposed to type
solr-7.4.0 or nutch-1.15 on the command line in their place?
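In the tutorial's commands, ${APACHE_SOLR_HOME} and ${NUTCH_RUNTIME_HOME} are shell variables, and the leading $ is part of the syntax. Each variable stands for the full path of an unpacked directory, not just its name. If a variable was never set, the shell silently replaces it with nothing, so ${APACHE_SOLR_HOME}/server/... becomes /server/.... A leading slash means the filesystem root, which is also why cp -r solr-7.4.0 /server/../../../ points at the wrong place.

Here is a minimal sketch. It assumes Solr was unpacked to $HOME/solr-7.4.0 and the Nutch binary release to $HOME/apache-nutch-1.15, so adjust both paths. The check_home helper is hypothetical.

```shell
#!/bin/sh
# Assumed install locations; change them to wherever you unpacked the archives.
# After this, "${APACHE_SOLR_HOME}/server/..." expands to the full path
# "$HOME/solr-7.4.0/server/...", so commands work from any directory.
export APACHE_SOLR_HOME="$HOME/solr-7.4.0"
export NUTCH_RUNTIME_HOME="$HOME/apache-nutch-1.15"

# Hypothetical helper: succeed only if DIR is non-empty and contains the
# executable LAUNCHER (e.g. bin/solr or bin/nutch).
# Usage: check_home NAME DIR LAUNCHER
check_home() {
  name=$1 dir=$2 launcher=$3
  if [ -z "$dir" ]; then
    echo "$name is not set, so \${$name}/... would expand to /..." >&2
    return 1
  fi
  if [ ! -x "$dir/$launcher" ]; then
    echo "$name=$dir does not contain $launcher" >&2
    return 1
  fi
}

# Verify both locations before copying anything.
check_home APACHE_SOLR_HOME "$APACHE_SOLR_HOME" bin/solr &&
  check_home NUTCH_RUNTIME_HOME "$NUTCH_RUNTIME_HOME" bin/nutch &&
  echo "both install directories look fine"
```

The exports only last for the current terminal session. Run them again, or add them to your shell profile, before pasting the tutorial commands in a new terminal.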

When I copy and paste what I see, I get an error. I'm just trying to figure
out the proper commands to enter in the terminal to get both connected and
the core running properly. Thanks again for all your help, it's greatly
appreciated 😊

Timeka

On Mon, Oct 1, 2018, 10:35 AM Sebastian Nagel wrote:

Re: Nutch integration with Solr

Timeka Cobb
In reply to this post by Sebastian Nagel-2
Also, I totally agree, Sir Sebastian: it could be much more detailed so that
newbies like me can understand it better.

On Mon, Oct 1, 2018, 10:35 AM Sebastian Nagel wrote:
