
How to avoid repeatedly uploading job jars

How to avoid repeatedly uploading job jars

391772322
The archived Nutch job jar is about 400 MB in size; every step uploads this archive and distributes it to every worker node. Is there a way to upload only the Nutch jar itself, but leave the dependent libs on every worker node?

Re: How to avoid repeatedly uploading job jars

Sebastian Nagel
Hi,

maybe the Hadoop Distributed Cache is what you are looking for?
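For illustration, one way to apply the distributed cache here is to stage the dependency jars on HDFS once and reference them at job submission, so each job only ships a slim jar with the Nutch classes. A minimal sketch, assuming hypothetical paths and jar names (`/apps/nutch/lib`, `nutch-core.jar`); whether `-libjars` accepts `hdfs://` URIs directly depends on the Hadoop version:

```shell
# One-time setup: stage the dependency jars on HDFS so jobs can pull them
# from the distributed cache instead of re-uploading a 400 MB job jar.
hadoop fs -mkdir -p /apps/nutch/lib
hadoop fs -put runtime/deploy/lib/*.jar /apps/nutch/lib/

# Per job: submit a slim jar containing only the Nutch classes, and pass
# the cached dependency jars via the generic -libjars option
# (a comma-separated list, parsed by GenericOptionsParser).
hadoop jar nutch-core.jar org.apache.nutch.crawl.InjectorJob \
  -libjars hdfs:///apps/nutch/lib/gora-core.jar,hdfs:///apps/nutch/lib/hbase-client.jar \
  urls/
```

This only works for job classes that run through `ToolRunner`, which handles the generic options; the Nutch job entry points do.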

Best,
Sebastian

On 03/02/2017 01:35 AM, 391772322 wrote:
> The archived Nutch job jar is about 400 MB in size; every step uploads this archive and distributes it to every worker node. Is there a way to upload only the Nutch jar itself, but leave the dependent libs on every worker node?
>


Re: How to avoid repeatedly uploading job jars

katta surendra babu
Hi Sebastian,


I am looking to crawl the data of a JSON-based website using Nutch 2.3.1, HBase 0.98, and Solr 5.6.

Here is the problem: for the first round I get the JSON data into HBase, but for the second round I am not getting the metadata and the HTML links in Nutch.

So, please help me out if you can ... to crawl the JSON website completely.



On Thu, Mar 2, 2017 at 3:21 PM, Sebastian Nagel <[hidden email]>
wrote:

> Hi,
>
> maybe the Hadoop Distributed Cache is what you are looking for?
>
> Best,
> Sebastian
>
> On 03/02/2017 01:35 AM, 391772322 wrote:
> > The archived Nutch job jar is about 400 MB in size; every step uploads
> this archive and distributes it to every worker node. Is there a way to upload
> only the Nutch jar, but leave the dependent libs on every worker node?
> >
>
>


--
Thanks & Regards
Surendra Babu Katta
8886747555

Re: How to avoid repeatedly uploading job jars

Sebastian Nagel
Hi,

please start a new thread for a new topic or question.
That will help others to find the right answer for their problem
when searching the mailing list archive.

Thanks,
Sebastian

On 03/02/2017 11:01 AM, katta surendra babu wrote:

> Hi Sebastian,
>
>
> I am looking to crawl the data of a JSON-based website using Nutch 2.3.1,
> HBase 0.98, and Solr 5.6.
>
> Here is the problem: for the first round I get the JSON data into HBase,
> but for the second round I am not getting the metadata and the HTML links
> in Nutch.
>
> So, please help me out if you can ... to crawl the JSON website completely.
>
>
>
> On Thu, Mar 2, 2017 at 3:21 PM, Sebastian Nagel <[hidden email]>
> wrote:
>
>> Hi,
>>
>> maybe the Hadoop Distributed Cache is what you are looking for?
>>
>> Best,
>> Sebastian
>>
>> On 03/02/2017 01:35 AM, 391772322 wrote:
>>> The archived Nutch job jar is about 400 MB in size; every step uploads
>> this archive and distributes it to every worker node. Is there a way to upload
>> only the Nutch jar, but leave the dependent libs on every worker node?
>>>
>>
>>
>
>


Re: How to avoid repeatedly uploading job jars

391772322
Sebastian:


I'm sorry, it's the first time I have used a mailing list; would you be so kind as to tell me how to start a new thread?

Below is all I know about how a mailing list works:

send a mail to "[hidden email]".


------------------ Original Message ------------------
From: "Sebastian Nagel" <[hidden email]>;
Sent: Friday, March 3, 2017, 1:33 AM
To: "user" <[hidden email]>;

Subject: Re: How to avoid repeatedly uploading job jars



Hi,

please start a new thread for a new topic or question.
That will help others to find the right answer for their problem
when searching the mailing list archive.

Thanks,
Sebastian

On 03/02/2017 11:01 AM, katta surendra babu wrote:

> Hi Sebastian,
>
>
> I am looking to crawl the data of a JSON-based website using Nutch 2.3.1,
> HBase 0.98, and Solr 5.6.
>
> Here is the problem: for the first round I get the JSON data into HBase,
> but for the second round I am not getting the metadata and the HTML links
> in Nutch.
>
> So, please help me out if you can ... to crawl the JSON website completely.
>
>
>
> On Thu, Mar 2, 2017 at 3:21 PM, Sebastian Nagel <[hidden email]>
> wrote:
>
>> Hi,
>>
>> maybe the Hadoop Distributed Cache is what you are looking for?
>>
>> Best,
>> Sebastian
>>
>> On 03/02/2017 01:35 AM, 391772322 wrote:
>>> The archived Nutch job jar is about 400 MB in size; every step uploads
>> this archive and distributes it to every worker node. Is there a way to upload
>> only the Nutch jar, but leave the dependent libs on every worker node?
>>>
>>
>>
>
>

Re: Re: How to avoid repeatedly uploading job jars

Sebastian Nagel
Hi,

you have to subscribe to the list by sending a mail to
   [hidden email]
for further information, see
   http://nutch.apache.org/mailing_lists.html

Best,
Sebastian

On 03/03/2017 09:03 AM, 391772322 wrote:

> Sebastian:
>
>
> I'm sorry, it's the first time I have used a mailing list; would you be so kind as to tell me how to start a new thread?
>
>
> Below is all I know about how a mailing list works:
>
>
> send a mail to "[hidden email]".
>
>
> ------------------ Original Message ------------------
> From: "Sebastian Nagel" <[hidden email]>;
> Sent: Friday, March 3, 2017, 1:33 AM
> To: "user" <[hidden email]>;
>
> Subject: Re: How to avoid repeatedly uploading job jars
>
>
>
> Hi,
>
> please start a new thread for a new topic or question.
> That will help others to find the right answer for their problem
> when searching the mailing list archive.
>
> Thanks,
> Sebastian
>
> On 03/02/2017 11:01 AM, katta surendra babu wrote:
>> Hi Sebastian,
>>
>>
>> I am looking to crawl the data of a JSON-based website using Nutch 2.3.1,
>> HBase 0.98, and Solr 5.6.
>>
>> Here is the problem: for the first round I get the JSON data into HBase,
>> but for the second round I am not getting the metadata and the HTML links
>> in Nutch.
>>
>> So, please help me out if you can ... to crawl the JSON website completely.
>>
>>
>>
>> On Thu, Mar 2, 2017 at 3:21 PM, Sebastian Nagel <[hidden email]>
>> wrote:
>>
>>> Hi,
>>>
>>> maybe the Hadoop Distributed Cache is what you are looking for?
>>>
>>> Best,
>>> Sebastian
>>>
>>> On 03/02/2017 01:35 AM, 391772322 wrote:
>>>> The archived Nutch job jar is about 400 MB in size; every step uploads
>>> this archive and distributes it to every worker node. Is there a way to upload
>>> only the Nutch jar, but leave the dependent libs on every worker node?
>>>>
>>>
>>>
>>
>>
>
