Input and Output Value Class Types


Input and Output Value Class Types

Dennis Kubes
All,

Is there a way to get around having to have the input value class and
output value class be the same?  I have an object writable that I am
trying to unwrap.

Dennis

Re: Input and Output Value Class Types

Stefan Groschupf-2
Hi,
Maybe have a look at the Nutch indexer; it uses a kind of wrapper, which
may help you.
Also, please browse the Hadoop developer list archive, since there was
some related discussion.
HTH
Stefan
On 29.06.2006, at 14:41, Dennis Kubes wrote:

> All,
>
> Is there a way to get around having to have the input value class  
> and output value class be the same?  I have an object writable that  
> I am trying to unwrap.
>
> Dennis
>


Re: Input and Output Value Class Types

Dennis Kubes
The indexer uses an ObjectWritable and I am using that trick.  The problem
is that I need to input an ObjectWritable but output a different object.  I
will take a look at the Hadoop list.

Dennis


Re: Input and Output Value Class Types

Stefan Groschupf-2
In the worst case (I do this sometimes), you have to split your task into
several different jobs.
Ugly, but it works.
In general the problem is known; however, if you raise it again on the
Hadoop developer list, it may get some more priority.
Stefan

On 29.06.2006, at 21:09, Dennis Kubes wrote:

> The indexer uses an ObjectWritable and I am using that trick.
> The problem is that I need to input an ObjectWritable but output a
> different object.  I will take a look at the Hadoop list.
>
> Dennis


Re: Input and Output Value Class Types

Andrzej Białecki-2
In reply to this post by Dennis Kubes
Dennis Kubes wrote:
> The indexer uses an ObjectWritable and I am using that trick.  The problem
> is that I need to input an ObjectWritable but output a different object.
> I will take a look at the Hadoop list.

You can view ObjectWritable as an opaque container for any, well, Object
;). This means that you can produce Objects of whatever class (so long
as they implement Writable), stuff them into ObjectWritables, and then
write your own OutputFormat where you unpack them.
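
A minimal sketch of the container idea described above, written against
plain java.io so it runs without Hadoop on the classpath. Everything in it
is a stand-in invented for illustration: SimpleWritable plays the role of
Writable, Envelope plays the role of ObjectWritable, and unpack() marks the
spot where a custom OutputFormat would unwrap the value. It is not the
actual Hadoop or Nutch code, and the real class signatures may differ.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for the Writable contract (not the Hadoop interface).
interface SimpleWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Two unrelated value types that a job might want to emit.
class TextValue implements SimpleWritable {
    String text = "";
    TextValue() {}
    TextValue(String text) { this.text = text; }
    public void write(DataOutput out) throws IOException { out.writeUTF(text); }
    public void readFields(DataInput in) throws IOException { text = in.readUTF(); }
}

class CountValue implements SimpleWritable {
    long count;
    CountValue() {}
    CountValue(long count) { this.count = count; }
    public void write(DataOutput out) throws IOException { out.writeLong(count); }
    public void readFields(DataInput in) throws IOException { count = in.readLong(); }
}

// Hypothetical opaque container: writes the payload's class name, then the
// payload itself, so the declared value class can stay the same everywhere.
class Envelope implements SimpleWritable {
    private SimpleWritable payload;
    Envelope() {}
    Envelope(SimpleWritable payload) { this.payload = payload; }
    SimpleWritable get() { return payload; }

    public void write(DataOutput out) throws IOException {
        out.writeUTF(payload.getClass().getName());
        payload.write(out);
    }

    public void readFields(DataInput in) throws IOException {
        String className = in.readUTF();
        try {
            // Recreate the concrete payload type through its no-arg constructor.
            payload = (SimpleWritable) Class.forName(className)
                    .getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IOException("cannot instantiate " + className, e);
        }
        payload.readFields(in);
    }
}

class EnvelopeDemo {
    static byte[] serialize(SimpleWritable w) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            w.write(new DataOutputStream(buf));
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Envelope deserialize(byte[] bytes) {
        try {
            Envelope envelope = new Envelope();
            envelope.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
            return envelope;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The "output side": unwrap the container and handle each concrete type.
    static String unpack(Envelope envelope) {
        SimpleWritable value = envelope.get();
        if (value instanceof TextValue) return "text:" + ((TextValue) value).text;
        if (value instanceof CountValue) return "count:" + ((CountValue) value).count;
        return "unknown:" + value.getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println(unpack(deserialize(serialize(new Envelope(new TextValue("hello"))))));
        System.out.println(unpack(deserialize(serialize(new Envelope(new CountValue(42))))));
    }
}
```

The declared value class is always the container, so the input and output
value classes match on paper, while the concrete payload type can differ
from record to record and is only resolved where the output is written.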

Check e.g. SegmentMerger to see how to do this - this is an extreme
case, because it not only produces different class types on output, but
also produces many output files.

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



robots.txt

david.wojciechowski
In reply to this post by Stefan Groschupf-2
Hi,

I use Nutch 0.7.1 to crawl a few intranet servers.
Yesterday I tried to exclude some directories with robots.txt,
but nothing changed.
I copied this robots.txt to the server:

User-agent: NutchCVS
Disallow: /cgi-bin/
Disallow: /manuals/

The User-agent "NutchCVS" and the robots agent name in nutch-default
are the same.

Can anyone help me with this problem?

I'm crawling with this command:

bin/nutch crawl urls -dir crawl060621 -depth 15 &> crawl060621.log &

Regards,
David

==========================================================

David Wojciechowski
Universitätsklinikum Freiburg
Klinikrechenzentrum
Agnesenstrasse 6-8
D-79106 Freiburg

Telefon :  0761 / 270 - 1842
Fax: 0761 / 270 - 2276
E-Mail   :  [hidden email]

==========================================================