Java 1.5?

Java 1.5?

Michael Bieniosek-2
I notice there don't seem to be any Java generics in Hadoop, even in places
where they would help (eg. constraining the parameter to
JobConf.setMapperClass).

Is there a policy against using Java 1.5 in Hadoop?

-Michael

Re: Java 1.5?

David Bowen-2
Michael Bieniosek wrote:
> I notice there don't seem to be any Java generics in Hadoop, even in places
> where they would help (eg. constraining the parameter to
> JobConf.setMapperClass).
>
> Is there a policy against using Java 1.5 in Hadoop?
>
> -Michael
>
>  
There used to be such a policy, but it is now OK to use generics.  Over
time it would be nice to upgrade old code to use them, making the code
more concise as well as more robust due to better type checking.


Re: Java 1.5?

tomwhite
I think we can do a lot to improve the use of generics, particularly
in MapReduce.

For example, we could change Mapper to be:

public interface Mapper<K1 extends WritableComparable, V1 extends Writable,
    K2 extends WritableComparable, V2 extends Writable>
    extends JobConfigurable, Closeable {
  void map(K1 key, V1 value,
    OutputCollector<K2, V2> output, Reporter reporter)
    throws IOException;
}

and OutputCollector would be

public interface OutputCollector<K extends WritableComparable,
    V extends Writable> {
  void collect(K key, V value) throws IOException;
}
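
As a rough illustration of what user code might look like against interfaces shaped like the two above, here is a minimal sketch. The nested Writable, WritableComparable, Text and IntValue types are simplified stand-ins invented for this example, not Hadoop's real classes, and JobConfigurable, Closeable and Reporter are left out for brevity.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class GenericMapperSketch {
  // Simplified stand-ins for Hadoop's types (assumptions for illustration only).
  interface Writable {}
  interface WritableComparable extends Writable, Comparable {}

  static final class Text implements WritableComparable {
    final String value;
    Text(String value) { this.value = value; }
    public int compareTo(Object o) { return value.compareTo(((Text) o).value); }
  }

  static final class IntValue implements Writable {
    final int value;
    IntValue(int value) { this.value = value; }
  }

  interface OutputCollector<K extends WritableComparable, V extends Writable> {
    void collect(K key, V value) throws IOException;
  }

  interface Mapper<K1 extends WritableComparable, V1 extends Writable,
      K2 extends WritableComparable, V2 extends Writable> {
    void map(K1 key, V1 value, OutputCollector<K2, V2> output) throws IOException;
  }

  // The type parameters state what goes in and out, and the compiler checks it.
  static final class TokenCountMapper implements Mapper<Text, Text, Text, IntValue> {
    public void map(Text key, Text value, OutputCollector<Text, IntValue> output)
        throws IOException {
      for (String token : value.value.split("\\s+")) {
        if (!token.isEmpty()) {
          output.collect(new Text(token), new IntValue(1));
        }
      }
    }
  }

  // Runs the mapper over one line and returns the emitted keys in order.
  static List<String> emittedKeys(String line) {
    final List<String> keys = new ArrayList<String>();
    OutputCollector<Text, IntValue> collector = new OutputCollector<Text, IntValue>() {
      public void collect(Text key, IntValue value) { keys.add(key.value); }
    };
    try {
      new TokenCountMapper().map(new Text("line-0"), new Text(line), collector);
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
    return keys;
  }

  public static void main(String[] args) {
    System.out.println(emittedKeys("to be or not to be"));
  }
}
```

With this shape, passing a collector of the wrong key or value type to map() is a compile error rather than a runtime ClassCastException.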

Reducer would be changed similarly, although I'm not sure how we could
constrain the output types of the Mapper to be the input types of the
Reducer. Perhaps via the JobConf?

With these changes we could remove the need to specify key/value types
in JobConf (e.g. setOutputKeyClass) by using reflection on the Mapper
and Reducer. See http://joe.truemesh.com/blog//000495.html for how to
do this.
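
One possible way to recover the type arguments by reflection is sketched below. This is not necessarily the approach in the linked post. The Mapper interface here is a stripped-down hypothetical, and the helper only looks at interfaces the class implements directly, so a mapper that inherits Mapper through a superclass would need extra handling.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

public class MapperTypeReflection {
  // Hypothetical, simplified Mapper; only the type parameters matter here.
  interface Mapper<K1, V1, K2, V2> {}

  static final class WordLengthMapper implements Mapper<Long, String, String, Integer> {}

  // Returns the actual type argument at the given index of the Mapper interface
  // that mapperClass implements directly, or null if it cannot be determined
  // (raw implementation, or a type variable rather than a concrete class).
  static Class<?> mapperTypeArgument(Class<?> mapperClass, int index) {
    for (Type t : mapperClass.getGenericInterfaces()) {
      if (t instanceof ParameterizedType) {
        ParameterizedType pt = (ParameterizedType) t;
        if (pt.getRawType() == Mapper.class) {
          Type arg = pt.getActualTypeArguments()[index];
          return (arg instanceof Class) ? (Class<?>) arg : null;
        }
      }
    }
    return null;
  }

  public static void main(String[] args) {
    // Index 2 is K2 (output key), index 3 is V2 (output value).
    System.out.println(mapperTypeArgument(WordLengthMapper.class, 2));
    System.out.println(mapperTypeArgument(WordLengthMapper.class, 3));
  }
}
```

A null result would be the point at which the framework falls back to an explicit setting in the JobConf.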

I wonder if we could improve WritableComparable too? With generics,
Comparable takes a type parameter, so

public interface WritableComparable<T> extends Writable, Comparable<T> {
}

and e.g.

public class IntWritable implements WritableComparable<IntWritable> {
  ...
  public int compareTo(IntWritable o) {
    ...
  }
}
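
For anyone who wants to try this, a self-contained version might look like the sketch below. The Writable stand-in omits the serialization methods, and the field and get() accessor are assumptions made for the example rather than the real IntWritable.

```java
import java.util.Arrays;

public class WritableComparableSketch {
  // Stand-in only; the real Writable declares serialization methods.
  interface Writable {}

  interface WritableComparable<T> extends Writable, Comparable<T> {}

  static final class IntWritable implements WritableComparable<IntWritable> {
    private final int value;
    IntWritable(int value) { this.value = value; }
    int get() { return value; }

    // No cast needed: the parameter is already an IntWritable.
    public int compareTo(IntWritable o) {
      return value < o.value ? -1 : (value == o.value ? 0 : 1);
    }
  }

  // Wraps the ints, sorts them via compareTo, and unwraps the result.
  static int[] sortedValues(int... values) {
    IntWritable[] wrapped = new IntWritable[values.length];
    for (int i = 0; i < values.length; i++) {
      wrapped[i] = new IntWritable(values[i]);
    }
    Arrays.sort(wrapped);
    int[] out = new int[wrapped.length];
    for (int i = 0; i < wrapped.length; i++) {
      out[i] = wrapped[i].get();
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(sortedValues(3, 1, 2)));
  }
}
```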

with this change we would change Mapper and Reducer like so

public interface Mapper<K1 extends WritableComparable<K1>, V1 extends Writable,
    K2 extends WritableComparable<K2>, V2 extends Writable>

etc.

This needs more work - I've only tried to see if these things compile
- I haven't tried to run them. This paper has more on the types in
MapReduce: http://www.cs.vu.nl/~ralf/MapReduce/, and may have further
useful ideas.

Is anyone interested in exploring this further?

On 28/03/07, David Bowen <[hidden email]> wrote:

> Michael Bieniosek wrote:
> > I notice there don't seem to be any Java generics in Hadoop, even in places
> > where they would help (eg. constraining the parameter to
> > JobConf.setMapperClass).
> >
> > Is there a policy against using Java 1.5 in Hadoop?
> >
> > -Michael
> >
> >
> There used to be such a policy, but it is now OK to use generics.  Over
> time it would be nice to upgrade old code to use them, making the code
> more concise as well as more robust due to better type checking.
>
>
>

Re: Java 1.5?

Doug Cutting
Tom White wrote:

> I think we can do a lot to improve the use of generics, particularly
> in MapReduce.
>
> For example, we could change Mapper to be:
>
> public interface Mapper<K1 extends WritableComparable, V1 extends Writable,
>    K2 extends WritableComparable, V2 extends Writable> extends
> JobConfigurable, Closeable {
>  void map(K1 key, V1 value,
>    OutputCollector<K2, V2> output, Reporter reporter)
>    throws IOException;
> }

This looks promising.  Please note, however, that I've suggested changing
Mapper in another, mostly orthogonal way: map()'s parameters should all
be replaced with a single MapContext parameter.  This will greatly
facilitate evolution of the API.  Applications will not generally
implement this interface; only the MapReduce kernel will.  So new
methods can be added to the MapContext interface, and old methods
deprecated, much more easily and back-compatibly than parameters can be
added to or removed from map(), and without changing user code.  Similarly,
we should replace the reduce method's parameters with a ReduceContext
interface.

The context would have getKey(), getValue(), collect(), progress(), etc.
methods.  The genericization would be similar.  With luck, these could be
the last incompatible changes to these core APIs.
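
A rough sketch of how a generic context might look follows. The method names come from the paragraph above, but the exact signatures, the ListContext test double and the UpperCaseMapper are guesses made up for illustration, not a settled API.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class MapContextSketch {
  // Hypothetical context interface; signatures are assumptions.
  interface MapContext<K1, V1, K2, V2> {
    K1 getKey();
    V1 getValue();
    void collect(K2 key, V2 value) throws IOException;
    void progress();
  }

  // map() takes only the context, so new capabilities can be added to
  // MapContext later without touching this signature.
  interface Mapper<K1, V1, K2, V2> {
    void map(MapContext<K1, V1, K2, V2> context) throws IOException;
  }

  static final class UpperCaseMapper implements Mapper<Long, String, Long, String> {
    public void map(MapContext<Long, String, Long, String> context) throws IOException {
      context.collect(context.getKey(), context.getValue().toUpperCase(Locale.ROOT));
      context.progress();
    }
  }

  // Toy in-memory context standing in for what the framework would supply.
  static final class ListContext implements MapContext<Long, String, Long, String> {
    private final Long key;
    private final String value;
    final List<String> collected = new ArrayList<String>();
    int progressCalls = 0;

    ListContext(Long key, String value) {
      this.key = key;
      this.value = value;
    }

    public Long getKey() { return key; }
    public String getValue() { return value; }
    public void collect(Long k, String v) { collected.add(k + "=" + v); }
    public void progress() { progressCalls++; }
  }

  // Feeds a single record through the mapper and returns what it collected.
  static List<String> run(long key, String value) {
    ListContext ctx = new ListContext(key, value);
    try {
      new UpperCaseMapper().map(ctx);
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
    return ctx.collected;
  }

  public static void main(String[] args) {
    System.out.println(run(7L, "hadoop"));
  }
}
```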

Do you think it makes sense to genericize prior to conversion to contexts?

Doug

Re: Java 1.5?

Owen O'Malley-5
In reply to this post by tomwhite

On Apr 8, 2007, at 1:48 AM, Tom White wrote:

> I think we can do a lot to improve the use of generics, particularly
> in MapReduce.
> <... use generics in interfaces ...>

I like it. I was thrown off at first because classes aren't  
specialized based on their template parameters, but specialization of  
the parent class is available.

> Reducer would be changed similarly, although I'm not sure how we could
> constrain the output types of the Mapper to be the input types of the
> Reducer. Perhaps via the JobConf?

That is easy, actually. In the JobClient, we'd just check to see if  
the types all play well together. Basically, you need:
K1, V1 -> map -> K2, V2
K2, V2 -> combiner -> K2, V2 (if used)
K2, V2 -> reduce -> K3, V3

It will be a tricky bit of specification to decide exactly what the  
right semantics are, since even with the generics, the application  
isn't required to define them. Therefore, we have 5 places where we  
could find a value for K2 (config, mapper output, combiner input,  
combiner output, or reduce input). Clearly all classes must be  
checked for consistency once Hadoop decides what the right values are  
for each type.
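
To make the check concrete, here is a minimal sketch of what it could look like. The Mapper and Reducer interfaces are stripped-down hypotheticals, the combiner is assumed to implement Reducer, and the comparison uses exact type equality, which ignores subtyping and the config-supplied values mentioned above.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

public class JobTypeCheckSketch {
  // Hypothetical, simplified interfaces; only the type parameters matter here.
  interface Mapper<K1, V1, K2, V2> {}
  interface Reducer<K2, V2, K3, V3> {}

  static final class TokenMapper implements Mapper<Long, String, String, Integer> {}
  static final class SumReducer implements Reducer<String, Integer, String, Integer> {}
  static final class BadReducer implements Reducer<Long, Integer, String, Integer> {}

  // Type arguments of iface as directly implemented by cls, or null if unknown.
  static Type[] typeArgs(Class<?> cls, Class<?> iface) {
    for (Type t : cls.getGenericInterfaces()) {
      if (t instanceof ParameterizedType
          && ((ParameterizedType) t).getRawType() == iface) {
        return ((ParameterizedType) t).getActualTypeArguments();
      }
    }
    return null;
  }

  // True when the mapper's output (K2, V2) matches the reducer's input and,
  // if a combiner is given, both its input and its output.
  static boolean typesAgree(Class<?> mapper, Class<?> reducer, Class<?> combinerOrNull) {
    Type[] m = typeArgs(mapper, Mapper.class);
    Type[] r = typeArgs(reducer, Reducer.class);
    if (m == null || r == null) {
      return false; // a real check might fall back to the config here
    }
    if (!m[2].equals(r[0]) || !m[3].equals(r[1])) {
      return false;
    }
    if (combinerOrNull != null) {
      Type[] c = typeArgs(combinerOrNull, Reducer.class);
      if (c == null) {
        return false;
      }
      return c[0].equals(m[2]) && c[1].equals(m[3])
          && c[2].equals(m[2]) && c[3].equals(m[3]);
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(typesAgree(TokenMapper.class, SumReducer.class, null));
    System.out.println(typesAgree(TokenMapper.class, BadReducer.class, null));
  }
}
```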

The other piece that this interacts with is the desire to use context  
objects in the parameter list. However, they appear to be orthogonal  
to each other.

-- Owen

Re: Java 1.5?

Owen O'Malley-5
I went ahead and created the placeholder for this discussion as:
http://issues.apache.org/jira/browse/HADOOP-1231

Re: Java 1.5?

tomwhite
In reply to this post by Doug Cutting
The change to use contexts looks good.

> Do you think it makes sense to genericize prior to conversion to contexts?

Since the two changes are orthogonal I don't think it matters which
order they are done in. From the user's point of view, we should make
both changes in the same release so users only have to rewrite their
code once.