Custom Writables in MapWritable

Kyle Renfro
Hadoop 0.22.0-RC0

I have the following reducer:
    public static class MergeRecords extends Reducer<Text, MapWritable, Text, MapWritable>

The MapWritables handled by the reducer all have Text keys and
contain values of several classes, including Text, DoubleWritable,
and a custom Writable, MapArrayWritable. The reduce works as expected
if each MapWritable contains both a DoubleWritable and a
MapArrayWritable. It fails with the following exception if some of
the MapWritables contain only a DoubleWritable value:

-----------
java.lang.IllegalArgumentException: Id 1 exists but maps to com.realcomp.data.hadoop.record.MapArrayWritable and not org.apache.hadoop.io.DoubleWritable
    at org.apache.hadoop.io.AbstractMapWritable.addToMap(AbstractMapWritable.java:75)
    at org.apache.hadoop.io.AbstractMapWritable.readFields(AbstractMapWritable.java:203)
    at org.apache.hadoop.io.MapWritable.readFields(MapWritable.java:148)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:73)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:44)
    at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKeyValue(ReduceContextImpl.java:145)
    at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKey(ReduceContextImpl.java:121)
    at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.nextKey(WrappedReducer.java:292)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:168)
    at org.apache.hadoop.mapred.ReduceTask.
------------
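The failure can be reproduced without Hadoop. The sketch below is a simplified model, not Hadoop's code. It assumes that each serialized MapWritable carries a header of (id, class name) pairs for its non-predefined value classes, numbered 1, 2, ... in the order they were first put, and that the reader registers those pairs in a per-instance table that is never cleared when the instance is reused. Class names are plain strings, and `ClassIdTable`, `header`, and `readAllReusing` are hypothetical names.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified, Hadoop-free model of a per-instance class-ID table.
// Hypothetical names throughout; this is NOT AbstractMapWritable's code.
public class ReusedTableDemo {

    /** Per-instance id -> class-name table, as kept by a (reused) map instance. */
    static class ClassIdTable {
        private final Map<Byte, String> idToClass = new HashMap<>();

        /** Registers an (id, class) pair read from the stream; rejects a conflicting id. */
        void addToMap(String className, byte id) {
            String existing = idToClass.get(id);
            if (existing != null && !existing.equals(className)) {
                throw new IllegalArgumentException("Id " + id + " exists but maps to "
                        + existing + " and not " + className);
            }
            idToClass.put(id, className);
        }

        /** Models reading one record's header; earlier entries are kept, never cleared. */
        void readHeader(Map<Byte, String> header) {
            header.forEach((id, c) -> addToMap(c, id));
        }
    }

    /** Writer side (assumed): a fresh map numbers unseen value classes 1, 2, ... in put order. */
    public static Map<Byte, String> header(String... valueClasses) {
        Map<Byte, String> h = new LinkedHashMap<>();
        byte next = 1;
        for (String c : valueClasses) {
            if (!h.containsValue(c)) {
                h.put(next++, c);
            }
        }
        return h;
    }

    /** Reads every header into ONE reused table; returns null on success, else the error message. */
    public static String readAllReusing(List<Map<Byte, String>> records) {
        ClassIdTable reused = new ClassIdTable();
        try {
            for (Map<Byte, String> r : records) {
                reused.readHeader(r);
            }
            return null;
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        String maw = "com.realcomp.data.hadoop.record.MapArrayWritable";
        String daw = "org.apache.hadoop.io.DoubleWritable";
        // Both records hold both classes in the same order: ids agree, prints "null".
        System.out.println(readAllReusing(List.of(header(maw, daw), header(maw, daw))));
        // Second record holds only a DoubleWritable, which its writer numbers 1: conflict.
        System.out.println(readAllReusing(List.of(header(maw, daw), header(daw))));
    }
}
```

When every record holds both classes, put in the same order, the IDs always agree and reuse is harmless, which matches the observation that the reduce works when each MapWritable contains both values. The second call returns the same "Id 1 exists but maps to ... and not ..." message as the stack trace.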

Digging into the source a little, I found that the default
constructor of AbstractMapWritable does not pre-register
DoubleWritable, as it does for the other basic Writables. This looks
like an omission to me. If DoubleWritable were registered, I would
probably never have noticed this problem, because there would be only
one custom class in the MapWritable.
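The effect of pre-registration can be shown with another Hadoop-free model (not Hadoop's code): a hypothetical `ClassIdTable` whose constructor registers MapArrayWritable and DoubleWritable in a fixed order, so every instance, writer or reader, assigns them the same IDs regardless of which values a record holds. In Hadoop terms, the analogous change would presumably be a MapWritable subclass whose constructor calls AbstractMapWritable's protected `addToMap(Class)` for each value class, used for both the map output and the reduce input so both sides agree; that mapping is an assumption to check against the 0.22 source.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hadoop-free model (class names as strings). ClassIdTable is a hypothetical
// stand-in for a map whose constructor pre-registers its value classes.
public class PreRegisteredDemo {

    public static final String MAW = "com.realcomp.data.hadoop.record.MapArrayWritable";
    public static final String DAW = "org.apache.hadoop.io.DoubleWritable";

    static class ClassIdTable {
        private final Map<Byte, String> idToClass = new HashMap<>();
        private final Map<String, Byte> classToId = new LinkedHashMap<>();
        private byte newClasses = 0;

        /** Registers the given classes first, so they always receive ids 1, 2, ... in this order. */
        ClassIdTable(String... preRegistered) {
            for (String c : preRegistered) {
                add(c);
            }
        }

        /** Writer side: gives an unseen class the next free id; known classes keep theirs. */
        void add(String className) {
            if (!classToId.containsKey(className)) {
                addToMap(className, ++newClasses);
            }
        }

        /** Reader side: records an (id, class) pair from the stream, rejecting a conflicting id. */
        void addToMap(String className, byte id) {
            String existing = idToClass.get(id);
            if (existing != null && !existing.equals(className)) {
                throw new IllegalArgumentException("Id " + id + " exists but maps to "
                        + existing + " and not " + className);
            }
            idToClass.put(id, className);
            classToId.put(className, id);
        }

        /** The (id, class) header this instance would write. */
        Map<Byte, String> header() {
            Map<Byte, String> h = new LinkedHashMap<>();
            classToId.forEach((c, id) -> h.put(id, c));
            return h;
        }
    }

    /** Writes each record with a fresh table and reads every header into ONE reused table. */
    public static boolean readAllReusing(boolean preRegister, List<List<String>> records) {
        String[] fixed = preRegister ? new String[] {MAW, DAW} : new String[0];
        ClassIdTable reader = new ClassIdTable(fixed);
        try {
            for (List<String> valueClasses : records) {
                ClassIdTable writer = new ClassIdTable(fixed);
                valueClasses.forEach(writer::add);
                writer.header().forEach((id, c) -> reader.addToMap(c, id));
            }
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        List<List<String>> records = List.of(List.of(MAW, DAW), List.of(DAW));
        System.out.println("dynamic ids:    " + readAllReusing(false, records)); // false
        System.out.println("pre-registered: " + readAllReusing(true, records));  // true
    }
}
```

With dynamic IDs the DoubleWritable-only record collides with the reused table; with a fixed registration order every record carries identical IDs, so reuse no longer matters.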

Question 1:
Should I be able to reduce on MapWritables that contain different
(custom) value classes?

Question 2:
It appears that org.apache.hadoop.io.serializer.WritableSerialization
reuses the first MapWritable instance for every subsequent
deserialization. This is probably a performance optimization, and it
explains the exception: the class IDs registered while reading an
earlier record are still in the reused instance when a record with a
different set of value classes arrives. Is it possible for me to
register my own serialization class that would allow me to
deserialize MapWritables with different value classes? Are there any
examples of this available?
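On Question 2: Hadoop chooses serializations through the `io.serializations` configuration property, and a custom one implements `org.apache.hadoop.io.serializer.Serialization` (`accept`, `getSerializer`, `getDeserializer`). Rather than guess at 0.22-specific wiring, the sketch below stays Hadoop-free: a hypothetical `Deserializer` interface contrasting a deserializer that reuses the instance it is handed with one that always starts from an empty object. It assumes the caller keeps whatever object the deserializer returns, which is worth confirming in ReduceContextImpl for 0.22.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hadoop-free model of two deserialization strategies. "Deserializer" here is a
// hypothetical stand-in, NOT org.apache.hadoop.io.serializer.Deserializer.
public class FreshInstanceDemo {

    /** Per-instance id -> class table (simplified model of the reused map's state). */
    static class ClassIdTable {
        final Map<Byte, String> idToClass = new HashMap<>();

        void readHeader(Map<Byte, String> header) {
            header.forEach((id, c) -> {
                String existing = idToClass.get(id);
                if (existing != null && !existing.equals(c)) {
                    throw new IllegalArgumentException("Id " + id + " exists but maps to "
                            + existing + " and not " + c);
                }
                idToClass.put(id, c);
            });
        }
    }

    /** Hypothetical deserializer: receives the previous instance as a reuse hint. */
    interface Deserializer {
        ClassIdTable deserialize(ClassIdTable reuse, Map<Byte, String> header);
    }

    /** Reuses the offered instance when there is one (the behaviour observed above). */
    public static final Deserializer REUSING = (reuse, header) -> {
        ClassIdTable t = (reuse != null) ? reuse : new ClassIdTable();
        t.readHeader(header);
        return t;
    };

    /** Ignores the offered instance and always starts from an empty table. */
    public static final Deserializer FRESH = (reuse, header) -> {
        ClassIdTable t = new ClassIdTable();
        t.readHeader(header);
        return t;
    };

    /** Feeds records through d, passing the previous result back as the reuse hint. */
    public static boolean readAll(Deserializer d, List<Map<Byte, String>> records) {
        ClassIdTable previous = null;
        try {
            for (Map<Byte, String> r : records) {
                previous = d.deserialize(previous, r);
            }
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String maw = "com.realcomp.data.hadoop.record.MapArrayWritable";
        String daw = "org.apache.hadoop.io.DoubleWritable";
        // Record 1: MapArrayWritable=1, DoubleWritable=2. Record 2: only DoubleWritable=1.
        List<Map<Byte, String>> records = List.of(
                Map.of((byte) 1, maw, (byte) 2, daw),
                Map.of((byte) 1, daw));
        System.out.println("reusing: " + readAll(REUSING, records)); // false
        System.out.println("fresh:   " + readAll(FRESH, records));   // true
    }
}
```

The trade-off is one new object per value, which is exactly the allocation the reuse optimization exists to avoid.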


Note: I realize I am running a release candidate, but I thought I
would ask here first before going through the trouble of upgrading
the cluster.

thanks,
Kyle