how to sort the output by value in reduce instead of by key?


how to sort the output by value in reduce instead of by key?

leibnitz
My key is an IP address, and my value is an object (a subclass of the Hadoop Record class that will later be converted
to visualized data), e.g.:
key                   field1,field2,field3 (these are properties of the object)
12.121.23.121 121,11,/img/dd.jpg
32.121.23.222 221,11,/img/xx.jpg

1. I want to sort by field1, but the reduce output is sorted by key by default. How can I do this?
2. By the way, when my value object extends Record, why does the output contain an empty line after each record, like this:
data1
<empty line>
data2
<empty line>
...

Re: how to sort the output by value in reduce instead of by key?

leibnitz
Can anyone give me some tips?

Re: how to sort the output by value in reduce instead of by key?

sumit ghosh
In reply to this post by leibnitz
Your field1 data can be split over multiple reducers. Is it possible to emit
field1 as the key from the reducer (in case you do not need the ip anymore)?
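
To make the suggestion concrete, here is a minimal sketch in plain Java (no Hadoop dependency) of re-keying the sample records by field1. The class name RekeyByField1, the tab-separated input layout, and the choice to move the ip into the value are assumptions made for illustration; the TreeMap only stands in for the key sort that the framework applies between map and reduce.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Hadoop: shows the effect of emitting field1 as the key.
final class RekeyByField1 {

    /**
     * Re-keys "ip<TAB>field1,field2,field3" records by the numeric field1 and
     * returns them in ascending field1 order. The TreeMap plays the role of
     * the framework's sort on the emitted key.
     */
    public static List<String> sortByField1(List<String> records) {
        TreeMap<Integer, List<String>> byField1 = new TreeMap<>();
        for (String record : records) {
            String[] keyAndValue = record.split("\t", 2);
            String ip = keyAndValue[0];
            String[] fields = keyAndValue[1].split(",", 3);
            int field1 = Integer.parseInt(fields[0].trim());
            // New key: field1. The ip moves into the value so it is not lost.
            byField1.computeIfAbsent(field1, k -> new ArrayList<>())
                    .add(ip + "," + fields[1] + "," + fields[2]);
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<Integer, List<String>> e : byField1.entrySet()) {
            for (String value : e.getValue()) {
                out.add(e.getKey() + "\t" + value);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "32.121.23.222\t221,11,/img/xx.jpg",
                "12.121.23.121\t121,11,/img/dd.jpg");
        sortByField1(input).forEach(System.out::println);
    }
}
```

In a real job the ordering produced this way holds only within each reducer's output. For one globally sorted result you would likely need a single reducer or a range partitioner such as Hadoop's TotalOrderPartitioner.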




________________________________
From: leibnitz <[hidden email]>
To: [hidden email]
Sent: Mon, 11 April, 2011 12:02:46 PM
Subject: how to sort the output by value in reduce instead of by key?


Re: how to sort the output by value in reduce instead of by key?

Josh Patterson
In reply to this post by leibnitz
Leibnitz,
I think you are looking for a "secondary sort" here, where the data
arrives at the reducer in a defined order rather than just grouped
by key. Is that the case?

For a look at secondary sort I've got a few blog articles:

http://www.cloudera.com/blog/2011/03/simple-moving-average-secondary-sort-and-mapreduce-part-1/
http://www.cloudera.com/blog/2011/03/simple-moving-average-secondary-sort-and-mapreduce-part-2/
http://www.cloudera.com/blog/2011/04/simple-moving-average-secondary-sort-and-mapreduce-part-3/

and part 3 includes source code on github.com:

https://github.com/jpatanooga/Caduceus
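
As a rough illustration of the secondary sort idea (a plain-Java sketch, not code from the articles or the repository above): the key becomes a composite of ip and field1, sorting uses the whole composite, and grouping uses ip alone. The names SecondarySortSketch, IpField1, and shuffle, and the sample rows in main, are made up for this example. In an actual Hadoop job these roles are typically filled by a custom key class, a partitioner on the natural key, and the comparators set through Job.setSortComparatorClass and Job.setGroupingComparatorClass.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation of the shuffle; not Hadoop API code.
final class SecondarySortSketch {

    /** Composite key: the natural key (ip) plus the value field to order by. */
    static final class IpField1 {
        final String ip;
        final int field1;
        final String rest; // field2,field3 carried along as the value

        IpField1(String ip, int field1, String rest) {
            this.ip = ip;
            this.field1 = field1;
            this.rest = rest;
        }
    }

    /** Sort order: by ip first (as a string), then by field1 within the same ip. */
    static final Comparator<IpField1> SORT_ORDER =
            Comparator.comparing((IpField1 k) -> k.ip).thenComparingInt(k -> k.field1);

    /**
     * Sorts on the full composite key, then groups on ip alone, so each
     * group's values come out already ordered by field1.
     */
    public static Map<String, List<String>> shuffle(List<IpField1> mapOutput) {
        List<IpField1> sorted = new ArrayList<>(mapOutput);
        sorted.sort(SORT_ORDER);
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (IpField1 k : sorted) {
            // Grouping looks only at ip, the natural part of the key.
            groups.computeIfAbsent(k.ip, ip -> new ArrayList<>())
                  .add(k.field1 + "," + k.rest);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Made-up rows: two records share an ip so the in-group order is visible.
        List<IpField1> mapOutput = Arrays.asList(
                new IpField1("12.121.23.121", 221, "11,/img/b.jpg"),
                new IpField1("32.121.23.222", 50, "11,/img/xx.jpg"),
                new IpField1("12.121.23.121", 121, "11,/img/dd.jpg"));
        shuffle(mapOutput).forEach((ip, values) -> System.out.println(ip + " -> " + values));
    }
}
```

The point of the split is that the sort step, not the grouping step, decides the order in which values reach the reducer. Changing only what counts as a group leaves the order untouched.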

Hope that helps,

Josh



On Mon, Apr 11, 2011 at 5:26 AM, leibnitz <[hidden email]> wrote:
> can anyone get me a tips ?



--
Twitter: @jpatanooga
Solution Architect @ Cloudera
hadoop: http://www.cloudera.com
blog: http://jpatterson.floe.tv

Re: how to sort the output by value in reduce instead of by key?

leibnitz
Thanks, all.
To Josh: I think you are right. I previously tried using field1+ip as the group key at the reduce stage, but it failed (the output was not sorted).
I will look into your suggestion :)