Wednesday, August 22, 2012

ArrayWritable as Reduce input

1, As the input value for a Reducer

ArrayWritable is a concrete class, but you have to create a subclass that sets its proper element type if you want to use it in Map/Reduce tasks.
(http://archive.cloudera.com/cdh4/cdh/4/hadoop-2.0.0-cdh4.0.1/api/index.html)
With the code snippet below, you can use it as a Mapper's output value (a Reducer's input value):

public static class TextArrayWritable extends ArrayWritable {
    public TextArrayWritable() {
        super(Text.class);
    }
}

2, As the input Key for a Reducer

Well, to be a key for a Reducer, a class must be comparable, because the output of a Mapper is sorted by key before it becomes the input of the following Reducer.
There are 2 ways to make the ArrayWritable subclass comparable.

1, use setOutputKeyComparatorClass in JobConf (old-style API).
2, implement the WritableComparable interface in the existing TextArrayWritable, as in the code below.
I am not good at Java code, my apologies :(

import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;

public class TextArrayWritable extends ArrayWritable implements WritableComparable<TextArrayWritable> {
    public TextArrayWritable() {
        super(Text.class);
    }

    public TextArrayWritable(Text[] values) {
        super(Text.class, values);
    }

    // Shorter arrays sort first; arrays of equal length are compared element by element.
    @Override
    public int compareTo(TextArrayWritable o) {
        Writable[] self = this.get();
        Writable[] other = o.get();

        // Treat an unset array as empty.
        if (self == null) self = new Text[]{};
        if (other == null) other = new Text[]{};

        if (self.length != other.length) {
            return (self.length < other.length) ? -1 : 1;
        }
        for (int i = 0; i < self.length; i++) {
            int r = ((Text) self[i]).compareTo((Text) other[i]);
            if (r != 0) return r;
        }
        return 0;
    }
}
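
To see what ordering the compareTo above produces, here is a minimal sketch in plain Java. It uses String[] in place of TextArrayWritable so that it runs without Hadoop on the classpath. The class name ArrayOrderDemo and its compare method are made up for illustration, and String.compareTo is only assumed to be a close enough stand-in for Text.compareTo (the two agree on ASCII input, but Text compares UTF-8 bytes).

```java
import java.util.Arrays;

public class ArrayOrderDemo {
    // Hypothetical stand-in for TextArrayWritable.compareTo:
    // shorter arrays sort first; arrays of equal length are
    // compared element by element.
    public static int compare(String[] a, String[] b) {
        if (a.length != b.length) {
            return a.length < b.length ? -1 : 1;
        }
        for (int i = 0; i < a.length; i++) {
            int r = a[i].compareTo(b[i]);
            if (r != 0) {
                return r;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        String[][] keys = {
            {"a", "z"},
            {"b"},
            {"a", "b"},
        };
        // Sort the keys with the same rule the shuffle would apply.
        Arrays.sort(keys, ArrayOrderDemo::compare);
        for (String[] key : keys) {
            System.out.println(Arrays.toString(key));
        }
    }
}
```

Running main prints [b], then [a, b], then [a, z]. Note that the one-element key comes first even though "b" is greater than "a", because the length is compared before any element.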
