Tuesday, February 18, 2014

Pass parameter to Hadoop Mapper

The typical Hadoop word count example counts every word in a file. Let's assume that the input file has the following contents:

Louai
Wael
Ahmed
Wael

If we run a typical word count example (these can be found on many websites):

/bin/hadoop jar wordcount /input.txt /output

The word count output will be:
Louai 1
Ahmed 1
Wael 2

What if we want to count only specific words in the input document? For instance, suppose we want to count only the keyword wael. To do this, we need to pass an argument to the Hadoop mapper using -D on the command line, as in the following command:

/bin/hadoop jar wordcount /input.txt /output -D parameter wael
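
To make the argument positions concrete, here is a small plain-Java sketch of what the driver would receive from a command of this shape. It assumes that the jar's manifest names the main class and that everything after the jar name reaches the driver unchanged. The class ArgsDemo and the method keywordFrom are hypothetical names used only for illustration; they are not part of Hadoop or of the linked repository.

```java
// Hypothetical stand-in for the driver's argument handling; not part of Hadoop.
class ArgsDemo {
    // Returns the keyword, assuming the layout:
    //   <input> <output> -D parameter <keyword>
    static String keywordFrom(String[] args) {
        if (args.length < 5) {
            throw new IllegalArgumentException(
                "usage: <input> <output> -D parameter <keyword>");
        }
        return args[4]; // fifth argument, index 4
    }

    public static void main(String[] ignored) {
        // The arguments as they would arrive from the command above (assumed).
        String[] args = {"/input.txt", "/output", "-D", "parameter", "wael"};
        for (int i = 0; i < args.length; i++) {
            System.out.println("args[" + i + "] = " + args[i]);
        }
        System.out.println("keyword = " + keywordFrom(args));
    }
}
```

In this layout the keyword is simply the fifth argument. As an aside, when a driver is launched through ToolRunner, Hadoop's generic options also accept the form -D parameter=wael placed before the input path, in which case the value goes into the Configuration without any index arithmetic.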

* First, we need to add this code before the Job is defined:

        // Store the keyword in the job configuration so every mapper can read it.
        // With the command above, args[4] is the keyword ("wael").
        Configuration conf = getConf();
        conf.set("parameter", args[4]);
        Job job = new Job(conf, "louai word count");

* Second, we need to read the passed parameter in the mapper:

String parameter = context.getConfiguration().get("parameter");
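
Once the mapper has the parameter, it can emit only the matching words. The following is a minimal plain-Java sketch of that filtering step, not the code from the repository: a Map stands in for Hadoop's Configuration so the example runs without a cluster, and the names FilteredCountDemo and countKeyword are hypothetical. Case-insensitive matching is also an assumption, made here so that the keyword wael matches Wael in the sample file.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java stand-in for the mapper logic. The Map plays the role of
// Hadoop's Configuration; none of these names come from the linked repository.
class FilteredCountDemo {
    // Counts only the words equal to the configured keyword.
    // Case-insensitive matching is an assumption, not the post's stated behavior.
    static int countKeyword(Map<String, String> conf, List<String> lines) {
        String parameter = conf.get("parameter"); // mirrors getConfiguration().get(...)
        int count = 0;
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (parameter != null && word.equalsIgnoreCase(parameter)) {
                    count++; // a real mapper would emit (word, 1) here instead
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("parameter", "wael"); // what the driver does with conf.set(...)
        List<String> lines = Arrays.asList("Louai", "Wael", "Ahmed", "Wael");
        System.out.println(conf.get("parameter") + " " + countKeyword(conf, lines));
        // prints "wael 2"
    }
}
```

Words that do not match the keyword are skipped in the map step, so they never reach the reducer and do not appear in the output.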

The full code is on GitHub: https://github.umn.edu/alar0021/blogRepository/blob/master/wordCount/WordCount.java

