The typical word count example in Hadoop counts every word in a file. Let's assume that we have the following file contents:
Louai
Wael
Ahmed
Wael
If we run a typical word count example (one can be found on many websites):
/bin/hadoop jar wordcount /input.txt /output
The output of the word count will be:
Ahmed 1
Louai 1
Wael 2
What if we would like to count only specific words in the input documents? For instance, suppose we want to count only the keyword wael. To accomplish this, we need to pass an argument to the Hadoop mapper on the command line after -D, as in the following command:
/bin/hadoop jar wordcount /input.txt /output -D parameter wael
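To make the argument positions concrete, here is a minimal plain-Java sketch with no Hadoop dependency. It assumes the driver's args array starts at /input.txt, which would be the case if the jar's manifest names the main class. Under that assumption, the keyword is the fifth argument, args[4]. The class name ArgPosition and the helper parameterFrom are hypothetical and exist only for illustration.

```java
public class ArgPosition {
    // Returns the value that the driver would store under the "parameter" key.
    // Assumes the layout: <input> <output> -D parameter <word>
    static String parameterFrom(String[] args) {
        if (args.length < 5) {
            throw new IllegalArgumentException(
                "expected: <input> <output> -D parameter <word>");
        }
        return args[4];
    }

    public static void main(String[] args) {
        // The arguments from the command above, as the driver would see them.
        String[] example = {"/input.txt", "/output", "-D", "parameter", "wael"};
        System.out.println(parameterFrom(example)); // prints wael
    }
}
```

Note that the value is read here purely by position. Hadoop's own generic -D option is usually written as -D key=value and placed before the application arguments, which is a different mechanism from the one used in this post.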
* First, we need to add this code before the definition of the Job:
// Set the configuration for the job
Configuration conf = getConf();
// args[4] is the fifth command-line argument ("wael" in the command above)
conf.set("parameter", args[4]);
Job job = new Job(conf, "louai word count");
* Second, we need to get the passed parameter in the mapper:
String parameter = context.getConfiguration().get("parameter");
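Once the mapper has the parameter, it can emit only the words that match it. The following is a minimal plain-Java sketch of that filtering idea, written without Hadoop classes so it runs on its own. It is not the code from the linked repository. The class FilteredWordCount and its methods map and count are hypothetical stand-ins for the map step and the reduce step. The case-insensitive comparison is an assumption, made because the command passes wael while the file contains Wael.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FilteredWordCount {
    // Stand-in for the map step: emit a word only when it matches the parameter.
    static List<String> map(String line, String parameter) {
        List<String> emitted = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            // Assumption: compare ignoring case, so "wael" matches "Wael".
            if (token.equalsIgnoreCase(parameter)) {
                emitted.add(token);
            }
        }
        return emitted;
    }

    // Stand-in for the reduce step: sum one count per emitted word.
    static Map<String, Integer> count(List<String> lines, String parameter) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : map(line, parameter)) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("Louai", "Wael", "Ahmed", "Wael");
        System.out.println(count(input, "wael")); // prints {Wael=2}
    }
}
```

In a real mapper, the parameter value would come from context.getConfiguration().get("parameter") as shown above, and the matching words would be written to the context rather than collected in a list.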
Well, let's see the code on GitHub: https://github.umn.edu/alar0021/blogRepository/blob/master/wordCount/WordCount.java