Tuesday, February 18, 2014

Pass parameter to Hadoop Mapper

The typical Hadoop word count example counts every word in a file. Let's assume that we have the following file contents:

Louai
Wael
Ahmed
Wael

If we run the standard word count example (it can be found on many websites):

/bin/hadoop jar wordcount /input.txt /output

The output of the word count will be (keys come out sorted):
Ahmed 1
Louai 1
Wael 2
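As a rough illustration of what the job computes, here is a plain-Java simulation (Java 9 or later, no Hadoop classes; WordCountSketch and its methods are hypothetical names). It is not the Hadoop job itself, only the same "map to (word, 1), then sum per word" logic, and the TreeMap reproduces the sorted key order that a single reducer writes out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountSketch {
    // "Map" step: emit a (word, 1) pair for every whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                pairs.add(Map.entry(token, 1));
            }
        }
        return pairs;
    }

    // "Reduce" step: sum the 1s per word. TreeMap keeps the words sorted,
    // mirroring the sorted keys a single Hadoop reducer receives.
    static TreeMap<String, Integer> count(List<String> lines) {
        TreeMap<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> input = List.of("Louai", "Wael", "Ahmed", "Wael");
        count(input).forEach((word, n) -> System.out.println(word + " " + n));
    }
}
```

Running main prints Ahmed 1, Louai 1 and Wael 2, one per line, matching the output above.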

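What if we would like to count only specific words in the input documents? For instance, we might want to count only the keyword Wael. To do this, we pass a parameter to the Hadoop mapper with the -D generic option, as in the following command line. Generic options such as -D must come before the input and output paths; otherwise they are treated as ordinary arguments:

/bin/hadoop jar wordcount -D parameter=Wael /input.txt /output

* First, get the configuration before the definition of the Job. The call to getConf() assumes that the driver extends Configured, implements Tool and is launched with ToolRunner.run. In that case ToolRunner has already parsed -D parameter=Wael into this Configuration, so the value does not have to be copied from args:

//Set the configuration for the job (it already contains "parameter")
Configuration conf = getConf();
Job job = new Job(conf, "louai word count");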

* Second, read the passed parameter in the mapper (for example inside map() or setup()):

String parameter = context.getConfiguration().get("parameter");
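The two steps above can be simulated in plain Java (Java 9 or later, no Hadoop dependency). This is a simplified, hypothetical model, not Hadoop's own code: a Map<String, String> stands in for the job Configuration, parseGenericOptions is an illustrative stand-in for the -D handling that ToolRunner performs, and all class and method names are made up for this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FilteredWordCountSketch {
    // Step 1 (driver side), simplified and hypothetical: leading "-D key=value"
    // pairs are stored in conf. Parsing stops at the first other argument, and
    // everything from there on (the input and output paths) is returned.
    static List<String> parseGenericOptions(String[] args, Map<String, String> conf) {
        int i = 0;
        while (i + 1 < args.length && args[i].equals("-D")) {
            String[] kv = args[i + 1].split("=", 2);
            if (kv.length == 2) {
                conf.put(kv[0], kv[1]);
            }
            i += 2;
        }
        return new ArrayList<>(List.of(args).subList(i, args.length));
    }

    // Step 2 (mapper side): read "parameter" from the configuration and count
    // only the matching words. The comparison is case-sensitive (String.equals);
    // if no parameter is set, nothing is counted.
    static TreeMap<String, Integer> count(List<String> lines, Map<String, String> conf) {
        String parameter = conf.get("parameter");
        TreeMap<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String token : line.trim().split("\\s+")) {
                if (!token.isEmpty() && token.equals(parameter)) {
                    counts.merge(token, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        List<String> paths = parseGenericOptions(
                new String[] {"-D", "parameter=Wael", "/input.txt", "/output"}, conf);
        System.out.println(paths); // [/input.txt, /output]
        System.out.println(count(List.of("Louai", "Wael", "Ahmed", "Wael"), conf)); // {Wael=2}
    }
}
```

Calling parseGenericOptions with -D placed after the paths (/input.txt /output -D parameter wael) leaves conf empty, which illustrates why the option has to come first. Because the comparison is case-sensitive, a parameter of wael would not match Wael; equalsIgnoreCase is a simple alternative if case should not matter.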

The full code is available on GitHub: https://github.umn.edu/alar0021/blogRepository/blob/master/wordCount/WordCount.java


Wednesday, February 12, 2014

Set up JAVA_HOME on Ubuntu

There are several ways to set up JAVA_HOME:

1) Temporarily, for the current session only: open a terminal and enter the following two commands:

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export PATH=$PATH:$JAVA_HOME/bin

2) Permanently, by editing either ~/.bashrc (current user only) or /etc/profile (all users; this file belongs to root, hence sudo). Open a terminal and enter one of these commands:

nano ~/.bashrc

sudo nano /etc/profile

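Then paste the following two lines at the beginning of the opened file and save it. New terminals pick up the change (for /etc/profile, after you log in again); to apply it to the current terminal, run source ~/.bashrc or source /etc/profile:

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export PATH=$PATH:$JAVA_HOME/bin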
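To check that the variable is actually visible to Java programs, a small check like the following can help. It is a sketch; JavaHomeCheck and hasJavaBinary are hypothetical names. It prints the JAVA_HOME environment variable next to the java.home system property of the JVM that is running it, and checks whether $JAVA_HOME/bin/java exists and is executable.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class JavaHomeCheck {
    // True when <javaHome>/bin/java exists and is executable.
    static boolean hasJavaBinary(String javaHome) {
        if (javaHome == null || javaHome.isEmpty()) {
            return false;
        }
        Path java = Paths.get(javaHome, "bin", "java");
        return Files.isRegularFile(java) && Files.isExecutable(java);
    }

    public static void main(String[] args) {
        // Value exported by the shell (null if JAVA_HOME is not set).
        String javaHome = System.getenv("JAVA_HOME");
        System.out.println("JAVA_HOME = " + javaHome);
        // Installation directory of the JVM actually running this program.
        System.out.println("java.home = " + System.getProperty("java.home"));
        System.out.println(hasJavaBinary(javaHome) ? "bin/java found" : "bin/java NOT found");
    }
}
```

Run it from a new terminal after editing the file; if JAVA_HOME prints null, the export lines have not been loaded into that shell.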

SSH without password

SSH Login without password:

Let's say that your machine is A and you want to log in to machine B without a password. Here are the steps:

1- On machine A, run the following command:

ssh-keygen -t rsa
 
2- Press Enter at every prompt to accept the defaults; this stores the key pair in ~/.ssh/id_rsa and ~/.ssh/id_rsa.pub and leaves it without a passphrase.
 
3- Display the public key and copy its content:
 
less /home/A/.ssh/id_rsa.pub 
 
4- Log in to machine B with ssh, using your password:
 
ssh user@B....
 
5- On machine B, go to the .ssh directory in your home directory (if it does not exist yet, create it first with mkdir -p ~/.ssh):
 
cd ~/.ssh
 
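6- Open (or create) the authorized_keys file on machine B. Note the plural name: by default sshd reads ~/.ssh/authorized_keys.
 
nano authorized_keys
 
7- Paste the content of id_rsa.pub into authorized_keys as a single line, save the file, and restrict its permissions with chmod 600 ~/.ssh/authorized_keys. After that, ssh from machine A to machine B no longer asks for a password.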
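The last steps on machine B can also be scripted. The following is a sketch (Java 11 or later, POSIX file system; AuthorizedKeysSketch and addKey are hypothetical names): it appends a public key to ~/.ssh/authorized_keys only if that exact line is not already there, and sets the directory and file permissions to 700 and 600. Those are the usual safe settings; with sshd's default StrictModes, a .ssh directory or key file writable by other users is rejected.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.nio.file.attribute.PosixFilePermissions;

public class AuthorizedKeysSketch {
    // Appends publicKey to <sshDir>/authorized_keys unless that exact line is
    // already present, then sets directory rwx------ and file rw-------.
    // Returns true if the key was added.
    static boolean addKey(Path sshDir, String publicKey) throws IOException {
        String key = publicKey.trim();
        Files.createDirectories(sshDir);
        Files.setPosixFilePermissions(sshDir, PosixFilePermissions.fromString("rwx------"));
        Path file = sshDir.resolve("authorized_keys");
        String prefix = "";
        if (Files.exists(file)) {
            String content = Files.readString(file, StandardCharsets.UTF_8);
            if (content.lines().anyMatch(line -> line.trim().equals(key))) {
                return false; // key already authorized, nothing to do
            }
            // Make sure the new key starts on its own line.
            if (!content.isEmpty() && !content.endsWith("\n")) {
                prefix = "\n";
            }
        }
        Files.writeString(file, prefix + key + "\n", StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        Files.setPosixFilePermissions(file, PosixFilePermissions.fromString("rw-------"));
        return true;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java AuthorizedKeysSketch \"<content of id_rsa.pub>\"");
            return;
        }
        Path sshDir = Path.of(System.getProperty("user.home"), ".ssh");
        System.out.println(addKey(sshDir, args[0]) ? "key added" : "key already present");
    }
}
```

Checking for the key before appending makes the script safe to run twice, which a manual paste into nano is not.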