This tutorial covers how to compile and run a MapReduce Version 2 (MRv2) job written in Java using Hadoop. All the code and some of the commands used in this tutorial are derived from the “Map-Reduce Part 2 — Developing First MapReduce Job” section of the Hadoop Tutorial: Developing Big-Data Applications with Apache Hadoop by coreservlets.com.
Note: I’m running Hadoop 2.3.0 in the following Cloudera QuickStart VM: a VMware virtual machine of the Cloudera Distribution Including Apache Hadoop (CDH) version 5.0, last updated on 02 Apr 2014. I’m running this virtual machine in VMware Player 6.0.2 build 1744117, released on 2014-04-17. My host machine is the first version of the Microsoft Surface Pro tablet running Windows 8.1 Pro (64-bit).
Prerequisites:
- A properly configured installation of Hadoop 2.3.0 (including all dependencies) … good luck! ;-)

or

- A 64-bit operating system, such as Windows 8.1 Pro (64-bit), that can run VMware Player 6.0.2.
- (free) VMware Player 6.0.2 for 64-bit operating systems.
- (free) A Cloudera QuickStart VM of CDH 5.0.
Outline:
- (Optional) Format the Hadoop Distributed File System (HDFS).
- Create the folder structure for this MapReduce project on the local file system.
- Create the folder structure for this MapReduce project on the HDFS.
- Download the data to the local file system.
- Copy the data from the local file system to the HDFS.
- Write the following *.java files for this MapReduce project on the local file system:
- Compile *.java files to *.class files on the local file system.
- Archive the *.class files to a *.jar on the local file system.
- Run this MapReduce project using the *.jar on the local file system and the data on the HDFS.
- Copy the results of this MapReduce project from the HDFS to the local file system.
Note: In order for you … and me (when I come back to this procedure) … to figure out where we are in the local file system, each step in the procedure starts with a call to cd.
Procedure:
- (Optional) Format the Hadoop Distributed File System (HDFS).
This step doesn’t need to be done if we’re using the CDH 5.0 Cloudera QuickStart VM; however, I’ve included it in case we’re using our own Hadoop installation for the first time.
Launch the Terminal and execute the following:

```bash
cd ~
hdfs namenode -format
hdfs dfs -mkdir /user
hdfs dfs -mkdir /user/our-username
```

where our-username is the username that we are currently logged in under. If we’re using the CDH 5.0 Cloudera QuickStart VM, our username is cloudera and we would execute the following:

```bash
cd ~
hdfs namenode -format
hdfs dfs -mkdir /user
hdfs dfs -mkdir /user/cloudera
```

If any of these folders already exist, we’ll receive the following type of error:

```
mkdir: `/user': File exists
mkdir: `/user/cloudera': File exists
```
- Create the folder structure for this MapReduce project on the local file system.
We’ll need somewhere to put our input data files, our Java MapReduce code, and the eventual results (i.e. our output data files), so let’s create a folder named StartsWithCount for this project under the current user’s home directory:

```bash
cd ~
mkdir ~/StartsWithCount
```
- Create the folder structure for this MapReduce project on the HDFS.
Similar to the step above, we’ll need a place to put our input data and the eventual results (i.e. our output data files) in the HDFS, so let’s create a folder named StartsWithCount and nest a folder inside called input … the output folder will be created by a later step:

```bash
cd ~/StartsWithCount
hdfs dfs -mkdir StartsWithCount
hdfs dfs -mkdir StartsWithCount/input
hdfs dfs -rm -r StartsWithCount/output
```
Note: Upon the successful completion of a MapReduce job, results are written to a file named part-r-00000 inside the output folder. If that output folder (and the part-r-00000 file inside it) is still around the next time we run our MapReduce job, it won’t be overwritten … instead, our MapReduce job will fail, because Hadoop refuses to write to an output folder that already exists … and we will be sad that we wasted our time. :-(
Therefore, the last command above (hdfs dfs -rm -r StartsWithCount/output) recursively removes the StartsWithCount/output folder (if it exists) in the HDFS and everything in that output folder … including any files that might be named part-r-00000. ;-)
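As an aside, if we’d rather not have to remember to run hdfs dfs -rm -r by hand before every run, the output folder can also be removed programmatically with Hadoop’s FileSystem API. The following is a minimal sketch of my own (the OutputCleaner class is not part of the downloaded sample code), shown here only to illustrate the API:

```java
package mr.wordcount;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical helper (not part of the downloaded sample code): removes an HDFS
// folder recursively, the programmatic equivalent of `hdfs dfs -rm -r`.
public class OutputCleaner {

    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path path = new Path(args[0]); // e.g. StartsWithCount/output

        if (fs.exists(path)) {
            fs.delete(path, true); // true = delete the folder and everything in it
        }
    }
}
```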
- Download the data to the local file system.
Use wget to download the text file pg1524.txt from Project Gutenberg. pg1524.txt is Shakespeare’s Hamlet … and will be our input data file for the rest of this Hadoop tutorial. Since pg1524.txt isn’t very descriptive, let’s rename pg1524.txt to hamlet.txt:

```bash
cd ~/StartsWithCount
wget http://www.gutenberg.org/cache/epub/1524/pg1524.txt
mv pg1524.txt hamlet.txt
```
- Copy the data from the local file system to the HDFS.
Upload (put) hamlet.txt to the HDFS. Since hamlet.txt is our input data file, we’ll put that text file in the StartsWithCount/input folder:

```bash
cd ~/StartsWithCount
hdfs dfs -put hamlet.txt StartsWithCount/input
```
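For the curious, the same upload can be done from Java with the FileSystem API. This is a minimal sketch (the PutHamlet class name is mine, not part of the downloaded sample code, and it assumes it is launched with the cluster’s configuration on the classpath, e.g. via yarn jar):

```java
package mr.wordcount;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical helper (not part of the downloaded sample code): the programmatic
// equivalent of `hdfs dfs -put hamlet.txt StartsWithCount/input`.
public class PutHamlet {

    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        fs.copyFromLocalFile(new Path("hamlet.txt"),
                             new Path("StartsWithCount/input/hamlet.txt"));
    }
}
```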
Under Construction – Begin
- Write the following *.java files for this MapReduce project on the local file system:
Create an mr/wordcount package folder under the project folder, then download the three *.java source files (the mapper, the reducer, and the job driver) from the tutorial’s sample code repository:

```bash
cd ~/StartsWithCount
mkdir ~/StartsWithCount/mr
mkdir ~/StartsWithCount/mr/wordcount
cd ~/StartsWithCount/mr/wordcount
wget http://hadoop-course.googlecode.com/svn/trunk/HadoopSamples/src/main/java/mr/wordcount/StartsWithCountMapper.java
wget http://hadoop-course.googlecode.com/svn/trunk/HadoopSamples/src/main/java/mr/wordcount/StartsWithCountReducer.java
wget http://hadoop-course.googlecode.com/svn/trunk/HadoopSamples/src/main/java/mr/wordcount/StartsWithCountJob.java
```
StartsWithCountMapper.java splits each line of input into tokens and, for every token, emits the token’s first character as the key with a count of 1 as the value:

```java
package mr.wordcount;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class StartsWithCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private final static IntWritable countOne = new IntWritable(1);
    private final Text reusableText = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokenizer = new StringTokenizer(value.toString());
        while (tokenizer.hasMoreTokens()) {
            // emit (first character of the token, 1)
            reusableText.set(tokenizer.nextToken().substring(0, 1));
            context.write(reusableText, countOne);
        }
    }
}
```
StartsWithCountReducer.java sums the counts for each starting character (the job below also uses it as the combiner):

```java
package mr.wordcount;

import java.io.IOException;

import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.log4j.Logger;

public class StartsWithCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    Logger log = Logger.getLogger(StartsWithCountMapper.class);

    @Override
    protected void reduce(Text token, Iterable<IntWritable> counts, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        // add up all of the 1s emitted for this starting character
        for (IntWritable count : counts) {
            sum += count.get();
        }
        context.write(token, new IntWritable(sum));
    }
}
```
StartsWithCountJob.java configures and submits the job; it expects two command-line arguments, the HDFS input path (args[0]) and the HDFS output path (args[1]):

```java
package mr.wordcount;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class StartsWithCountJob extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        Job job = Job.getInstance(getConf(), "StartsWithCount");
        job.setJarByClass(getClass());

        // configure output and input source
        TextInputFormat.addInputPath(job, new Path(args[0]));
        job.setInputFormatClass(TextInputFormat.class);

        // configure mapper and reducer
        job.setMapperClass(StartsWithCountMapper.class);
        job.setCombinerClass(StartsWithCountReducer.class);
        job.setReducerClass(StartsWithCountReducer.class);

        // configure output
        TextOutputFormat.setOutputPath(job, new Path(args[1]));
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new StartsWithCountJob(), args);
        System.exit(exitCode);
    }
}
```
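To see what these classes compute without involving Hadoop at all, here is a tiny stand-alone illustration of my own (StartsWithCountLocalCheck is not part of the downloaded sample code): it applies the mapper’s tokenize-and-take-the-first-character logic to one famous line of Hamlet and sums the counts the way the reducer does:

```java
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical stand-alone illustration (not part of the downloaded sample code).
public class StartsWithCountLocalCheck {

    public static void main(String[] args) {
        String line = "To be, or not to be: that is the question:";
        TreeMap<String, Integer> counts = new TreeMap<String, Integer>();

        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            // what StartsWithCountMapper emits: (first character of the token, 1)
            String first = tokenizer.nextToken().substring(0, 1);
            // what StartsWithCountReducer does: sum the 1s per starting character
            Integer current = counts.get(first);
            counts.put(first, current == null ? 1 : current + 1);
        }

        System.out.println(counts); // prints {T=1, b=2, i=1, n=1, o=1, q=1, t=3}
    }
}
```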
- Compile *.java files to *.class files on the local file system.
Compile the three *.java files against the Hadoop libraries. The yarn classpath command prints the classpath Hadoop itself uses, which gives javac everything it needs to resolve the Hadoop classes:

```bash
cd ~/StartsWithCount
javac -classpath `yarn classpath` mr/wordcount/*.java
```
- Archive the *.class files to a *.jar on the local file system.
Package the compiled *.class files into a JAR file named HadoopSamples.jar (c = create a new archive, v = verbose output, f = write to the named file):

```bash
cd ~/StartsWithCount
jar cvf HadoopSamples.jar mr/wordcount/*.class
```
- Run this MapReduce project using the *.jar on the local file system and the data on the HDFS.
Submit the job with yarn jar, passing the fully qualified name of the driver class (mr.wordcount.StartsWithCountJob), the HDFS input folder (args[0]), and the HDFS output folder (args[1]):

```bash
cd ~/StartsWithCount
yarn jar HadoopSamples.jar mr.wordcount.StartsWithCountJob StartsWithCount/input StartsWithCount/output
```
Under Construction – End
- Copy the results of this MapReduce project from the HDFS to the local file system.
Download (get) the resulting output file (part-r-00000) of our successful MapReduce job from the HDFS and display that downloaded file using cat … each line contains a starting character, a tab, and the number of tokens that start with that character:

```bash
cd ~/StartsWithCount
hdfs dfs -get StartsWithCount/output/part-r-00000
cat part-r-00000
```

Alternatively, we can leave the resulting output file where it is on the HDFS and display it from there using hdfs dfs -cat:

```bash
cd ~/StartsWithCount
hdfs dfs -cat StartsWithCount/output/part-r-00000
```
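If we’d rather read the results from Java than from the hdfs command-line client, a small sketch like the following would work (the PrintResults class name is mine, not part of the downloaded sample code, and it assumes it is launched with the cluster’s configuration on the classpath, e.g. with yarn jar after rebuilding the JAR):

```java
package mr.wordcount;

import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical helper (not part of the downloaded sample code): prints the job's
// results straight from the HDFS, like `hdfs dfs -cat`.
public class PrintResults {

    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path results = new Path("StartsWithCount/output/part-r-00000");

        BufferedReader reader = new BufferedReader(new InputStreamReader(fs.open(results)));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line); // each line: starting character <TAB> count
            }
        } finally {
            reader.close();
        }
    }
}
```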
References:
- Hadoop Tutorial: Developing Big-Data Applications with Apache Hadoop by coreservlets.com
- The “Map-Reduce Part 2 — Developing First MapReduce Job” tutorial section on SlideShare
- The “Map-Reduce Part 2 — Developing First MapReduce Job” tutorial section in PDF
- Source code for the “Map-Reduce Part 2 — Developing First MapReduce Job” tutorial section