How To Run A Java MapReduce Version 1 (MRv1) Job Using Hadoop On Windows

This tutorial shows how to compile and run the MaxTemperature example from Chapter 2 (MapReduce) of Hadoop: The Definitive Guide, 3rd Edition, using the Microsoft HDInsight Emulator for Windows Azure (Hadoop on Windows).

Prerequisites:

A supported Windows operating system:
- Windows 8,
- Windows 7,
- Windows Vista SP2,
- Windows XP SP3+,
- Windows Server 2003 SP2+,
- Windows Server 2008,
- Windows Server 2008 R2, or
- Windows Server 2012
An Internet connection.
Administrator privileges.
7-Zip or gzip -d … or some other way of decompressing *.gz files.

Procedure:

Install the Microsoft HDInsight Emulator for Windows Azure (Hadoop on Windows).

Follow the instructions at http://azure.microsoft.com/en-us/documentation/articles/hdinsight-get-started-emulator/#install to install Apache Hadoop in a single-node cluster deployment using the Hortonworks Data Platform (HDP) for Windows.

Note: The Microsoft HDInsight Emulator will be installed using the Microsoft Web Platform Installer (Web PI) launched from http://www.microsoft.com/web/gallery/install.aspx?appid=HDINSIGHT.

After installing, you will have the following:
- Apache Hadoop 1.0.3
- Apache Pig 0.9.3
- Apache HCatalog 0.4.1
- Apache Templeton 0.1.4
- Apache Hive 0.9.0
- Apache Sqoop 1.4.2
- Apache Oozie 3.2.0
These versions are a few years old; however, they’re good enough to get Hadoop up and running on Windows with minimal effort.

The Microsoft HDInsight Emulator for Windows Azure makes the following system modifications:
- Installs the following:
  - Python 2.7.3 (32-bit)
  - Hortonworks Data Platform for Windows
  - Microsoft HDInsight Emulator for Windows Azure
- Creates a new Local Group named “HadoopUsers”
- Creates a new Local User named “hadoop” that is a member of the following Local Groups: HadoopUsers and HomeUsers
- Creates the following services and automatically starts these services under the context of the new Local User hadoop:
  - Apache Hadoop datanode
  - Apache Hadoop derbyserver
  - Apache Hadoop historyserver
  - Apache Hadoop hiveserver
  - Apache Hadoop hiveserver2
  - Apache Hadoop hwi
  - Apache Hadoop jobtracker
  - Apache Hadoop metastore
  - Apache Hadoop namenode
  - Apache Hadoop oozieservice
  - Apache Hadoop secondarynamenode
  - Apache Hadoop tasktracker
  - Apache Hadoop templeton
References:
- http://azure.microsoft.com/en-us/documentation/articles/hdinsight-component-versioning/#hdinsight-versions
- http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-Win-1.1.0/bk_releasenotes_HDP-Win/content/ch_relnotes-hdp-win-1.1.1_1.html

(Optional) Format the Hadoop Distributed File System (HDFS).

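This step isn’t needed if you have installed the Microsoft HDInsight Emulator for Windows Azure; however, I’ve included it in case you’re reusing these instructions on Linux … as I do. Be aware that formatting the NameNode erases any existing HDFS metadata, so only run the first command on a fresh installation.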

Launch the Hadoop Command Line shortcut and execute the following:
```
hadoop namenode -format
hadoop fs -mkdir /user
hadoop fs -mkdir /user/your-username
```
If any of these folders already exist, then you will receive the following type of error:
```
mkdir: cannot create directory /user: File exists
```

Create the folder structure for this MapReduce project on the local file system.

Create a folder on the C drive called “Temp” ( i.e. C:\Temp\ ).

Inside this folder, create another folder called “MaxTemp” ( i.e. C:\Temp\MaxTemp\ ).

Create the folder structure for this MapReduce project on the HDFS.

Using the Hadoop Command Line, execute the following:
```
hadoop fs -mkdir MaxTemp
hadoop fs -mkdir MaxTemp/input
```
In HDFS, this will make a folder named “MaxTemp” under the /user/your-username/ folder ( i.e. /user/your-username/MaxTemp/ ). Then, it will make a folder named “input” under the /user/your-username/MaxTemp/ folder ( i.e. /user/your-username/MaxTemp/input/ ).
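
The rule at work here is that a path without a leading slash is resolved against your HDFS home folder. As a rough illustration in plain Java (no Hadoop classes involved; the class name HdfsPathSketch and its resolve method are made up for this sketch and only mimic the behaviour described above), the mapping looks like this:

```java
class HdfsPathSketch {

	// Mimics the rule described above: absolute paths are used as-is, and
	// relative paths are placed under /user/<user name>/.
	// This is an illustration only, not Hadoop's own path-handling code.
	static String resolve ( String userName, String path ) {
		if ( path.startsWith( "/" ) ) {
			return path;
		}
		return "/user/" + userName + "/" + path;
	}

	public static void main ( String[] args ) {
		System.out.println( resolve( "your-username", "MaxTemp" ) );
		System.out.println( resolve( "your-username", "MaxTemp/input" ) );
		System.out.println( resolve( "your-username", "/user" ) );
	}

}
```

This is why the later steps can refer to MaxTemp/input without spelling out the full /user/your-username/ prefix.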

Download the data to the local file system.

In C:\Temp\MaxTemp\, download the following files:
- https://github.com/tomwhite/hadoop-book/raw/3e/input/ncdc/all/1901.gz
- https://github.com/tomwhite/hadoop-book/raw/3e/input/ncdc/all/1902.gz
Extract each *.gz file to C:\Temp\MaxTemp\ so that you have the following:
- C:\Temp\MaxTemp\1901
- C:\Temp\MaxTemp\1902
If you’re using 7-Zip, choose “Extract Here” from the 7-Zip contextual menu on each *.gz file.
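
If neither 7-Zip nor gzip is at hand, the JDK itself can decompress the files. The following is a minimal sketch, assuming you pass the paths of the *.gz files as arguments; the class name GunzipSketch and its gunzip helper are made up for this illustration and are not part of the original project. It avoids Java 7+ features, so it should compile with the older JDK that the emulator installs.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;

class GunzipSketch {

	// Decompresses gzFile into outFile, overwriting outFile if it exists.
	static void gunzip ( File gzFile, File outFile ) throws IOException {
		InputStream in = new GZIPInputStream( new FileInputStream( gzFile ) );
		try {
			OutputStream out = new FileOutputStream( outFile );
			try {
				byte[] buffer = new byte[ 8192 ];
				int n;
				while ( ( n = in.read( buffer ) ) != -1 ) {
					out.write( buffer, 0, n );
				}
			} finally {
				out.close();
			}
		} finally {
			in.close();
		}
	}

	public static void main ( String[] args ) throws IOException {
		// Each argument is a path such as C:\Temp\MaxTemp\1901.gz; the
		// output is written next to it with the .gz suffix removed.
		for ( String arg : args ) {
			if ( !arg.endsWith( ".gz" ) ) {
				System.err.println( "Skipping (not a .gz file): " + arg );
				continue;
			}
			gunzip( new File( arg ), new File( arg.substring( 0, arg.length() - 3 ) ) );
		}
	}

}
```

After compiling it with javac, running it with 1901.gz and 1902.gz as arguments from C:\Temp\MaxTemp\ should leave the files 1901 and 1902 in the same folder.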

Copy the data from the local file system to the HDFS.

Using the Hadoop Command Line, execute the following:

```
hadoop fs -copyFromLocal C:\Temp\MaxTemp\1901 MaxTemp/input
hadoop fs -copyFromLocal C:\Temp\MaxTemp\1902 MaxTemp/input
```

Write the following *.java files for this MapReduce project on the local file system:

In the folder C:\Temp\MaxTemp\, write the following into a file named MaxTemperatureMapper.java ( i.e. C:\Temp\MaxTemp\MaxTemperatureMapper.java ):

```
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MaxTemperatureMapper extends Mapper<
	/* input key type:    */ LongWritable,
	/* input value type:  */ Text,
	/* output key type:   */ Text,
	/* output value type: */ IntWritable
	> {
	
	private static final int MISSING = 9999;
	
	@Override
	public void map ( /* input key: */ LongWritable key, /* input value: */ Text value, Context context ) throws IOException, InterruptedException {
		String line = value.toString();
		String year = line.substring( 15, 19 );
		int airTemperature;
		if ( line.charAt( 87 ) == '+' ) { // parseInt doesn't like leading plus signs
			airTemperature = Integer.parseInt( line.substring( 88, 92 ) );
		} else {
			airTemperature = Integer.parseInt( line.substring( 87, 92 ) );
		}
		String quality = line.substring( 92, 93 );
		if ( airTemperature != MISSING && quality.matches( "[01459]" ) ) {
			context.write(
				/* output key:   */ new Text( year ),
				/* output value: */ new IntWritable( airTemperature )
			);
		}
	}
	
}
```
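
The mapper relies entirely on fixed column positions, so it can help to try the parsing logic on its own before involving Hadoop. Below is a small plain-Java sketch of the same steps. The class RecordParseSketch and its methods are hypothetical names for this illustration, and the record it builds is made-up filler, not real NCDC data.

```java
class RecordParseSketch {

	static final int MISSING = 9999;

	// Builds a made-up 93-character record: filler zeros everywhere except
	// the fields the mapper reads. This is NOT real NCDC data.
	static String syntheticRecord ( String year, String signedTemp, char quality ) {
		StringBuilder sb = new StringBuilder();
		for ( int i = 0; i < 93; i++ ) {
			sb.append( '0' );
		}
		sb.replace( 15, 19, year );       // 4-character year at index 15-18
		sb.replace( 87, 92, signedTemp ); // sign plus 4 digits at index 87-91
		sb.setCharAt( 92, quality );      // quality code at index 92
		return sb.toString();
	}

	static String parseYear ( String line ) {
		return line.substring( 15, 19 );
	}

	// Same parsing steps as the mapper; returns null when the mapper
	// would skip the record (missing reading or unaccepted quality code).
	static Integer parseTemperature ( String line ) {
		int airTemperature;
		if ( line.charAt( 87 ) == '+' ) {
			airTemperature = Integer.parseInt( line.substring( 88, 92 ) );
		} else {
			airTemperature = Integer.parseInt( line.substring( 87, 92 ) );
		}
		String quality = line.substring( 92, 93 );
		if ( airTemperature != MISSING && quality.matches( "[01459]" ) ) {
			return Integer.valueOf( airTemperature );
		}
		return null;
	}

	public static void main ( String[] args ) {
		String[] records = {
			syntheticRecord( "1901", "+0317", '1' ), // kept
			syntheticRecord( "1901", "-0078", '1' ), // kept, negative reading
			syntheticRecord( "1901", "+9999", '1' ), // skipped: missing reading
			syntheticRecord( "1901", "+0100", '2' )  // skipped: quality code not accepted
		};
		for ( String record : records ) {
			System.out.println( parseYear( record ) + "\t" + parseTemperature( record ) );
		}
	}

}
```

Skipped records show up as null here; in the real mapper they simply produce no output.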

In the folder C:\Temp\MaxTemp\, write the following into a file named MaxTemperatureReducer.java ( i.e. C:\Temp\MaxTemp\MaxTemperatureReducer.java ):

```
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MaxTemperatureReducer extends Reducer<
	/* input key type:    */ Text,
	/* input value type:  */ IntWritable,
	/* output key type:   */ Text,
	/* output value type: */ IntWritable
	> {
	
	@Override
	public void reduce ( /* input key: */ Text key, /* input values: */ Iterable< IntWritable > values, Context context ) throws IOException, InterruptedException {
		int maxValue = Integer.MIN_VALUE;
		for ( IntWritable value : values ) {
			maxValue = Math.max( maxValue, value.get() );
		}
		context.write(
			/* output key:   */ key,
			/* output value: */ new IntWritable( maxValue )
		);
	}
	
}
```
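
Between the mapper and the reducer, the framework groups all values that share a key, so each call to reduce sees one year together with every temperature emitted for it. The following plain-Java sketch imitates that grouping and the max loop without any Hadoop classes. ShuffleReduceSketch, groupByYear and max are hypothetical names, and the sample pairs are illustrative values rather than readings from the downloaded files.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ShuffleReduceSketch {

	// Groups (year, temperature) pairs by year, roughly as the shuffle
	// phase does before the reducer is called. Keys come out sorted.
	static Map< String, List< Integer > > groupByYear ( String[][] pairs ) {
		Map< String, List< Integer > > grouped = new TreeMap< String, List< Integer > >();
		for ( String[] pair : pairs ) {
			List< Integer > values = grouped.get( pair[ 0 ] );
			if ( values == null ) {
				values = new ArrayList< Integer >();
				grouped.put( pair[ 0 ], values );
			}
			values.add( Integer.valueOf( pair[ 1 ] ) );
		}
		return grouped;
	}

	// Same loop as the reducer: the largest value seen for one key.
	static int max ( Iterable< Integer > values ) {
		int maxValue = Integer.MIN_VALUE;
		for ( Integer value : values ) {
			maxValue = Math.max( maxValue, value.intValue() );
		}
		return maxValue;
	}

	public static void main ( String[] args ) {
		// Illustrative map output, not taken from the NCDC files.
		String[][] mapOutput = {
			{ "1950", "0" }, { "1950", "22" }, { "1950", "-11" },
			{ "1949", "111" }, { "1949", "78" }
		};
		for ( Map.Entry< String, List< Integer > > entry : groupByYear( mapOutput ).entrySet() ) {
			System.out.println( entry.getKey() + "\t" + max( entry.getValue() ) );
		}
	}

}
```

Starting from Integer.MIN_VALUE means the loop also works when every reading for a year is negative.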

In the folder C:\Temp\MaxTemp\, write the following into a file named MaxTemperature.java ( i.e. C:\Temp\MaxTemp\MaxTemperature.java ):

```
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MaxTemperature {
	
	public static void main ( String[] args ) throws Exception {
		
		if ( args.length != 2 ) {
			System.err.println( "Usage: MaxTemperature <input path> <output path>" );
			System.exit( -1 );
		}
		
		Job job = new Job();
		job.setJarByClass( MaxTemperature.class );
		job.setJobName( "Max temperature" );
		
		FileInputFormat.addInputPath( job, new Path( args[ 0 ] ) );
		FileOutputFormat.setOutputPath( job, new Path( args[ 1 ] ) );
		
		job.setMapperClass( MaxTemperatureMapper.class );
		job.setReducerClass( MaxTemperatureReducer.class );
		
		job.setOutputKeyClass( Text.class );
		job.setOutputValueClass( IntWritable.class );
		
		System.exit( job.waitForCompletion( true ) ? 0 : 1 );
		
	}
	
}
```

Compile *.java files to *.class files on the local file system.

The Microsoft HDInsight Emulator installs an older version of the JDK (i.e. 1.6.0_31) in the folder C:\Hadoop\java\ … so we’ll use this JDK to compile the *.java files.

Using the Hadoop Command Line, execute the following:
```
cd C:\Temp\MaxTemp
C:\Hadoop\java\bin\javac.exe -classpath C:\Hadoop\hadoop-1.1.0-SNAPSHOT\hadoop-core-1.1.0-SNAPSHOT.jar;C:\Hadoop\hadoop-1.1.0-SNAPSHOT\lib\commons-cli-1.2.jar -d C:\Temp\MaxTemp MaxTemperatureMapper.java MaxTemperatureReducer.java MaxTemperature.java
```

Archive the *.class files to a *.jar on the local file system.

As mentioned above, the Microsoft HDInsight Emulator installs an older version of the JDK (i.e. 1.6.0_31) in the folder C:\Hadoop\java\ … so we’ll use this JDK to archive the *.class files.

Using the Hadoop Command Line, execute the following:
```
cd C:\Temp\MaxTemp
C:\Hadoop\java\bin\jar.exe cvf MaxTemperature.jar -C C:\Temp\MaxTemp MaxTemperatureMapper.class MaxTemperatureReducer.class MaxTemperature.class
```
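
Before running the job, it can be worth confirming that all three classes made it into the archive. One way to do that from Java is sketched below using the JDK's java.util.jar classes; JarListSketch and entryNames are names made up for this sketch, and it assumes MaxTemperature.jar is in the current folder unless a path is passed as the first argument.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

class JarListSketch {

	// Returns the names of all entries in the given jar file.
	static List< String > entryNames ( String jarPath ) throws IOException {
		List< String > names = new ArrayList< String >();
		JarFile jar = new JarFile( jarPath );
		try {
			Enumeration< JarEntry > entries = jar.entries();
			while ( entries.hasMoreElements() ) {
				names.add( entries.nextElement().getName() );
			}
		} finally {
			jar.close();
		}
		return names;
	}

	public static void main ( String[] args ) throws IOException {
		String jarPath = args.length > 0 ? args[ 0 ] : "MaxTemperature.jar";
		List< String > names = entryNames( jarPath );
		String[] expected = {
			"MaxTemperatureMapper.class",
			"MaxTemperatureReducer.class",
			"MaxTemperature.class"
		};
		for ( String name : expected ) {
			System.out.println( name + ( names.contains( name ) ? ": found" : ": MISSING" ) );
		}
	}

}
```

If any class is reported as MISSING, recheck the compile step before rebuilding the archive.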

Run this MapReduce project using the *.jar on the local file system and the data on the HDFS.

Using the Hadoop Command Line, execute the following:
```
hadoop jar C:\Temp\MaxTemp\MaxTemperature.jar MaxTemperature MaxTemp/input MaxTemp/output
```
Hadoop refuses to overwrite an existing output folder, so if MaxTemp/output already exists from an earlier run, the job will fail. Remove it first with hadoop fs -rmr MaxTemp/output and then run the job again.

Copy the results of this MapReduce project from the HDFS to the local file system.

Using the Hadoop Command Line, execute the following:
```
hadoop fs -copyToLocal MaxTemp/output/part-r-00000 C:\Temp\MaxTemp
```
Then, open part-r-00000 in WordPad or Notepad to see the results.

The results should be:
```
1901	317
1902	244
```
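
Each output line is a year and a temperature separated by a tab, and the NCDC readings are recorded in tenths of a degree Celsius, so 317 means 31.7 °C. A small plain-Java sketch of reading such a line follows; ResultLineSketch and describe are hypothetical names used only for this illustration.

```java
class ResultLineSketch {

	// Splits a "year<TAB>temperature" line and converts the temperature
	// from tenths of a degree to degrees Celsius.
	static String describe ( String line ) {
		String[] fields = line.split( "\t" );
		int tenths = Integer.parseInt( fields[ 1 ].trim() );
		return fields[ 0 ] + ": " + ( tenths / 10.0 ) + " degrees Celsius";
	}

	public static void main ( String[] args ) {
		System.out.println( describe( "1901\t317" ) );
		System.out.println( describe( "1902\t244" ) );
	}

}
```

The trim() call is there so that a trailing carriage return, if the file has been converted to Windows line endings, doesn't break the number parsing.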
Remember that WordPad treats “\n” as a line break … whereas Notepad only recognizes “\r\n”. Therefore, if you’re set on viewing part-r-00000 in Notepad, first open the file in WordPad, save and close it, and then open it in Notepad.
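
If you’d rather not round-trip through WordPad, the line endings can also be converted with a few lines of Java. This is a minimal sketch, assuming the file is small enough to read into memory and is plain text; LineEndingSketch and toCrLf are made-up names, and the converted copy is written next to the original with a .txt suffix rather than overwriting it.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class LineEndingSketch {

	// Converts every line ending to "\r\n". Existing "\r\n" pairs are
	// normalized to "\n" first so they are not doubled into "\r\r\n".
	static String toCrLf ( String text ) {
		return text.replace( "\r\n", "\n" ).replace( "\n", "\r\n" );
	}

	public static void main ( String[] args ) throws IOException {
		String inPath = args.length > 0 ? args[ 0 ] : "part-r-00000";
		String outPath = inPath + ".txt";

		// Read the whole input file into memory.
		ByteArrayOutputStream bytes = new ByteArrayOutputStream();
		InputStream in = new FileInputStream( inPath );
		try {
			byte[] buffer = new byte[ 8192 ];
			int n;
			while ( ( n = in.read( buffer ) ) != -1 ) {
				bytes.write( buffer, 0, n );
			}
		} finally {
			in.close();
		}

		// Write the converted text to a new file.
		OutputStream out = new FileOutputStream( outPath );
		try {
			out.write( toCrLf( bytes.toString( "UTF-8" ) ).getBytes( "UTF-8" ) );
		} finally {
			out.close();
		}
	}

}
```

Running it from C:\Temp\MaxTemp\ with no arguments should produce part-r-00000.txt, which Notepad can then display one result per line.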
