Monday, December 26, 2011

Word Count

Programming Lab 1: Word Count
  1. We will learn how to set up a development environment for Hadoop projects
  2. Run the “Word Count” application
  3. Create a new application that counts letters in text documents
Prerequisites
  1. Java 1.6
  2. Hadoop and log4j libraries
  3. NetBeans or Eclipse
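Before creating the project, you may want to confirm that these prerequisites are actually visible to the JVM. The sketch below is a hypothetical helper (the class name PrerequisiteCheck is invented for this example). It prints the running Java version and tries to load one class from each library without initializing it. org.apache.hadoop.conf.Configuration and org.apache.log4j.Logger are real classes from the Hadoop core and log4j 1.x JARs. The code sticks to Java 6 syntax so it also compiles with the JDK listed above.

```java
// Hypothetical helper: reports whether the lab's libraries are on the classpath.
// Written in Java 6 syntax to match the prerequisite above.
class PrerequisiteCheck {

    // Returns true if the named class can be loaded (without running its static initializers).
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, PrerequisiteCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false; // the JAR providing this class is not linked
        } catch (LinkageError e) {
            return false; // the class was found but one of its dependencies is missing
        }
    }

    public static void main(String[] args) {
        System.out.println("Java version: " + System.getProperty("java.version"));
        String[] required = {
            "org.apache.hadoop.conf.Configuration", // from the Hadoop core JAR
            "org.apache.log4j.Logger"               // from the log4j 1.x JAR
        };
        for (String name : required) {
            System.out.println(name + (isOnClasspath(name) ? ": found" : ": MISSING"));
        }
    }
}
```

If either line reports MISSING after you have linked the libraries, the JARs from ProgLabs/lib were probably not added to the compile classpath.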
Create Project and Link with Libraries
  1. Copy the provided libraries and Java code from the USB drive
  2. Create a project in NetBeans or Eclipse (see the IDE-specific instructions below)
  3. Link the project with the libraries
  4. Create a new class and copy the provided code into it
  5. Modify the input and output directories
  6. Run the code and examine the result
Create Project with NetBeans
  1. Click on File -> New Project
  2. Select the Java Application project type
  3. Set the main class to WordCount
  4. Name your project HadoopLab1WordCount
  5. Follow the instructions on the screen
  1. Open the file from the provided material in ProgLabs/lab1/original
  2. Copy everything except the package declaration
  3. Paste the code into the newly created class, right after its package declaration
Link the project with the appropriate libraries
  1. Right-click on the project name and select Properties
  2. Pick the Libraries option and click on the Compile tab
  3. Click on the Add JAR/Folder button
  4. Add everything from the ProgLabs/lib folder
  5. NetBeans will re-evaluate the dependencies, and you should not see any errors at this point
  6. Adjust the input/output values in the run method to ProgLabs/lab1/<<input|output>> accordingly
  7. Right-click and run!
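If you run the job a second time, Hadoop refuses to start because the output directory from the previous run already exists (it reports an "Output directory ... already exists" error). When running locally from the IDE, one option is to delete the old output before each run. The helper below is a hypothetical sketch (OutputCleaner is an invented name) that uses only java.io.File. The path ProgLabs/lab1/output is the one from the step above, so adjust it to wherever your copy of the lab lives. Deletion is irreversible, so point it only at the lab's output folder.

```java
import java.io.File;

// Hypothetical helper: removes a previous job's output directory so the job can be rerun.
// Uses java.io.File only, so it also compiles with Java 6.
class OutputCleaner {

    // Recursively deletes a file or directory; returns true if nothing is left at that path.
    static boolean deleteRecursively(File target) {
        if (!target.exists()) {
            return true; // nothing to delete
        }
        File[] children = target.listFiles(); // null when target is a plain file
        if (children != null) {
            for (File child : children) {
                if (!deleteRecursively(child)) {
                    return false; // stop on the first file that cannot be removed
                }
            }
        }
        return target.delete();
    }

    public static void main(String[] args) {
        // Assumed location of the lab's output folder; change it to match your setup.
        File output = new File("ProgLabs/lab1/output");
        System.out.println("Old output removed: " + deleteRecursively(output));
    }
}
```

You could call deleteRecursively at the start of the run method, before the job is submitted, so every run starts from a clean output folder.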
Create Project with Eclipse
  1. Create a new project: click on File -> New -> Java Project
  2. Name the project HadoopLab1WordCount
  3. Select Java 1.6 as the JRE
  4. Click Next and add the libraries from ProgLabs/lib
  1. Create a new class named WordCount: right-click on the project and select New -> Class
  2. Copy the code from ProgLabs/lab1/original into the new class
  3. Right-click and run it!
Let’s examine the output directory
You will see two files: one indicating the status of the MapReduce job, and a second one containing the actual result of the job
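To inspect the result programmatically instead of opening the file by hand, the sketch below reads every result file in the output directory. It assumes the job writes tab-separated key/value lines (the default of Hadoop's TextOutputFormat), and it skips names starting with "_" or "." (such as the status marker or local .crc checksum files), the same convention Hadoop's FileInputFormat uses for hidden files. OutputReader and readCounts are hypothetical names invented for this example.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: loads "key<TAB>count" lines from a job's output directory into a sorted map.
class OutputReader {

    static Map<String, Integer> readCounts(File outputDir) throws IOException {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        File[] files = outputDir.listFiles();
        if (files == null) {
            throw new IOException("Not a directory: " + outputDir);
        }
        Arrays.sort(files); // read result files in a stable order
        for (File file : files) {
            String name = file.getName();
            // Skip status markers, hidden checksum files and subdirectories.
            if (name.startsWith("_") || name.startsWith(".") || file.isDirectory()) {
                continue;
            }
            BufferedReader reader = new BufferedReader(new FileReader(file));
            try {
                String line;
                while ((line = reader.readLine()) != null) {
                    int tab = line.lastIndexOf('\t');
                    if (tab < 0) {
                        continue; // not a key/value line
                    }
                    String key = line.substring(0, tab);
                    counts.put(key, Integer.parseInt(line.substring(tab + 1).trim()));
                }
            } finally {
                reader.close();
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Assumed path from the lab instructions; adjust to your setup.
        Map<String, Integer> counts = readCounts(new File("ProgLabs/lab1/output"));
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```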
Result
Adam        1
Brandon        1
Graig        1
Kim        1
Marty        1
Mike        1
Nancy        1
Nick        1
Nishani        1
Steve        2
Tracy        2
Vidur        1
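The result above comes from three steps: the mapper emits a (word, 1) pair for every word, the framework groups the pairs by key and sorts the keys, and the reducer sums the 1s for each word. That is why Steve and Tracy show 2 and the names appear in sorted order. The sketch below imitates that data flow in plain Java; it is not Hadoop API code, the class WordCountSimulation is invented for illustration, and it assumes whitespace tokenization with StringTokenizer as in the standard Hadoop WordCount example (the lab's provided code may differ).

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// In-memory imitation of the word count job (NOT Hadoop API code).
class WordCountSimulation {

    // "Map": one (word, 1) pair per whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<Map.Entry<String, Integer>>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            pairs.add(new AbstractMap.SimpleEntry<String, Integer>(tokens.nextToken(), 1));
        }
        return pairs;
    }

    // "Shuffle": group values by key; the TreeMap keeps keys sorted, like Hadoop's sort step.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<String, List<Integer>>();
        for (Map.Entry<String, Integer> pair : pairs) {
            List<Integer> values = groups.get(pair.getKey());
            if (values == null) {
                values = new ArrayList<Integer>();
                groups.put(pair.getKey(), values);
            }
            values.add(pair.getValue());
        }
        return groups;
    }

    // "Reduce": sum the values collected for one key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<Map.Entry<String, Integer>>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<String, Integer>();
        for (Map.Entry<String, List<Integer>> group : shuffle(pairs).entrySet()) {
            result.put(group.getKey(), reduce(group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up input, not the lab's actual data file.
        List<String> lines = Arrays.asList("Steve Tracy Adam", "Steve Tracy");
        for (Map.Entry<String, Integer> entry : run(lines).entrySet()) {
            System.out.println(entry.getKey() + "\t" + entry.getValue()); // Adam 1, Steve 2, Tracy 2
        }
    }
}
```

For ASCII text, String's natural ordering matches the byte order Hadoop uses when sorting Text keys, so capitalized words sort before lowercase ones in both.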
Exercise: Let’s Count Letters!
  1. Modify the word count application to count the letters in the document
  2. Create another class that implements Reducer, and switch the application to use it in the run method
  3. Hint: to turn each character of a line into a key, use
                String ch = String.valueOf(line.charAt(i));
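The data flow for the letter count can be sketched in plain Java before wiring it into Hadoop. This is not Hadoop API code: LetterCountSketch, mapLine and countLetters are invented names. Skipping non-letters with Character.isLetter and keeping case as-is are assumptions; the exercise does not say whether spaces, punctuation or upper/lower case should be treated differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory sketch of the letter-count exercise (NOT Hadoop API code).
class LetterCountSketch {

    // "Map": emit one key per letter; each emitted key stands for a (letter, 1) pair.
    static List<String> mapLine(String line) {
        List<String> emitted = new ArrayList<String>();
        for (int i = 0; i < line.length(); i++) {
            if (!Character.isLetter(line.charAt(i))) {
                continue; // assumption: ignore spaces, digits and punctuation
            }
            String ch = String.valueOf(line.charAt(i)); // the hint from the exercise
            emitted.add(ch);
        }
        return emitted;
    }

    // "Reduce": add up the 1s per letter; the TreeMap keeps letters sorted like the job's output.
    static Map<String, Integer> countLetters(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (String line : lines) {
            for (String letter : mapLine(line)) {
                Integer soFar = counts.get(letter);
                counts.put(letter, soFar == null ? 1 : soFar + 1);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countLetters(Arrays.asList("Steve", "Tracy")));
        // prints {S=1, T=1, a=1, c=1, e=2, r=1, t=1, v=1, y=1}
    }
}
```

In the actual job, the mapper would write each ch with a count of 1, and your new Reducer class would do the summing that countLetters performs here.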
