HADOOP ADMINISTRATION

Overview:
Hadoop is an open-source, Java-based programming framework that supports the storage and processing of extremely large data sets in a distributed computing environment.

Hadoop is one of the most widely used frameworks for working with Big Data in a distributed environment. As data volumes grow rapidly and organizations need timely insights from them, the Hadoop administrator's job has become critical in large organizations, and there is strong demand for professionals with the right skills and certification.

Training Objectives of Hadoop Developer/Admin:
Hadoop Administration training provides participants with expertise in all the steps necessary to operate and maintain a Hadoop cluster, from Planning, Installation and Configuration through Load Balancing, Security and Tuning. The training provides hands-on preparation for the real-world challenges faced by Hadoop Administrators. The course curriculum follows the Apache Hadoop distribution.

This Hadoop Admin training course will help you understand the basic and advanced concepts of Big Data and the main components of the Hadoop stack and ecosystem, including HDFS, MapReduce, HBase, Hive, Pig and Sqoop.

Target Students / Prerequisites
• Hadoop Developers, Administrators and Architects
• IT managers, Support Engineers, QA professionals

Course Content

Hadoop Architecture

Introduction to
Parallel Computing vs. Distributed Computing
How to install Hadoop cluster on multiple
Hadoop Daemons introduction: NameNode, DataNode, JobTracker, TaskTracker
Exploring HDFS (Hadoop Distributed File System)
Exploring the HDFS Apache Web UI
NameNode architecture (EditLog, FsImage, location of replicas)
Secondary NameNode architecture
DataNode architecture
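
As a rough illustration of how HDFS splits a file into blocks and replicates each block across DataNodes, the sketch below works through the arithmetic. The class and method names are hypothetical and not part of any Hadoop API; the 64 MB block size and replication factor of 3 are assumed example values (both are configurable per cluster).

```java
// Hypothetical helper for illustration only; not part of the Hadoop API.
class HdfsBlockMath {

    // Number of blocks needed for a file: ceiling of fileBytes / blockBytes.
    static long blockCount(long fileBytes, long blockBytes) {
        if (fileBytes < 0 || blockBytes <= 0) {
            throw new IllegalArgumentException("sizes must be non-negative / positive");
        }
        return (fileBytes + blockBytes - 1) / blockBytes;
    }

    // Raw bytes consumed across the cluster: each byte is stored once per replica.
    // A partially filled last block only occupies its actual length.
    static long rawStorageBytes(long fileBytes, int replication) {
        return fileBytes * replication;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long fileBytes = 200 * mb;   // example file size
        long blockBytes = 64 * mb;   // assumed block size
        int replication = 3;         // assumed replication factor

        long blocks = blockCount(fileBytes, blockBytes);
        System.out.println("blocks         = " + blocks);
        System.out.println("block replicas = " + blocks * replication);
        System.out.println("raw storage MB = " + rawStorageBytes(fileBytes, replication) / mb);
    }
}
```

The NameNode tracks where each block replica lives, so the replica count gives a feel for how much metadata a file adds.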

MapReduce Architecture
Exploring JobTracker/TaskTracker
How a client submits a MapReduce job
Exploring Mapper/Reducer/Combiner
Shuffle: Sort & Partition
Input/output formats
Job Scheduling (FIFO, Fair Scheduler, Capacity Scheduler)
Exploring the Apache MapReduce Web UI
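
The map, shuffle and reduce stages listed above can be sketched in plain Java without a cluster. This is an in-memory simulation of a word count, not Hadoop code: a real job would extend Hadoop's Mapper and Reducer classes, and all names below are hypothetical.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory simulation for illustration; not the Hadoop API.
class MapReduceFlowSketch {

    // Map phase: emit a (word, 1) pair for every word in the input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // Shuffle phase: group all values by key. The TreeMap keeps keys sorted,
    // which mimics the sorted order in which a reducer receives its keys.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: collapse the list of values for one key into a single sum.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs the three phases in order over a small list of input lines.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (String line : lines) {
            mapped.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(mapped).entrySet()) {
            result.put(e.getKey(), reduce(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("big data big cluster", "data node")));
    }
}
```

Because summing is associative and commutative, the same reduce function could also serve as a combiner that pre-aggregates map output before it crosses the network.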

Hadoop Developer Tasks
Writing a MapReduce program
Reading and writing data using Java
Hadoop Eclipse integration
Mapper in detail
Reducer in detail
Using Combiners
Reducing Intermediate Data with Combiners
Writing Partitioners for Better Load Balancing
Sorting in HDFS
Searching in HDFS
Indexing in HDFS
Hands-On Exercise
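
To give a feel for the partitioner topic above, here is a minimal plain-Java sketch of hash partitioning. The formula follows the one used by Hadoop's default HashPartitioner (mask the sign bit, then take the remainder); the class, the method names and the load-counting helper are hypothetical additions for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Illustration only; a real partitioner would extend Hadoop's Partitioner class.
class PartitionSketch {

    // Pick a reducer for a key. Masking with Integer.MAX_VALUE clears the sign
    // bit so that a negative hashCode can never produce a negative partition.
    static int hashPartition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Count how many keys land on each reducer, a quick way to spot skew
    // before deciding whether a custom partitioner is worth writing.
    static int[] load(List<String> keys, int numReducers) {
        int[] counts = new int[numReducers];
        for (String key : keys) {
            counts[hashPartition(key, numReducers)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("alpha", "beta", "gamma", "delta", "alpha", "alpha");
        System.out.println(Arrays.toString(load(keys, 3)));
    }
}
```

A repeated key always maps to the same reducer, so one very frequent key can overload a single reducer; that is the situation a custom partitioner is meant to address.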

Hadoop Administrative Tasks
Routine Administrative Procedures
Understanding dfsadmin and mradmin
Block Scanner, Balancer
Health Check & Safe mode
DataNode commissioning/decommissioning
Monitoring and Debugging on a production cluster
NameNode Backup and Recovery
ACL (Access control list)
Upgrading Hadoop
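
The Balancer topic above rests on a simple idea: a cluster counts as balanced when every DataNode's disk utilization is within a threshold of the cluster-wide average. The sketch below illustrates that check in plain Java. It is a simplified model with hypothetical names, not the Balancer's actual implementation, and the 10 percent threshold in the example is an assumed value passed as a parameter.

```java
// Simplified model for illustration; not the HDFS Balancer implementation.
class BalancerCheckSketch {

    // Utilization of one node (or of the whole cluster) as a percentage.
    static double utilization(long usedBytes, long capacityBytes) {
        return 100.0 * usedBytes / capacityBytes;
    }

    // True when every node's utilization is within thresholdPercent
    // of the overall cluster utilization.
    static boolean isBalanced(long[] used, long[] capacity, double thresholdPercent) {
        if (used.length != capacity.length || used.length == 0) {
            throw new IllegalArgumentException("need one capacity per node");
        }
        long totalUsed = 0;
        long totalCapacity = 0;
        for (int i = 0; i < used.length; i++) {
            totalUsed += used[i];
            totalCapacity += capacity[i];
        }
        double clusterUtilization = utilization(totalUsed, totalCapacity);
        for (int i = 0; i < used.length; i++) {
            double deviation = Math.abs(utilization(used[i], capacity[i]) - clusterUtilization);
            if (deviation > thresholdPercent) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long[] capacity = {100, 100, 100};   // arbitrary units
        System.out.println(isBalanced(new long[] {80, 50, 20}, capacity, 10.0));
        System.out.println(isBalanced(new long[] {55, 50, 45}, capacity, 10.0));
    }
}
```

In the first example the cluster average is 50 percent and two nodes sit 30 points away from it, so blocks would need to move; in the second every node is within 5 points.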

HBase Architecture
Introduction to HBase
HBase vs. RDBMS
Exploring HBase Master & RegionServer
Column Families and Regions
Basic HBase shell commands

Hive Architecture
Introduction to Hive
HBase vs. Hive
Installation of Hive
HQL (Hive Query Language)
Basic Hive commands

Pig Architecture
Introduction to Pig
Installation of Pig on your system
Basic Pig commands
Hands-On Exercise

Sqoop Architecture
Introduction to Sqoop
Installation of Sqoop on your system
Import/Export data between RDBMS and HDFS
Import/Export data between RDBMS and HBase
Import/Export data between RDBMS and Hive
Hands-On Exercise

Mini Project / POC (Proof of Concept)
Facebook-Hive POC
Usage of Hadoop/Hive at Facebook
Static & dynamic partitioning
UDF (User-Defined Functions)