SAP & IT Training

SAP HADOOP COURSE CONTENT


  • Introduction to Big Data and Hadoop


  • What is Big Data?
  • What are the challenges for processing big data?
  • What technologies support big data?
  • What is Hadoop?
  • Why Hadoop?
  • History of Hadoop
  • Use cases of Hadoop
  • Hadoop ecosystem

  • Setting Up the Hadoop Environment


  • Standalone mode
  • Pseudo-distributed mode
  • Fully distributed mode
  • IPv6
  • SSH
  • Installation of Java and Hadoop
  • Hadoop configuration
  • Hadoop processes (NN, SNN, JT, DN, TT)
  • Hadoop web interfaces
  • Common errors when running a Hadoop cluster, and their solutions

  • Hadoop Processes


  • NameNode
  • Secondary NameNode
  • JobTracker
  • TaskTracker
  • DataNode

  • HDFS


  • HDFS Overview and Architecture
  • Configuring HDFS
  • Interacting With HDFS
  • Additional HDFS Tasks
  • Hadoop File System Shell
  • File System Java API
  • Typical workflow
  • Writing files to HDFS
  • Reading files from HDFS
  • Replication
  • Rack awareness
  • HDFS Federation
  • Scaling HDFS
  • Performance Tuning
  • HDFS Cluster Administration

  • Let’s Talk MapReduce


  • Before MapReduce
  • MapReduce overview
  • MapReduce components
  • MapReduce architecture
  • MapReduce internals

  • How MapReduce Works


  • Anatomy of a MapReduce job run
  • Job submission
  • Job initialization
  • Task assignment
  • Job completion
  • Job scheduling
  • Job failures
  • Shuffle and sort

  • Developing the MapReduce Application


  • Writing MapReduce programs
  • MapReduce APIs (old and new)
  • Data types
  • The Driver, Mapper and Reducer code explained
  • Configuring development environment – Eclipse
  • Running on cluster
  • Hands on exercises

  • MapReduce Formats


  • Input Formats – Input splits & records, text input, binary input, multiple inputs & database input
  • Output Formats – text output, binary output, multiple outputs, lazy output and database output
  • Hands on exercises

  • MapReduce Features


  • Counters
  • Sorting
  • Joins – Map side and reduce side
  • Side data distribution
  • MapReduce combiner
  • MapReduce partitioner
  • MapReduce distributed cache
  • Speculative Execution
  • Hands on exercises
  • MapReduce Administration
  • YARN
  • Performance Tuning

  • Pig


  • Pig Overview
  • Installation
  • Modes of Execution
  • Pig Latin
  • Pig with HDFS
  • Creating Tables
  • Loading and Manipulating Table Data
  • Data Analysis using Pig Latin
  • Pig UDFs

  • Hive


  • Hive Overview
  • Installation
  • Creating and Maintaining Hive Data Warehouse
  • HiveQL
  • Hive Data Analysis
  • Hive UDFs
  • Buckets
  • Partitions
  • Hive Metastore
  • Hive Databases
  • Performance Tuning

  • HBase


  • HBase Overview and Architecture
  • HBase Installation
  • HBase cluster configuration
  • HBase Shell
  • CRUD operations
  • Scanning and Batching
  • Filters
  • HBase Key Design
  • HMaster, ZooKeeper, Region Servers, Regions

  • Sqoop


  • Sqoop Overview
  • Installation
  • Imports and Exports

  • Working with Flume


  • Introduction
  • Configuration and setup
  • Flume sources, with an example
  • Complex Flume architecture

  • Configuration


  • Basic Setup
  • Important Directories
  • Selecting Machines
  • Cluster Configurations
  • Large Clusters: Multiple Racks

  • Integrations


  • Distributed installations
  • Best Practices
  • Linux Basics
  • Java Basics