Kernel Training offers faculty with real-world industry experience, project work, an excellent lab facility, 24x7 customer support, quizzes, and Hadoop certification guidance.
Course Curriculum:
Week 1
Module 1
1. Big Data and Hadoop Introduction
Goal set: You will gain exposure to Big Data and Hadoop. This module covers important use cases, the Hadoop Eco-System, and a comparison between Hadoop and RDBMS.
Topics – Introduction to Big Data and Hadoop, Challenges of Big Data, Traditional approach Vs Hadoop, Hadoop Architecture, Distributed Model, Block-Structured File System, Technologies supporting Big Data, Replication, Fault Tolerance, Basics of Hadoop, Hadoop Eco-System, Use cases of Hadoop, Fundamental Design Principles of Hadoop, Comparison of Hadoop Vs RDBMS.
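To give a feel for the block structure and replication topics above, here is a minimal Python sketch of the arithmetic involved. It assumes a 128 MB block size and a replication factor of 3, which are common HDFS settings but are configurable; the function name `hdfs_footprint` is made up for this illustration and is not part of Hadoop.

```python
import math

BLOCK_SIZE_MB = 128  # assumption: a common HDFS block size (configurable per cluster)
REPLICATION = 3      # assumption: a common HDFS replication factor (configurable)


def hdfs_footprint(file_size_mb, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Return (blocks, block replicas, raw storage in MB) for one file.

    Hypothetical helper for illustration only.
    """
    # A file is split into fixed-size blocks; the last block may be partly filled.
    blocks = math.ceil(file_size_mb / block_size_mb)
    # Every block is stored `replication` times on different nodes.
    replicas = blocks * replication
    # A partly filled last block only occupies its actual size on disk.
    raw_storage_mb = file_size_mb * replication
    return blocks, replicas, raw_storage_mb


blocks, replicas, raw_mb = hdfs_footprint(1000)
print(blocks, replicas, raw_mb)
```

Under these assumptions, a 1000 MB file is split into 8 blocks, stored as 24 block replicas, and uses about 3000 MB of raw disk space across the cluster. Losing a single node therefore still leaves two copies of each affected block, which is the basis of the fault tolerance topic.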
Module 2
2. Understand Hadoop Cluster Architecture & Map Reduce
Goal set: In this module you will learn the important aspects of Hadoop Cluster Architecture and Map Reduce. Participants gain thorough knowledge of log processing, the Map Reduce flow, and more.
Topics - Introduction – Hadoop Cluster & Architecture, 5 Daemons, Typical Workflow, Writing Files to HDFS, Reading Files from HDFS, Rack Awareness, Before Map Reduce, Map Reduce Overview, Word Count Problem, Word Count Flow & Solution, Map Reduce Flow, Log Processing and Map Reduce, Understand- Mapper, Reducer, Shuffling
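The Word Count problem listed above can be sketched in a few lines of plain Python. This is a single-process simulation of the mapper, shuffle, and reducer steps, not Hadoop's Java API; the function names are chosen for this illustration only.

```python
from collections import defaultdict


def mapper(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(word, counts):
    """Reduce step: sum the counts collected for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    mapped = [pair for line in lines for pair in mapper(line)]
    grouped = shuffle(mapped)
    # Keys are sorted here only to make the output order predictable.
    return dict(reducer(word, counts) for word, counts in sorted(grouped.items()))


result = word_count(["big data and hadoop", "hadoop and map reduce"])
print(result)
```

On a real cluster the mappers and reducers run in parallel on different nodes and the shuffle moves data over the network, but the division of work is the same as in this sketch.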
Module 3
3. Advanced Map Reduce Concepts
Goal set: In this module you will learn advanced Map Reduce concepts and their significance.
Topics: Introduction – Combiner, Partitioner, Counter, Basics: Input Formats/Output Formats, Map Join using MR, Reduce Join using MR, MR Distributed Cache.
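As an illustration of two of the topics above, the sketch below shows what a combiner and a partitioner do, again as a plain Python simulation rather than Hadoop code. The number of reducers is a made-up value, and `zlib.crc32` stands in for the hash function only because it gives the same result on every run; it is not what Hadoop itself uses.

```python
import zlib
from collections import Counter

NUM_REDUCERS = 2  # hypothetical value for this illustration


def combine(mapper_output):
    """Combiner: pre-aggregate one mapper's (word, count) pairs locally,
    so fewer pairs have to be sent to the reducers."""
    totals = Counter()
    for word, count in mapper_output:
        totals[word] += count
    return list(totals.items())


def partition(key, num_reducers=NUM_REDUCERS):
    """Partitioner: choose a reducer from a deterministic hash of the key,
    so every occurrence of a key lands on the same reducer."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


mapper_output = [("hadoop", 1), ("pig", 1), ("hadoop", 1)]
combined = combine(mapper_output)

# Route each combined pair to the reducer selected by the partitioner.
buckets = {reducer_id: [] for reducer_id in range(NUM_REDUCERS)}
for word, count in combined:
    buckets[partition(word)].append((word, count))

print(combined)
print(buckets)
```

Here the combiner shrinks three pairs to two before anything leaves the mapper, and the partitioner guarantees that a given word is always handled by the same reducer.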
Week 2
Module 4
4. Planning for Cluster & Hadoop 2.0 Yarn
Goal set: In this module you will learn how to configure Hadoop, how to select Hadoop hardware and software, and how to work with Hadoop log files.
Topics: Introduction – Configuration of Hadoop, Understand – Choosing the Right Hadoop Hardware, Choosing the Right Hadoop Software, Hadoop Log Files.
5. Hadoop 2.0 & YARN
Goal set: By the end of this module you will have a better understanding of the YARN MR application flow.
Topics: Introduction – Hadoop 1.0 Challenges, NN Scalability, NN SPOF & HA, Job Tracker Challenges, Hadoop 2.0 New Features, Hadoop 2.0 Cluster Architecture & Federation, Hadoop 2.0 HA, Yarn & Hadoop Ecosystem, Yarn MR Application Flow.
Week 3
Module 5
PIG
Goal set: This module begins with an introduction to Pig and moves on to processing complex data and to joining and splitting data sets in detail.
Topics: Introduction: Pig, Pig’s Features & Pig Use Cases, Interacting with Pig, Basic Data Analysis with Pig, Pig Latin Syntax, Loading Data, Basics: Simple Data Types, Field Definitions, Data Output, Viewing the Schema, Filtering and Sorting Data, Commonly-Used Functions, Hands-On Exercise: Pig for ETL Processing, Processing Complex Data with Pig, Understand: Storage Formats, Complex/Nested Data Types, Grouping, Built-in Functions for Complex Data, Iterating Grouped Data, Hands-On Exercises, Multi-Dataset Operations with Pig, Techniques for Combining Data Sets, Joining Data Sets in Pig, Splitting Data Sets, Hands-On Exercise
Module 6
Hive
Goal set: In this module, you will learn the fundamentals of Hive, how it compares with a traditional database, how to drop tables and write user defined functions, and how static partitioning differs from dynamic partitioning.
Topics: Definition- HIVE, Fundamentals & Architecture, Loading and Querying Data in Hive, Hive Architecture and Installation, Comparisons- Traditional Database, Definition- HiveQL: Data Types, Operators and Functions, Hive Tables, Managed Tables and External Tables, Partitions and Buckets: Storage Formats, Importing Data, Altering Tables, Dropping Tables- Querying Data, Sorting and Aggregating, Map Reduce, Introduction- Scripts, Joins & Subqueries, Views, When to Use HIVE, Impala and Pig, Hands-on Exercises, Integration, Data Manipulation with Hive, Explain- User Defined Functions, Appending Data into an Existing Hive Table, Static Partitioning vs Dynamic Partitioning
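The difference between static and dynamic partitioning can be pictured with a small Python simulation. Hive stores each partition in its own directory named `column=value`; the sketch below mimics that layout with dictionary keys. The function names and sample rows are invented for this illustration, and real Hive loads are written in HiveQL, not Python.

```python
from collections import defaultdict


def static_partition_load(rows, partition_column, partition_value):
    """Static partitioning: the person loading the data names the target
    partition up front, and every row goes into that one partition."""
    return {f"{partition_column}={partition_value}": [dict(row) for row in rows]}


def dynamic_partition_load(rows, partition_column):
    """Dynamic partitioning: the partition is derived from a column value
    in each row, so one load can fill many partitions."""
    partitions = defaultdict(list)
    for row in rows:
        row = dict(row)  # copy so the caller's data is not modified
        value = row.pop(partition_column)
        partitions[f"{partition_column}={value}"].append(row)
    return dict(partitions)


# Hypothetical sample data.
customers = [
    {"name": "asha", "country": "IN"},
    {"name": "john", "country": "US"},
    {"name": "ravi", "country": "IN"},
]

print(static_partition_load([{"name": "emma"}], "country", "UK"))
print(dynamic_partition_load(customers, "country"))
```

Static partitioning suits loads where the partition is already known, such as one file per country. Dynamic partitioning is convenient when a single data set mixes many partition values.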
Module 7
HBASE
Goal set: In this module, you will learn about HBase, its data model, operations, and programming, and practice with hands-on exercises.
Topics: Introduction – HBASE, CAP Theorem, HBase Architecture and concepts, Client APIs and their features, HBase tables, The ZooKeeper Service, Understand- Data Model, Operations, Programming and Hands-on Exercises.
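The HBase data model topic can be pictured as a nested map: row key, then column family and qualifier, then timestamp, then value. The toy class below is a plain Python sketch of that idea under simplified assumptions; `ToyTable` and its methods are invented for this illustration and are not the HBase client API.

```python
class ToyTable:
    """A toy, in-memory picture of the HBase data model (illustration only)."""

    def __init__(self, column_families):
        # Column families are fixed when the table is created.
        self.column_families = set(column_families)
        # row_key -> {(family, qualifier): {timestamp: value}}
        self.rows = {}

    def put(self, row_key, family, qualifier, value, timestamp):
        """Store one versioned cell value."""
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family}")
        cell = self.rows.setdefault(row_key, {}).setdefault((family, qualifier), {})
        cell[timestamp] = value

    def get(self, row_key, family, qualifier):
        """Return the value with the newest timestamp, or None if absent."""
        versions = self.rows.get(row_key, {}).get((family, qualifier))
        if not versions:
            return None
        return versions[max(versions)]


table = ToyTable(["info"])
table.put("user1", "info", "city", "Pune", timestamp=1)
table.put("user1", "info", "city", "Hyderabad", timestamp=2)
print(table.get("user1", "info", "city"))
```

Qualifiers can be added freely within an existing column family, and older versions of a cell remain stored next to the newest one, which is why a read here returns the value written with the highest timestamp.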
Week 4
Module 8
SQOOP
Goal set: By the end of this module you will have been introduced to Sqoop and will know how to import and export data with it.
Topics: Introduction – Sqoop, MySQL Client & Server, In Detail- Connecting to a relational database using Sqoop, Importing data using Sqoop from MySQL, Exporting data using Sqoop to MySQL, Incremental append, Importing data using Sqoop from MySQL to Hive, Exporting data using Sqoop to MySQL from Hive, Importing data using Sqoop from MySQL to HBase, Using queries and Sqoop.
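The incremental append topic is easiest to understand as "import only the rows added since the last run". Sqoop exposes this through its `--incremental append`, `--check-column`, and `--last-value` options. The Python sketch below simulates the same logic, with an in-memory SQLite database standing in for MySQL; the table, column names, and helper function are made up for this illustration.

```python
import sqlite3


def incremental_append(conn, table, check_column, last_value):
    """Fetch only rows whose check column is greater than last_value.

    Returns the new rows and the value to remember for the next run.
    Table and column names are trusted inputs in this sketch.
    """
    cursor = conn.execute(
        f"SELECT * FROM {table} WHERE {check_column} > ? ORDER BY {check_column}",
        (last_value,),
    )
    columns = [description[0] for description in cursor.description]
    rows = cursor.fetchall()
    check_index = columns.index(check_column)
    new_last_value = rows[-1][check_index] if rows else last_value
    return rows, new_last_value


# Hypothetical source table standing in for a MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, owner TEXT)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)",
    [(1, "asha"), (2, "ravi"), (3, "meena")],
)

# First run imports everything and remembers the highest id seen.
first_batch, last_value = incremental_append(conn, "accounts", "id", 0)

# A new row arrives; the second run picks up only that row.
conn.execute("INSERT INTO accounts VALUES (4, 'john')")
second_batch, last_value = incremental_append(conn, "accounts", "id", last_value)
print(second_batch, last_value)
```

The second run returns only the row with id 4, because the first run recorded 3 as the last value it had seen. This mode only works when the check column keeps increasing as rows are added.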
Module 9
Flume & Oozie
Goal set: In this module you will learn about Flume, a Twitter data analysis project, Oozie architecture, configurations, and properties, and gain hands-on experience.
Topics: Introduction – Flume, Understand – Architecture, Configurations, Master, Collector, Agent, Twitter Data Analysis project, Introduction – Oozie, Architecture, Configurations, Oozie Job Submission, Oozie Properties, Hands-on exercises
Week 5
Module 10
Final Project in Banking Domain
Goal set: In this module, you will work through a project in the banking domain and discuss it in detail.
Topics- Introduction- Hadoop Project in Banking Domain, Objective, Problem Definition, Solution, Discuss data sets and specifications of the project.
Module 11
Fundamentals of Scala
Goal set: In this module you will study the fundamentals of Scala in detail.
Topics: Introduction – Fundamentals of Scala in detail
Week 6
Module 12
Apache Spark
Goal set: In this module you will learn the fundamentals of Spark, the difference between batch and real-time Big Data analytics, and how Spark handles in-memory data.
Topics- Introduction- Spark, Batch Vs. Real Time Big Data Analytics, Batch Analytics – Overview – Hadoop Ecosystem, Real Time Analytics Options, Streaming Data – Storm, In Memory Data – Spark.
Indeed Trend Hadoop Development Career Graph