Big Data Architect Training


 (4.9) | 750 Ratings


Introduction


Big Data Architect Training Details
Track             Regular Track        Weekend Track        Fast Track
Course Duration   55 Hrs               8 Weekends           5 Days
Hours             1 hr/day             2 Hours a day        6 Hours a day
Training Mode     Online Classroom     Online Classroom     Online Classroom
Delivery          Instructor Led-Live  Instructor Led-Live  Instructor Led-Live


Course Curriculum

Course Objectives:

  • Java Training
  • Scala Programming
  • Big Data Hadoop Developer
  • Spark Development
  • MongoDB

JAVA TRAINING

Core Java Contents:

  • Features of Java
  • Java Basics
  • Classes and Objects
  • Garbage Collection
  • Java Arrays
  • Referring Java Documentation
  • Wrapper classes
  • Inheritance
  • Polymorphism
  • Abstract Classes
  • Interfaces
  • Packages
  • Introduction to Exception Handling
  • Checked/Unchecked Exceptions
  • Using try, catch, finally, throw, throws
  • Exception Propagation
  • Pre-defined Exceptions
  • User Defined Exceptions
  • Overview of Java IO Package
  • Byte streams
  • Character streams
  • Object serialization & Object Externalization
  • Introduction to GUI Programming (Swing)
  • Introduction to multithreading
  • Thread life cycle
  • Thread priorities
  • Using wait() & notify()
  • Deadlocks
  • JDBC Architecture
  • Using the JDBC API
  • Transaction Management

Course Contents – Servlets and JSP

Java Servlet Technology

  • What Is a Servlet?
  • Servlet Life Cycle
  • Initializing a Servlet
  • Writing Service Methods
  • Getting Information from Requests
  • Constructing Responses
  • ServletContext and ServletConfig Parameters
  • Attributes – Context, Request and Session
  • Maintaining Client State – Cookies/URL rewriting/Hidden Form Fields
  • Session Management
  • Servlet Communication – include, forward, redirect
  • WEB-INF and the Deployment Descriptor

Java Server Pages Technology

  • What Is a JSP Page?
  • The Life Cycle of a JSP Page
  • Execution of a JSP page
  • Different Types of tags (directives, standard actions, bean tags, expressions, declarations)
  • Creating Static Content
  • Creating Dynamic Content
  • Using Implicit Objects within JSP Pages
  • JSP Scripting Elements
  • Including Content in a JSP Page
  • Transferring Control to Another Web Component – communication with servlet
  • Param Element
  • JavaBeans Component Design Conventions
  • Why Use a JavaBeans Component?
  • Creating and Using a JavaBeans Component
  • Setting JavaBeans Component Properties
  • Retrieving JavaBeans Component Properties
  • Custom tags

SCALA PROGRAMMING

Introduction to Scala

  • A brief history of the Java platform to date
  • Distinguishing between the Java language and platform
  • Pain points when using Java for software development
  • Possible criteria for an improved version of Java
  • How and why the Scala language was created

Key Features of the Scala Language

  • Everything is an object
  • Class declarations
  • Data typing
  • Operators and methods
  • Pattern matching
  • Functions
  • Anonymous and nested functions
  • Traits

Basic Programming in Scala

  • Built in types, literals and operators
  • Testing for equality of state and reference
  • Conditionals, simple matching and external iteration
  • Working with lists, arrays, sets and maps
  • Throwing and catching exceptions
  • Adding annotations to your code
  • Using standard Java libraries
  • Using Scala within a Java application and vice versa

OO Development in Scala

  • A minimal class declaration
  • Understanding primary constructors
  • Specifying alternative constructors
  • Declaring and overriding methods
  • Creating base classes and class hierarchies
  • Creating traits and mixing them into classes
  • How a Scala inheritance tree is linearized

Functional Programming in Scala

  • Advanced uses of for expressions
  • Understanding function values and closures
  • Using closures to create internal iterators
  • Creating and using higher order functions
  • Practical examples of higher order functions
  • Currying and partially applied functions
  • Creating your own Domain Specific Languages (DSLs)
  • Exception handling in Scala
  • try/catch with case clauses

Pattern Matching in Depth

  • Using the match keyword to return a value
  • Using case classes for pattern matching
  • Adding pattern guards to match conditions
  • Partially specifying matches with wildcards
  • Deep matching using case constructors
  • Matching against collections of items
  • Using extractors instead of case classes

Test Driven Development in Scala

  • Writing standard JUnit tests in Scala
  • Conventional TDD using the ScalaTest tool
  • Behavior Driven Development using ScalaTest
  • Using functional concepts in TDD

XML Manipulation in Scala

  • Using Scala to read and write XML using different parsers (DOM, SAX)
  • Working with XML literals in code
  • Embedding XPath like expressions
  • Using Pattern Matching to process XML data
  • Serializing and deserializing to and from XML
  • Scala with database transactions

Writing Concurrent Apps

  • Issues with conventional approaches to multi-threading
  • How an actor-based approach helps you write thread-safe code
  • The Scala architecture for creating actor-based systems
  • Different coding styles supported by the actor model

Scala web

  • Scala with JAXB
  • Scala to call/consume a REST/SOAP service
  • Scala with logging information
  • Using Scala in web application (JSP, Servlet)
  • Conclusion

Introduction to Play!

  • Introduction
  • Module Outline
  • What We Will Build
  • History of Play!
  • Philosophy
  • Technologies
  • Summary

Starting Up

  • Introduction
  • Downloading Play!
  • The Play Command
  • Compiling and Hot Deploy
  • Testing
  • IDEs
  • Project Structure
  • Configuration
  • Error Handling
  • Summary

Routing

  • Introduction
  • The Router
  • Router Mechanics
  • Routing Rules
  • Play! Routes
  • Play! Routes: HTTP Verbs
  • Play! Routes: The Path
  • Play! Routes: The Action Call
  • Routing in Action
  • Summary

Controllers, Actions, and Results

  • Introduction
  • Controllers
  • Actions
  • Results
  • Session and Flash Scope
  • Request Object
  • Implementing the Contacts Stub Controller
  • Summary

Views

  • Introduction
  • Play! Views
  • Static Views
  • Passing Arguments
  • Iteration
  • Conditionals
  • Partials and Layouts
  • Accessing the Session Object
  • The Asset Route
  • Summary

Data Access

  • Introduction
  • Agnostic Data Access
  • The Domain Model
  • Evolutions
  • Finder and Listing Contacts
  • The Form Object and Adding a Contact
  • Editing a Contact
  • Deleting a Contact
  • Review
  • Summary

The Global Object

  • Introduction
  • The Global Object
  • Global Object Methods
  • onStart
  • onHandlerNotFound
  • Summary

BIG DATA HADOOP DEVELOPER

Introduction to Linux and Big Data Virtual Machine (VM)

  • Introduction/Installation of VirtualBox and the Big Data VM
  • Introduction to Linux
  • Why Linux?
  • Windows and the Linux equivalents
  • Different flavors of Linux
  • Unity Shell (Ubuntu UI)
  • Basic Linux commands (enough to get started with Hadoop)

Understanding Big Data

  • 3V (Volume, Variety, Velocity) characteristics
  • Structured and Unstructured Data
  • Application and use cases of Big Data
  • Limitations of traditional large scale systems
  • How a distributed way of computing is superior (cost and scale)
  • Opportunities and challenges with Big Data

HDFS (The Hadoop Distributed File System)

HDFS Overview and Architecture

    • Deployment Architecture
    • Name Node, Data Node and Checkpoint Node (aka Secondary Name Node)
    • Safe mode
    • Configuration files
    • HDFS Data Flows (Read v/s Write)

How HDFS addresses fault tolerance

    • CRC checksum
    • Data replication
    • Rack awareness and Block placement policy
    • Small files problem

HDFS Interfaces

  • Command Line Interface
  • File System
  • Administrative
  • Web Interface

Advanced HDFS features

  • Load Balancer
  • DistCp
  • HDFS Federation
  • HDFS High Availability
  • Hadoop Archives

MapReduce – 1 (Theoretical Concepts)

MapReduce overview

  • Functional Programming paradigms
  • How to think in a MapReduce way?

MapReduce Architecture

  • Legacy MR v/s Next Generation MapReduce (aka YARN/MRv2)
  • Slots v/s Containers
  • Schedulers
  • Shuffling, Sorting
  • Hadoop Data Types
  • Input and Output Formats
  • Input Splits – Partitioning (Hash Partitioner v/s Custom Partitioner)
  • Configuration files
  • Distributed Cache
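
To illustrate the partitioning topic listed above, here is a minimal sketch in plain Python rather than Hadoop code. It mimics the rule used by Hadoop's default HashPartitioner (the key's hash, masked to a non-negative value, modulo the number of reduce tasks) and contrasts it with a hand-written rule. The function names, the Java-String-style hash, and the first-letter rule are assumptions chosen for illustration only.

```python
def string_hash_32(key: str) -> int:
    """Java-String-style hash (h = 31*h + char), kept to 32 bits.

    Assumption: used here only as a stand-in for a key's hashCode();
    real Hadoop key types such as Text compute their own hash.
    """
    h = 0
    for ch in key:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h


def hash_partition(key: str, num_reducers: int) -> int:
    """Default-style rule: (hash & Integer.MAX_VALUE) % number of reducers."""
    return (string_hash_32(key) & 0x7FFFFFFF) % num_reducers


def first_letter_partition(key: str, num_reducers: int) -> int:
    """Hypothetical custom rule: keys starting a-m go to bucket 0, the rest to bucket 1."""
    if not key:
        return 0
    bucket = 0 if key[0].lower() <= "m" else 1
    return bucket % num_reducers


if __name__ == "__main__":
    for word in ["apple", "hadoop", "spark", "zebra"]:
        print(word, hash_partition(word, 4), first_letter_partition(word, 2))
```

A hash rule tends to spread keys evenly across reducers, while a custom rule lets related keys land on the same reducer at the risk of uneven load.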

MR Algorithm and Data Flow

  • Word Count
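
As a rough illustration of the Word Count data flow, the sketch below walks through the map, shuffle/sort and reduce steps in plain Python. It runs in a single process and is not Hadoop code; all function names are assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_and_sort(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group all values by key and order the keys."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce: sum the counts collected for one word."""
    return word, sum(counts)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle/sort and reduce in sequence over the input lines."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_and_sort(mapped)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Data big"]))
```

In a real cluster the map and reduce steps run on different machines and the framework performs the shuffle; the sketch only shows how the data changes shape between the phases.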

Alternatives to MR – BSP (Bulk Synchronous Parallel)

  • Ad hoc querying
  • Graph Computing Engines

MapReduce – 2 (Practice)

Developing, debugging and deploying MR programs

  • Stand alone mode (in Eclipse)
  • Pseudo distributed mode (as in the Big Data VM)
  • Fully distributed mode (as in Production)

MR API

  • Old and the new MR API
  • Java Client API
  • Hadoop data types and custom Writable / WritableComparables
  • Different input and output formats
  • Saving Binary Data using Sequence Files and Avro Files

Hadoop Streaming (developing and debugging non-Java MR programs – Ruby and Python)

Optimization techniques

  • Speculative execution
  • Combiners
  • JVM Reuse
  • Compression

MR algorithms (Non-graph)

  • Sorting
  • Term Frequency
  • Inverse Document Frequency
  • Student Database
  • Max Temperature
  • Different ways of joining data
  • Word Co-Occurrence

MR algorithms (Graph)

  • PageRank
  • Inverted Index

Higher Level Abstractions for MR (Pig)

  • Introduction and Architecture
  • Different Modes of executing Pig constructs
  • Data Types
  • Dynamic invokers
  • Pig streaming
  • Macros
  • Pig Latin language Constructs (LOAD, STORE, DUMP, SPLIT, etc.)
  • User Defined Functions
  • Use Cases

Higher Level Abstractions for MR (Hive)

  • Introduction and Architecture
  • Different Modes of executing Hive queries
  • Metastore Implementations
  • HiveQL (DDL & DML Operations)
  • External v/s Managed Tables
  • Views
  • Partitions & Buckets
  • User Defined Functions
  • Transformations using non-Java languages
  • Use Cases

Comparison of Pig and Hive

NoSQL Databases – 1 (Theoretical Concepts)

NoSQL Concepts

  • Review of RDBMS
  • Need for NoSQL
  • Brewer’s CAP Theorem
  • ACID v/s BASE
  • Schema on Read v/s Schema on Write
  • Different levels of consistency
  • Bloom filters

Different types of NoSQL databases

  • Key Value
  • Columnar
  • Document
  • Graph

Columnar Databases concepts

NoSQL Databases – 2 (Practice)

HBase Architecture

  • Master and the Region Server
  • Catalog tables (ROOT and META)
  • Major and Minor compaction
  • Configuration files
  • HBase v/s Cassandra

Interfaces to HBase (for DDL and DML operations)

  • Java API
  • Client API
  • Filters
  • Scan Caching and Batching
  • Command Line Interface
  • REST API

Advanced HBase Features

  • HBase Data Modeling
  • Bulk loading data in HBase
  • HBase Coprocessors – EndPoints (similar to Stored Procedures in RDBMS)
  • HBase Coprocessors – Observers (similar to Triggers in RDBMS)

Spark

  • Introduction to RDD
  • Installation and Configuration of Spark
  • Spark Architecture
  • Different interfaces to Spark
  • Sample Python programs in Spark

Setting up a Hadoop Cluster using Apache Hadoop

  • Cloudera Hadoop cluster on the Amazon Cloud (Practice)
  • Using EMR (Elastic MapReduce)
  • Using EC2 (Elastic Compute Cloud)
  • SSH Configuration
  • Stand alone mode (Theory)
  • Distributed mode (Theory)
  • Pseudo distributed
  • Fully distributed

Hadoop Ecosystem and Use Cases

  • Hadoop industry solutions
  • Importing/exporting data across RDBMS and HDFS using Sqoop
  • Getting real-time events into HDFS using Flume
  • Creating workflows in Oozie
  • Introduction to Graph processing
  • Graph processing with Neo4J
  • Processing data in real time using Storm
  • Interactive ad hoc querying with Impala

Proof of concepts and use cases

  • Click Stream Analysis using Pig and Hive
  • Analyzing the Twitter data with Hive
  • Further ideas for data analysis

SPARK DEVELOPMENT

Scala Basics

  • What is Scala?
  • Why Scala for Spark?
  • Intro to Scala REPL: Journey from Java to Scala
  • Installing Scala IDE
  • Basic Operations
  • Defining Functions

Scala Essentials

  • Control Structures in Scala
  • Loops – ForEach, While, Do-While
  • Collections – Array, ArrayBuffer, Map, Tuples, Lists
  • If Statements
  • Conditional Operators
  • Enumerations

OOP and FP

  • Class and Object Basics
  • Scala Constructors
  • Nested Classes
  • Visibility Rules
  • Overriding Methods
  • Functional Programming
  • Higher Order Functions
  • Traits
  • Interfaces
  • Layered Traits

Prerequisite: Big Data and Hadoop Framework

  • Introduction to Big Data
  • Challenges with Big Data
  • Batch and real-time processing
  • Overview – Hadoop Ecosystem
  • HDFS
  • Review of MapReduce
  • Hive
  • Sqoop
  • Flume

APACHE SPARK

Introduction to Spark

  • What is Spark?
  • Spark Overview
  • Setting up environment
  • Using Spark Shell
  • Spark Web UI

Spark Basics

  • RDDs
  • Spark Context
  • Spark Ecosystem
  • In-Memory data – Spark

Working with RDDs

  • Creating, Loading and Saving RDD
  • Transformations in RDD
  • Actions in RDD
  • Key-Value Pair RDD
  • MapReduce and Pair RDD operations
  • RDD Partitions

Writing and Deploying Spark Applications

  • Spark Applications vs. Spark Shell
  • Creating Spark Context
  • Building a Spark Application
  • Running a Spark Application
  • Spark and Hadoop Integration – HDFS
  • Handling Sequence Files

Spark RDD

  • RDD Lineage
  • RDD Persistence Overview
  • Distributed Persistence

Spark SQL

  • Overview of Hive
  • Spark SQL Architecture
  • SQLContext in Spark SQL
  • Working with DataFrames
  • Example for Spark SQL
  • Integrating Hive and Spark SQL
  • DataFrames, Datasets and RDDs
  • Caching DataFrames
  • Knowing JSON and Parquet File Formats
  • Loading of data
  • Comparing Spark SQL, Impala and Hive-on-Spark

Spark Job Execution

  • Jobs, Stages and Tasks
  • Partitions and Shuffles
  • Data Locality

Spark Streaming

  • Spark Streaming Architecture
  • First Spark Streaming Program
  • Transformations in Spark Streaming

Spark MLlib

  • What is Machine Learning?
  • ML library for Spark
  • ML Algorithms
  • ML using Pipelines and DataFrames

GraphX

  • Overview of GraphX
  • Components of GraphX
  • Hands on – PageRank, TriangleCount
  • Common Spark use-cases

Performance Tuning

  • Shared Variables: Broadcast Variables
  • Shared Variables: Accumulators
  • Common Performance Issues
  • Performance tuning tips

Course Deliverables

  • Workshop style coaching
  • Interactive approach
  • Course material
  • POC Implementation
  • Hands on practice exercises for each topic
  • Quiz at the end of each major topic
  • Tips and techniques on Cloudera Certification Examination
  • Linux concepts and basic commands

MONGODB

Introduction to NoSQL and MongoDB

  • RDBMS, types of relational databases, challenges of RDBMS, NoSQL database, its significance, how NoSQL suits Big Data needs, Introduction to MongoDB and its advantages, MongoDB installation, JSON features, data types and examples.

MongoDB Installation

  • Installing MongoDB, basic MongoDB commands and operations, MongoChef (MongoGUI) Installation, MongoDB Data types.

Importance of NoSQL

  • The need for NoSQL, types of NoSQL databases, OLTP, OLAP, limitations of RDBMS, ACID properties, CAP Theorem, BASE property, learning about JSON/BSON, database collection & document, MongoDB uses, MongoDB Write Concern – Acknowledged, Replica Acknowledged, Unacknowledged, Journaled, Fsync.

CRUD Operations

  • Understanding CRUD and its functionality, CRUD concepts, MongoDB Query & Syntax, read and write queries and query optimization.

Data Modeling & Schema Design

  • Concepts of data modeling, difference between MongoDB and RDBMS modeling, Model tree structure, operational strategies, monitoring and backup.

Data Management & Administration

  • In this module you will learn MongoDB® administration activities such as health checks, backup, recovery, database sharding and profiling, data import/export, and performance tuning.

Data Indexing and Aggregation

  • Concepts of data aggregation and types, data indexing concepts, properties and variations.

MongoDB Security

  • Understanding database security risks, MongoDB security concept and security approach, MongoDB integration with Java and Robomongo.

Working with Unstructured Data

  • Implementing techniques to work with a variety of unstructured data such as images, videos, log data, and others; understanding GridFS, the MongoDB file system for storing data.

MongoDB Project

Java is one of the most popular programming languages for working with MongoDB. This project shows you how to work with the MongoDB Java Driver and how to use MongoDB as a Java developer. You will become proficient in creating a table for inserting video using Java programming. Some of the tasks and steps involved are listed below:

  • Installation of Java
  • Setting up the MongoDB Java Driver
  • Connecting to the database
  • Understanding collections and documents
  • Basics of reading from and writing to the database
  • Learning about the Java Virtual Machine libraries



Help and Support: info@sacrostectservices.com | For Inquiry Call Us: +91 996-629-7972 (IND)
