Big Data Comprehensive Training (Practical)

This practical foundation course introduces Big Data, the data sources that can be used to solve real business problems, and data mining and its tools. It then moves on to hands-on work with Hadoop (HDFS, MapReduce, YARN), Apache Spark, Hive, HBase, Storm, and MongoDB.

  • 4-Day Workshop
  • Completion Certificate awarded by GKK

  • BASED ON REQUEST
  • Please contact us directly for more details

DAY 1


Module 1: Big Data – History, Overview, and Characteristics

History
Big Data Definition
Big Data Benefits
Big Data Characteristics
Volume
Velocity
Variety

Big Data Technologies – Overview

Big Data Success Stories

Big Data – Privacy and Ethics

Privacy – Compliance
Privacy – Challenges
Privacy – Approach
Ethics

Big Data Projects

Who Should Be Involved?
What Is Involved?

Module 2: Big Data Sources

2.1 Enterprise Data Sources

Enterprise Systems
Oracle
SAP
Microsoft
Data Warehouses
Unstructured Data – Introduction
Unstructured Data – Metadata

2.2 Social Media Data Sources

Introduction
Facebook – Introduction
Facebook – Public Feed API
Facebook – Keyword Insights API
Facebook – Graph API
Twitter – Introduction
Twitter – Streaming APIs
Twitter – REST APIs
Other Social Media

2.3 Public Data Sources

Introduction
Weather
Economics
Finance
Regulatory Bodies

DAY 2


Module 3: Data Mining – Concepts and Tools

3.1 Data Mining – Introduction

Introduction
Types of Data Mining – Overview
Types of Data Mining – Classification
Types of Data Mining – Association
Types of Data Mining – Clustering

3.2 Data Mining – Tools

Introduction
Weka
Modules of Weka Applications
KNIME
KNIME – Example
R Language

DAY 3


Module 4: The Hadoop Distributed File System (HDFS)

4.1 Hadoop Fundamentals

Introduction
Main Components of Hadoop
Additional Components of Hadoop

4.2 The Hadoop Distributed File System (HDFS)

Overview of HDFS
Launching HDFS in Pseudo-Distributed Mode
Core HDFS Services
Installing and Configuring HDFS
HDFS Commands
HDFS Safe Mode
Checkpointing HDFS
Federated and High Availability HDFS
Running a Fully-Distributed HDFS Cluster with Docker

4.3 MapReduce with Hadoop

MapReduce from the Linux Command Line
Scaling MapReduce on a Cluster
Introducing Apache Hadoop
Overview of YARN
Launching YARN in Pseudo-Distributed Mode
Demonstration of the Hadoop Streaming API
Demonstration of MapReduce with Java

Module 5: Apache Spark, Hive, HBase, and Storm

5.1 Introduction to Apache Spark

Why Spark?
Spark Architecture
Spark Drivers and Executors
Spark on YARN
Spark and the Hive Metastore
Structured APIs, DataFrames, and Datasets
The Core API and Resilient Distributed Datasets (RDDs)
Overview of Functional Programming
MapReduce with Python

5.2 Apache Hive

Hive as a Data Warehouse
Hive Architecture
Understanding the Hive Metastore and HCatalog
Interacting with Hive Using the Beeline Interface
Creating Hive Tables
Loading Text Data Files into Hive
Exploring the Hive Query Language
Partitions and Buckets
Built-in and Aggregation Functions
Invoking MapReduce Scripts from Hive
Common File Formats for Big Data Processing
Creating Avro and Parquet Files with Hive
Creating Hive Tables from Pig
Accessing Hive Tables with the Spark SQL Shell

5.3 Persisting Data with Apache HBase

Features and Use Cases
HBase Architecture
The Data Model
Command Line Shell
Schema Creation
Considerations for Row Key Design

5.4 Apache Storm

Processing Real-Time Streaming Data
Storm Architecture: Nimbus, Supervisors, and ZooKeeper
Application Design: Topologies, Spouts, and Bolts

DAY 4


Module 6: Data Modelling with Document Databases

6.1 MongoDB Fundamentals

Introduction
Replication
Sharding
Sharding and Replication
MongoDB Ecosystem – Languages and Drivers
MongoDB Ecosystem – Hadoop Integration
MongoDB Ecosystem – Tools

6.2 Install and Configure

Download
How to Install and Configure

6.3 Document Databases

Introduction
Documents
Document Design Considerations
Fields

6.4 Data Modelling with Document Databases

Introduction
Twitter Sentiment Analysis
Twitter Sentiment Analysis – Algorithm
Network Log Analysis
Network Log Analysis – Algorithm

What are the course objectives?


What are the prerequisites?


All trainees should have the following:

i) Required knowledge
  • Conversant with an imperative programming language such as C
  • Knowledge of SQL queries

ii) Hardware requirements (minimum laptop configuration)
  • Memory (RAM): 8 GB
  • Free disk space: 30 GB
  • CPU: 4 cores

iii) Software requirements
  • Windows or macOS
  • Oracle VirtualBox (https://www.virtualbox.org/wiki/Downloads)

Who should take the course?


  • Software developers
  • IT managers
  • Service management professionals
  • Technology managers

Who is your trainer for the program?



We offer the following payment options:


  • Cash
  • HRDF claimable
  • Maybank Ezpay (up to 24 months at 0% interest)
  • CIMB Easy Pay (up to 12 months at 0% interest)
  • Cash installment (on a case-by-case basis)

Futureproof Yourself With Us!
