Big Data Development

Track 1

Big Data Overview

Introduction to Big Data

Apache Hadoop and Hadoop Ecosystem

Apache Hadoop file storage and cluster components

Distributed Processing and YARN Architecture

Track 2

Data Ingest

Data Ingestion using Sqoop

Ingest real-time and near-real-time streaming data into HDFS

Process streaming data as it is loaded onto the cluster

Load data into HDFS using the Hadoop File System commands

Track 3

Transform, Stage, and Store

Load RDD data from HDFS for use in Spark applications

Write the results from an RDD back into HDFS using Spark

Read and write files in a variety of file formats

Perform standard extract, transform, load (ETL) processes on data

Track 4

Data Analysis

Use metastore tables as an input source or an output sink for Spark applications

Understand the fundamentals of querying datasets in Spark

Filter data and write queries that calculate aggregate statistics

Join disparate datasets using Spark

Produce ranked or sorted data

Track 5

Tools and Languages

Sqoop

HBase

Hive

Scala and Python

Kafka

Track 6

Capstone Project

Crime Prediction

Fraud Detection

Modelling Natural Language

Track 7

Career Services

Career Advice

Create a high-quality resume and cover letter

Interview coaching and practice

Job search advice

Mock interviews for both technical and non-technical topics

Track 8

Mentorship Support

Setting learning goals

Review of projects and exercises

Industry insights

Interview tips

Career and job search advice

Tracking weekly progress