Big Data Development

Track 1

Big Data Overview

  • Introduction to Big Data
  • Apache Hadoop and Hadoop Ecosystem
  • Apache Hadoop file storage and cluster components
  • Distributed Processing and YARN Architecture

Track 2

Data Ingest

  • Import data from relational databases into HDFS using Sqoop
  • Ingest real-time and near-real-time streaming data into HDFS
  • Process streaming data as it is loaded onto the cluster
  • Load data into HDFS using the Hadoop File System commands

Track 3

Transform, Stage, and Store

  • Load RDD data from HDFS for use in Spark applications
  • Write the results from an RDD back into HDFS using Spark
  • Read and write files in a variety of file formats
  • Perform standard extract, transform, load (ETL) processes on data

Track 4

Data Analysis

  • Use metastore tables as an input source or an output sink for Spark applications
  • Understand the fundamentals of querying datasets in Spark
  • Filter data and write queries that calculate aggregate statistics
  • Join disparate datasets using Spark
  • Produce ranked or sorted data

Track 5

Tools and Languages

  • Sqoop
  • HBase
  • Hive
  • Scala and Python
  • Kafka

Track 6

Capstone Project

  • Crime Prediction
  • Fraud Detection
  • Modelling Natural Language

Track 7

Career Service

  • Career advice
  • Create a high-quality resume and cover letter
  • Interview coaching and practice
  • Job search advice
  • Mock interviews for both technical and non-technical topics

Track 8

Mentorship Support

  • Setting learning goals
  • Review of projects and exercises
  • Industry insights
  • Interview tips
  • Career and job search advice
  • Tracking weekly progress