Big Data Development
Track 1
Big Data Overview
- Introduction to Big Data
- Apache Hadoop and Hadoop Ecosystem
- Apache Hadoop file storage and cluster components
- Distributed Processing and YARN Architecture
Track 2
Data Ingest
- Data Ingestion using Sqoop
- Ingest real-time and near-real-time streaming data into HDFS
- Process streaming data as it is loaded onto the cluster
- Load data into HDFS using the Hadoop File System commands
Track 3
Transform, Stage, and Store
- Load RDD data from HDFS for use in Spark applications
- Write the results from an RDD back into HDFS using Spark
- Read and write files in a variety of file formats
- Perform standard extract, transform, load (ETL) processes on data
Track 4
Data Analysis
- Use metastore tables as an input source or an output sink for Spark applications
- Understand the fundamentals of querying datasets in Spark
- Filter data and write queries that calculate aggregate statistics
- Join disparate datasets using Spark
- Produce ranked or sorted data
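
The four operations in this track (filter, aggregate, join, rank) can be sketched on a tiny in-memory dataset. This is a minimal illustration in plain Python rather than Spark, so it runs without a cluster; the records, the `customers` lookup, the `min_amount` threshold, and the function name `total_spend_ranked` are all hypothetical and not taken from the course material.

```python
from collections import defaultdict

# Hypothetical sample records, invented for illustration only.
orders = [
    {"order_id": 1, "customer_id": "c1", "amount": 120.0},
    {"order_id": 2, "customer_id": "c2", "amount": 35.5},
    {"order_id": 3, "customer_id": "c1", "amount": 80.0},
    {"order_id": 4, "customer_id": "c3", "amount": 10.0},
]
customers = {"c1": "Asha", "c2": "Ben", "c3": "Chen"}


def total_spend_ranked(orders, customers, min_amount=20.0):
    """Filter, aggregate, join, and rank order data (plain-Python stand-in)."""
    # Filter: keep only orders at or above the threshold.
    kept = [o for o in orders if o["amount"] >= min_amount]

    # Aggregate: sum the order amounts per customer id.
    totals = defaultdict(float)
    for o in kept:
        totals[o["customer_id"]] += o["amount"]

    # Join: attach the customer name from the second dataset (inner join).
    joined = [
        (customers[cid], total)
        for cid, total in totals.items()
        if cid in customers
    ]

    # Rank: sort by total spend, highest first.
    return sorted(joined, key=lambda row: row[1], reverse=True)


ranked = total_spend_ranked(orders, customers)
for name, total in ranked:
    print(name, total)
```

In Spark, the same steps would typically map onto DataFrame operations for filtering, grouping with aggregation, joining, and ordering, applied to distributed data instead of Python lists.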
Track 5
Tools and Languages
- Sqoop
- HBase
- Hive
- Scala and Python
- Kafka
Track 6
Capstone Project
- Crime Prediction
- Fraud Detection
- Modelling Natural Language
Track 7
Career Service
- Career Advice
- Create a high-quality resume and cover letter
- Interview coaching and practice
- Job search advice
- Mock interviews for both technical and non-technical topics
Track 8
Mentorship Support
- Setting learning goals
- Review of projects and exercises
- Industry insights
- Interview tips
- Career and job search advice
- Tracking weekly progress