
Data Engineering – Complete Course Content (Beginner to Advanced)
Introduction to Data Engineering
- Role of a Data Engineer
- Data Engineering vs Data Science vs Data Analytics
- Modern data ecosystem overview
- ETL vs ELT
- Batch vs Streaming data
- Data lake, data warehouse, and data lakehouse concepts
Linux & Shell Basics (For Data Engineers)
- Basic Linux commands
- File system navigation
- File manipulation
- Shell scripting basics
- Automating simple workflows
Python for Data Engineering
Core Python
- Variables, data types, loops, functions
- File handling
- Exception handling
Python for Data Pipelines
- Working with CSV, JSON, Excel
- Reading/writing large datasets
- Pandas for data processing
- Boto3 / APIs (basic introduction for pipelines)
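A minimal sketch of "reading/writing large datasets" using only the standard library. It streams a CSV in fixed-size batches so memory stays bounded, then writes JSON Lines. The function names and the batch size are illustrative assumptions; with pandas, the analogous approach is `pd.read_csv(path, chunksize=N)`.

```python
import csv
import io
import json
from itertools import islice
from typing import Iterator, TextIO


def read_csv_in_batches(stream: TextIO, batch_size: int) -> Iterator[list[dict]]:
    """Yield lists of at most `batch_size` rows; the whole file is never held in memory."""
    reader = csv.DictReader(stream)
    while True:
        batch = list(islice(reader, batch_size))
        if not batch:
            break
        yield batch


def csv_to_jsonl(src: TextIO, dst: TextIO, batch_size: int = 1000) -> int:
    """Copy CSV rows to JSON Lines (one JSON object per line); return the row count."""
    written = 0
    for batch in read_csv_in_batches(src, batch_size):
        for row in batch:
            dst.write(json.dumps(row) + "\n")
        written += len(batch)
    return written


if __name__ == "__main__":
    # In-memory stand-ins for real files; for real data use open(path, newline="").
    src = io.StringIO("id,city\n1,Pune\n2,Delhi\n3,Mumbai\n")
    dst = io.StringIO()
    print(csv_to_jsonl(src, dst, batch_size=2))  # prints 3
    print(dst.getvalue(), end="")
```

Batching keeps peak memory proportional to `batch_size` rather than to file size, which is the same idea behind chunked reads in pandas and partitioned reads in Spark.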
SQL for Data Engineering
- SQL fundamentals
- Joins, aggregations, subqueries
- CTEs
- Window functions
- Query optimization basics
- Writing analytical SQL queries
- Stored procedures & functions (basics)
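The CTE and window-function topics can be tried without a database server. This is a sketch that assumes a hypothetical `orders` table and uses Python's built-in `sqlite3` module with an in-memory database (window functions need SQLite 3.25 or later, which recent Python builds bundle). The CTE aggregates per customer per day. The window functions then compute a running total and a per-customer rank.

```python
import sqlite3

QUERY = """
WITH daily AS (                       -- CTE: total amount per customer per day
    SELECT customer_id, order_date, SUM(amount) AS total
    FROM orders
    GROUP BY customer_id, order_date
)
SELECT customer_id,
       order_date,
       total,
       -- running total within each customer, in date order
       SUM(total) OVER (PARTITION BY customer_id ORDER BY order_date) AS running_total,
       -- rank of each day's total within the customer (1 = largest)
       RANK() OVER (PARTITION BY customer_id ORDER BY total DESC) AS day_rank
FROM daily
ORDER BY customer_id, order_date
"""


def running_totals(rows: list[tuple[int, str, float]]) -> list[tuple]:
    """Load (customer_id, order_date, amount) rows and run the analytical query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (customer_id INTEGER, order_date TEXT, amount REAL)")
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    sample = [
        (1, "2024-01-01", 10.0),
        (1, "2024-01-01", 5.0),
        (1, "2024-01-02", 20.0),
        (2, "2024-01-01", 7.0),
    ]
    for row in running_totals(sample):
        print(row)
```

Unlike `GROUP BY`, a window function keeps every input row and adds a computed column, which is why it suits running totals and rankings.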
Data Modeling
- OLTP vs OLAP
- Star schema & Snowflake schema
- Fact & dimension tables
- Normalization & denormalization
- Slowly Changing Dimensions (SCD Type 1, 2)
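A small in-memory sketch of SCD Type 1 and Type 2 on a hypothetical customer dimension (the `DimCustomer` fields are assumptions). Type 1 overwrites the attribute, so history is lost. Type 2 closes the current row by setting `valid_to` and `is_current = False`, then appends a new current version. In a warehouse the same logic is usually written as a `MERGE` or an update-plus-insert.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DimCustomer:
    customer_id: int          # natural/business key
    city: str                 # tracked attribute
    valid_from: date
    valid_to: Optional[date]  # None means the row is still current
    is_current: bool


def scd_type1(dim: list[DimCustomer], customer_id: int, city: str) -> list[DimCustomer]:
    """Type 1: overwrite the attribute on the key's rows; no history is kept."""
    return [replace(r, city=city) if r.customer_id == customer_id else r for r in dim]


def scd_type2(dim: list[DimCustomer], customer_id: int, city: str, as_of: date) -> list[DimCustomer]:
    """Type 2: expire the current row and append a new version; history is kept."""
    out: list[DimCustomer] = []
    changed = False
    for r in dim:
        if r.customer_id == customer_id and r.is_current and r.city != city:
            out.append(replace(r, valid_to=as_of, is_current=False))  # close old version
            changed = True
        else:
            out.append(r)
    # A changed key or a brand-new key both get a fresh current row.
    is_new = not any(r.customer_id == customer_id for r in dim)
    if changed or is_new:
        out.append(DimCustomer(customer_id, city, as_of, None, True))
    return out


if __name__ == "__main__":
    dim = [DimCustomer(1, "Pune", date(2023, 1, 1), None, True)]
    for row in scd_type2(dim, 1, "Delhi", date(2024, 6, 1)):
        print(row)
```

Re-applying an unchanged value leaves a Type 2 dimension untouched, so the load can be re-run safely.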
ETL / ELT & Pipeline Concepts
- What is a pipeline?
- Data ingestion techniques
- Data transformation layers
- Batch pipelines
- Incremental loads & change data capture (CDC)
- Error handling & logging
- Orchestration concepts
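A sketch of a timestamp-watermark incremental load with logging and row-level error handling. It assumes hypothetical source rows carrying `id` and `updated_at` fields and a dict standing in for the target table. This is query-based incremental extraction. Log-based CDC, which reads the database's change log, is a different technique.

```python
import logging
from datetime import datetime
from typing import Any

log = logging.getLogger("incremental_load")

Row = dict[str, Any]


def incremental_load(source: list[Row], target: dict[Any, Row],
                     watermark: datetime) -> tuple[datetime, list[Row]]:
    """Upsert rows changed after `watermark` into `target`.

    Returns (new_watermark, rejected_rows). Bad rows are logged and returned
    for quarantine instead of failing the whole batch.
    """
    changed = [r for r in source if r["updated_at"] > watermark]
    log.info("extracted %d changed rows since %s", len(changed), watermark.isoformat())
    rejected: list[Row] = []
    for row in changed:
        try:
            if row.get("id") is None:
                raise ValueError("missing primary key")
            target[row["id"]] = row  # upsert keyed on the primary key
        except ValueError as exc:
            log.error("rejected row %r: %s", row, exc)
            rejected.append(row)
    log.info("loaded %d rows, rejected %d", len(changed) - len(rejected), len(rejected))
    # Advance past every extracted row so rejects are not re-read on every run;
    # they are handled through the returned quarantine list instead.
    new_watermark = max((r["updated_at"] for r in changed), default=watermark)
    return new_watermark, rejected


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
    table: dict[Any, Row] = {}
    rows = [{"id": 1, "updated_at": datetime(2024, 1, 1)},
            {"id": 2, "updated_at": datetime(2024, 1, 3)}]
    print(incremental_load(rows, table, datetime(2024, 1, 2)))
```

In a real pipeline the watermark would be persisted, for example in a control table, between runs. Rerunning with the returned watermark then picks up nothing new.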
Big Data & Distributed Systems
- Hadoop ecosystem overview
- HDFS concepts
- MapReduce basics
- YARN & distributed computation
- Why Spark replaced Hadoop MapReduce
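The MapReduce model behind Hadoop can be simulated in a single process with the classic word-count example. Each function below stands for one phase: map emits key/value pairs, shuffle groups them by key, and reduce combines each group. In Hadoop these phases run across many nodes, with the shuffle moving data over the network. This sketch only shows the data flow.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Mapper: emit (word, 1) for every word in one input line.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle/sort: group all emitted values by key, sorted by key.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reducer: combine all values for a single key.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["the quick fox", "The lazy dog", "the end"]))
```

Hadoop MapReduce writes intermediate results to disk between jobs. Spark keeps them in memory where possible, which is a large part of why it displaced MapReduce for iterative workloads.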
Apache Spark (Core Skill for Data Engineers)
Spark Core
- RDD concepts
- Transformations & actions
- Lazy evaluation
- Partitioning
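To make transformations, actions, lazy evaluation, and partitioning concrete without a cluster, here is a toy class loosely modelled on Spark's RDD. It is an illustration only and not Spark's API. The name `ToyRDD` and its internals are assumptions. It mirrors the idea that `map`/`filter` are transformations that only extend the lineage, while `collect`/`count` are actions that execute it on every partition.

```python
from typing import Any, Callable, Iterable, Iterator


class ToyRDD:
    """Toy model of a Spark RDD (illustration only, not Spark's API)."""

    def __init__(self, partitions: list[list[Any]], lineage: tuple = ()):
        self.partitions = partitions
        self.lineage = lineage  # recorded transformations, not yet executed

    @classmethod
    def parallelize(cls, data: Iterable[Any], num_partitions: int) -> "ToyRDD":
        items = list(data)
        n = len(items)
        # Split into contiguous slices, one per partition.
        parts = [items[i * n // num_partitions:(i + 1) * n // num_partitions]
                 for i in range(num_partitions)]
        return cls(parts)

    @property
    def num_partitions(self) -> int:
        return len(self.partitions)

    # Transformations: return a new ToyRDD and compute nothing.
    def map(self, fn: Callable[[Any], Any]) -> "ToyRDD":
        return ToyRDD(self.partitions, self.lineage + (("map", fn),))

    def filter(self, pred: Callable[[Any], bool]) -> "ToyRDD":
        return ToyRDD(self.partitions, self.lineage + (("filter", pred),))

    def _compute(self, part: list[Any]) -> Iterator[Any]:
        it: Iterator[Any] = iter(part)
        for kind, fn in self.lineage:
            # Builtin map/filter bind `fn` immediately, so each step keeps its own function.
            it = map(fn, it) if kind == "map" else filter(fn, it)
        return it

    # Actions: trigger execution of the whole lineage, partition by partition.
    def collect(self) -> list[Any]:
        return [x for part in self.partitions for x in self._compute(part)]

    def count(self) -> int:
        return sum(1 for part in self.partitions for _ in self._compute(part))


if __name__ == "__main__":
    rdd = ToyRDD.parallelize(range(10), 3)
    evens = rdd.map(lambda x: x * x).filter(lambda v: v % 2 == 0)  # nothing runs yet
    print(evens.collect())  # the action runs map and filter now
```

Every action here recomputes the full lineage. Real Spark behaves the same way unless the data is kept with `cache()` or `persist()`.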
Spark SQL
- DataFrames
- Dataset API (Scala/Java only; PySpark exposes DataFrames)
- SQL queries
- Catalyst optimizer
- UDFs
Spark for ETL
- Reading/writing CSV, JSON, Parquet
- Working with large datasets
- Writing pipelines in PySpark
Cloud Data Engineering
Azure / AWS / GCP Overview
- IAM concepts
- Storage services
- Compute services
- Serverless basics
Cloud Storage
- AWS S3 / Azure Blob / GCP Storage
Compute & Processing
- AWS EMR / Azure Databricks / GCP Dataproc
- Serverless compute (AWS Lambda / Azure Functions)
Cloud Data Warehouse
- Snowflake
- Amazon Redshift
- Azure Synapse
- Google BigQuery
Modern Data Pipeline Tools
- Airflow (workflow orchestration)
- Azure Data Factory (ADF) / AWS Glue
- Databricks workflows
- Kafka basics (streaming ingestion)
- Azure Event Hubs / Google Cloud Pub/Sub overview
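The core job of an orchestrator like Airflow is to run tasks only after their upstream dependencies succeed and to retry transient failures. This sketch uses Python's standard `graphlib.TopologicalSorter` to show that idea. The `run_dag` function and its `retries` parameter are hypothetical and not Airflow's API. Airflow declares the same chain as `extract >> transform >> load` and adds scheduling, parallelism, and monitoring.

```python
from graphlib import TopologicalSorter
from typing import Callable


def run_dag(tasks: dict[str, Callable[[], None]],
            upstream: dict[str, set[str]],
            retries: int = 1) -> list[str]:
    """Run each task after all of its upstream tasks, retrying failures.

    `upstream` maps a task name to the names it depends on. Returns the
    execution order; raises RuntimeError if a task still fails after retries
    (and graphlib.CycleError if the dependencies contain a cycle).
    """
    graph = {name: upstream.get(name, set()) for name in tasks}
    finished: list[str] = []
    for name in TopologicalSorter(graph).static_order():
        for attempt in range(1, retries + 2):  # first try plus `retries` retries
            try:
                tasks[name]()
                break
            except Exception as exc:
                if attempt == retries + 1:
                    raise RuntimeError(f"task {name!r} failed after {attempt} attempts") from exc
        finished.append(name)
    return finished


if __name__ == "__main__":
    steps = {n: (lambda n=n: print("running", n)) for n in ("extract", "transform", "load")}
    print(run_dag(steps, {"transform": {"extract"}, "load": {"transform"}}))
```

This runs tasks one at a time. A real scheduler would run independent branches of the DAG in parallel.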
Data Quality & Governance
- Data validation techniques
- Data quality checks (DQ rules)
- Logging & monitoring
- Metadata management
- Lineage basics (Collibra/Apache Atlas overview)
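Data quality rules are often written as small, declarative checks that report the offending rows instead of silently dropping them. Here is a minimal stdlib sketch. The rule names `not_null`, `unique`, and `in_range` are illustrative assumptions. Dedicated tools express the same kinds of rules in their own formats.

```python
from typing import Any, Callable

Row = dict[str, Any]
Rule = tuple[str, Callable[[list[Row]], list[int]]]  # (description, check returning bad row indexes)


def not_null(col: str) -> Rule:
    return f"{col} is not null", lambda rows: [i for i, r in enumerate(rows) if r.get(col) is None]


def unique(col: str) -> Rule:
    def check(rows: list[Row]) -> list[int]:
        seen: set = set()
        dupes: list[int] = []
        for i, r in enumerate(rows):
            if r.get(col) in seen:
                dupes.append(i)  # flag every repeat after the first occurrence
            seen.add(r.get(col))
        return dupes
    return f"{col} is unique", check


def in_range(col: str, lo: float, hi: float) -> Rule:
    # Nulls are left to not_null so each rule reports one kind of problem.
    return (f"{lo} <= {col} <= {hi}",
            lambda rows: [i for i, r in enumerate(rows)
                          if r.get(col) is not None and not lo <= r[col] <= hi])


def run_checks(rows: list[Row], rules: list[Rule]) -> dict[str, list[int]]:
    """Return {rule description: offending row indexes} for every failing rule."""
    report: dict[str, list[int]] = {}
    for desc, check in rules:
        bad = check(rows)
        if bad:
            report[desc] = bad
    return report


if __name__ == "__main__":
    data = [{"id": 1, "age": 30}, {"id": 2, "age": None}, {"id": 2, "age": 150}]
    print(run_checks(data, [not_null("age"), unique("id"), in_range("age", 0, 120)]))
```

An empty report means every rule passed. A pipeline might fail the run or quarantine the listed rows depending on each rule's severity.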
DevOps for Data Engineering
- Git & GitHub
- CI/CD concepts
- Docker basics
- Deploying ETL pipelines
- Environment management
Mini Projects + Capstone
- ETL pipeline using SQL + Python
- PySpark batch processing pipeline
- Data ingestion from API to cloud storage
- End-to-end pipeline using Airflow/ADF
- Data warehouse modeling project
- Resume building for Data Engineers
- Interview preparation + coding round

- Direct Material Procurement Process through SAP Fiori Apps
- Indirect Material Procurement Processes
- Inventory Management and Physical Inventory
- Quantity and Value Contracts and Scheduling Agreements
- Stock Transfer, Stock Transport Orders, and Intercompany Purchasing
- Physical Inventory Count and Cycle Counting Method
- Goods Movement and Vendor Return Processes
- Evaluated Receipt Settlement and Invoicing Plans
- Material Classification, Batch Management, and Serial Numbers
- MRP (Material Requirements Planning) and PO Release Strategy