This course runs for 3 days.
The class will run daily from 10:00 AM EST to 6:00 PM EST.
Class Location: Virtual live instructor-led classroom.
Data engineering is a software engineering practice that focuses on the design, development, and productionizing of data processing systems. It covers all the practical aspects of data acquisition, transfer, transformation, and storage, whether on-premises or in the cloud.
This intensive, hands-on training course teaches students how to apply Python to the practical aspects of data engineering and introduces the popular Python libraries used in the field, including NumPy, pandas, Matplotlib, scikit-learn, and PySpark (the Python API for Apache Spark).
Audience
Developers, Software Engineers, Data Scientists, and IT Architects
Topics
Chapter 1. Defining Data Engineering
Chapter 2. Distributed Computing Concepts for Data Engineers
Chapter 3. Data Processing Phases
Chapter 4. Quick Introduction to Python for Data Engineers
Chapter 5. Practical Introduction to NumPy
Chapter 6. Practical Introduction to pandas
Chapter 7. Descriptive Statistics Computing Features in Python
Chapter 8. Data Grouping and Aggregation with pandas
Chapter 9. Repairing and Normalizing Data
Chapter 10. Data Visualization in Python using matplotlib
Chapter 11. Parallel Data Processing with PySpark
Chapter 12. Python as a Cloud Scripting Language
Lab Exercises
Lab 1. A/B Testing Data Engineering Tasks Project
Lab 2. Data Availability and Consistency
Lab 3. Using Jupyter Notebook
Lab 4. Understanding Python
Lab 5. Data Engineering Project
Lab 6. Understanding NumPy
Lab 7. A NumPy Project
Lab 8. Understanding pandas
Lab 9. Working with Files in pandas
Lab 10. Data Grouping and Aggregation
Lab 11. Repairing and Normalizing Data
Lab 12. Data Visualization in Jupyter Notebooks using matplotlib
Lab 13. Exploratory Data Analysis (EDA)
Lab 14. Correlating Cause and Effect
Lab 15. Using the Parquet Data Format
Prerequisites
Participants are expected to have practical experience coding in one or more modern programming languages. Knowledge of Python is desirable but not required. Students should be able to learn new material quickly, reinforce each topic through programming exercises (labs), and then apply what they have learned in data engineering mini-projects.