Curriculum version: 1.0.0 (see CHANGELOG)

Python Certification

Python is one of the most popular programming languages for machine learning and data engineering. The courses below give you the foundational knowledge to be productive with data engineering platforms and tools such as Palantir Foundry, Databricks, and Spark.

Topics covered:

| Courses | Duration | Effort | Prerequisites | Discussion |
| --- | --- | --- | --- | --- |
| Introduction To Python Scripting | 5 hours | 5 hours/week | Core Curriculum | chat |
| Introduction To Python Development | 20 hours | 20 hours/week | Core Curriculum, Introduction To Python Scripting | chat |
| Python Use Cases | 5 hours | 5 hours/week | Core Curriculum, Introduction To Python Development | chat |
| Python Certification Course | 20 hours | 20 hours/week | Core Curriculum, All courses above | chat |

SQL Certification

Topics covered:

  • Learn all the essential SQL commands
  • Become competent in using sorting and filtering commands in SQL
  • Improve the performance of your database by using views and indexes
  • Become proficient in SQL features such as GROUP BY, JOIN, and subqueries
  • Master SQL's most popular string, mathematical, and date-time functions
  • Increase your efficiency by learning best practices for writing SQL queries

| Courses | Duration | Effort | Prerequisites | Discussion |
| --- | --- | --- | --- | --- |
| SQL Basics | 20 hours | 20 hours/week | Core Curriculum | chat |
| SQL Masterclass | 40 hours | 20 hours/week | Core Curriculum, SQL Basics | chat |
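
Several of the SQL topics listed for this certification (indexes, views, JOIN, GROUP BY, and subqueries) can be tried without installing a database server. The following is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The `customers` and `orders` tables, the `customer_totals` view, and the sample rows are hypothetical and exist only for illustration; they are not part of the course material.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical schema and data, made up for this illustration.
cur.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    amount REAL NOT NULL
);

-- An index on the join column lets lookups by customer avoid a full scan.
CREATE INDEX idx_orders_customer ON orders(customer_id);

INSERT INTO customers (id, name) VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
INSERT INTO orders (customer_id, amount) VALUES (1, 30.0), (1, 20.0), (2, 15.0);

-- A view that combines LEFT JOIN and GROUP BY into a reusable query.
CREATE VIEW customer_totals AS
SELECT c.name AS name,
       COUNT(o.id) AS order_count,
       COALESCE(SUM(o.amount), 0) AS total
FROM customers AS c
LEFT JOIN orders AS o ON o.customer_id = c.id
GROUP BY c.id, c.name;
""")

# Sorting: highest spenders first.
totals = cur.execute(
    "SELECT name, order_count, total FROM customer_totals "
    "ORDER BY total DESC, name"
).fetchall()

# Subquery: customers whose total exceeds the average single order amount.
above_average = cur.execute("""
SELECT name FROM customer_totals
WHERE total > (SELECT AVG(amount) FROM orders)
ORDER BY name
""").fetchall()

print(totals)
print(above_average)
conn.close()
```

The LEFT JOIN keeps customers with no orders in the result, which is why `COALESCE` is used to turn their missing sum into zero. SQLite is only one dialect; syntax details may differ in the databases used in the courses.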

Spark Certification

Apache Spark is an open-source unified analytics engine for large-scale data processing. Spark provides an interface for programming clusters with implicit data parallelism and fault tolerance. Spark is a key component of platforms such as Palantir Foundry and Databricks, making it the cornerstone of the data engineering stack.

Topics covered:

  • Apply Spark programming basics, including parallel programming basics for DataFrames, Datasets, and Spark SQL.
  • Hadoop Ecosystem, Core Components and HDFS
  • YARN and its advantages
  • Hadoop Cluster and its Architecture
  • Big Data Analytics with Batch & Real-Time Processing
  • PySpark

| Courses | Duration | Effort | Prerequisites | Discussion |
| --- | --- | --- | --- | --- |
| PySpark Certification Training | 120 hours | 20 hours/week | Python Certification, SQL Certification | chat |
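
The data parallelism mentioned in the Spark description above follows a simple pattern: split the data into partitions, apply the same function to each partition independently, then merge the partial results. The sketch below imitates that pattern with only the Python standard library. It is not Spark and does not use the PySpark API; the function names (`split_into_partitions`, `count_words`, `parallel_word_count`) and the sample lines are hypothetical, and a thread pool stands in for a cluster purely to show the structure, not to deliver real speedups.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_into_partitions(records, num_partitions):
    """Distribute records round-robin across a fixed number of partitions."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def count_words(partition):
    """The per-partition step: count words in the lines of one partition."""
    counts = Counter()
    for line in partition:
        counts.update(line.lower().split())
    return counts


def parallel_word_count(lines, num_partitions=3):
    """Count words per partition concurrently, then merge the partial counts."""
    partitions = split_into_partitions(lines, num_partitions)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partial_counts = list(pool.map(count_words, partitions))
    # The merge step: combine the per-partition counters into one result.
    return reduce(lambda a, b: a + b, partial_counts, Counter())


# Made-up sample input for illustration.
lines = [
    "spark runs on clusters",
    "spark sql and dataframes",
    "clusters run spark jobs",
]
word_counts = parallel_word_count(lines)
print(word_counts.most_common(2))
```

In Spark the partitioning, scheduling, and recovery from failed workers are handled by the engine, which is what "implicit" refers to; here every step is spelled out by hand.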

Statistics and ML Certification (optional)

This MITx MicroMasters program in Statistics and Data Science consists of four online courses and a virtually proctored exam. It provides the foundational knowledge essential to understanding the methods and tools used in data science, along with hands-on training in data analysis and machine learning. You will dive into the fundamentals of probability and statistics, and learn, implement, and experiment with data analysis techniques and machine learning algorithms. This program will prepare you to become an informed and effective practitioner of data science who adds value to an organization. The program certificate can be applied, for admitted students, towards a PhD in Social and Engineering Systems (SES) through the MIT Institute for Data, Systems, and Society (IDSS) or may accelerate your path towards a Master’s degree at other universities around the world.

Topics covered:

  • Master the foundations of data science, statistics, and machine learning
  • Analyze big data and make data-driven predictions through probabilistic modeling and statistical inference; identify and deploy appropriate modeling and methodologies in order to extract meaningful information for decision making
  • Develop and build machine learning algorithms to extract meaningful information from seemingly unstructured data; learn popular unsupervised learning methods, including clustering methodologies and supervised methods such as deep neural networks
  • Finishing this MicroMasters program will prepare you for job titles such as: Data Scientist, Data Analyst, Business Intelligence Analyst, Systems Analyst, Data Engineer

| Courses | Duration | Effort | Prerequisites | Discussion |
| --- | --- | --- | --- | --- |
| MITx Statistics & Data Science | 756 hours | 14 hours/week | Core Curriculum, Python Certification, College-level calculus | chat |

MITx Supply Chain Management (optional)

Gain expertise in the growing field of Supply Chain Management through an innovative online program consisting of five courses and a final capstone exam. The MicroMasters Program in Supply Chain from MITx is an advanced, professional, graduate-level foundation in Supply Chain Management.

Topics covered:

  • Apply core methodologies (probability, statistics, optimization) used in supply chain modeling and analysis.
  • Understand and use fundamental models to make trade-offs between forecasting, inventory, and transportation.
  • Design supply chain networks as well as financial and information flows.
  • Understand how supply chains act as systems and interact.
  • Learn how technology is used within supply chains, from fundamentals to packaged software systems.
  • End-to-end supply chain management.

| Courses | Duration | Effort | Prerequisites | Discussion |
| --- | --- | --- | --- | --- |
| MITx Supply Chain Management | 936 hours | 14 hours/week | Core Curriculum, Python Certification, College-level calculus | chat |

Final Exam

Once you have developed these skills, you will need a way to gain experience. Here are three good ways to build the experience that gets you hired.

  1. Stack Overflow. This is a website that helps software engineers find answers to bugs and common problems. Building a strong reputation on Stack Overflow can help you get your foot in the door.
  2. HackerRank. HackerRank lets developers solve challenges to earn points. Recruiters can then request interviews with top-ranked coders. This is an awesome resource!
  3. Contribute to at least one popular open source project for six months. Some popular projects are below:

Chat is here. To see the latest trending projects click here.