In this notebook, we will learn how to use the DataFrame API and SparkSQL to perform simple data analytics tasks.
The main goals of this notebook are the following:
- Understand the advantages and disadvantages of using DataFrames over RDDs
- Analyze the airline data with the DataFrame API and SparkSQL

The notebook is organized as follows:
- In section 1, we give a short introduction to the DataFrame API, using a small example to show how we can use it and how it compares to the low-level RDD abstraction.
- In section 2, we delve into the details of this notebook's use case: we provide the context and introduce the data.
- In section 3, we perform data exploration and analysis.