kandakji/SparkSQL-Analysis-Aviation-Data

SparkSQL

In this notebook, we will learn how to use the DataFrame API and SparkSQL to perform simple data analytics tasks.

Goals

The main goals of this notebook are the following:

  1. Understand the advantages and disadvantages of using DataFrames instead of RDDs
  2. Analyze the airline data with the DataFrame API and SparkSQL

Steps

  • First, in section 1, we give a short introduction to the DataFrame API, with a small example that shows how we can use it and how it compares to the low-level RDD abstraction.
  • In section 2, we describe the use case of this notebook in detail, providing the context and introducing the data.
  • In section 3, we perform data exploration and analysis.
