Commit cca0877

Update README.md
1 parent 1d05685 commit cca0877

1 file changed

+37
-2
lines changed

Diff for: README.md

@@ -1,2 +1,37 @@
1-
# Big-Data-Analysis-with-Python
2-
Combine Spark and Python to process large datasets and unlock the power of parallel computing and machine learning
1+
2+
[![GitHub issues](https://img.shields.io/github/issues/TrainingByPackt/Big-Data-Analysis-with-Python.svg)](https://github.com/TrainingByPackt/Big-Data-Analysis-with-Python/issues)
3+
[![GitHub forks](https://img.shields.io/github/forks/TrainingByPackt/Big-Data-Analysis-with-Python.svg)](https://github.com/TrainingByPackt/Big-Data-Analysis-with-Python/network)
4+
[![GitHub stars](https://img.shields.io/github/stars/TrainingByPackt/Big-Data-Analysis-with-Python.svg)](https://github.com/TrainingByPackt/Big-Data-Analysis-with-Python/stargazers)
5+
[![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](https://github.com/TrainingByPackt/Big-Data-Analysis-with-Python/pulls)
6+
7+
# Big Data Analysis with Python
8+
Processing big data in real time is challenging because of scalability, information inconsistency, and fault tolerance. Big Data Analysis with Python teaches you how to use tools that can control this data avalanche for you. With this book, you'll learn effective techniques to aggregate data into useful dimensions for later analysis, extract statistical measurements, and transform datasets into features for other systems.
9+
10+
The book begins with an introduction to data manipulation in Python using Pandas. You'll then get familiar with statistical analysis and plotting techniques. With multiple hands-on activities in store, you'll be able to analyze data that is distributed across several computers by using Dask. As you progress, you'll study how to aggregate data for plots when the entire dataset cannot fit in memory. You'll also explore Hadoop (HDFS and YARN), which will help you tackle larger datasets. The book further covers Spark and its interaction with other tools.
11+
12+
By the end of this book, you'll be able to bootstrap your own Python environment, process large files, and manipulate data to generate statistics, metrics, and graphs.
13+
14+
## Learning Objectives
15+
* Use Python to read and transform data into different formats
16+
* Generate basic statistics and metrics using data on the disk
17+
* Work with computing tasks distributed over a cluster
18+
* Convert data from different sources into storage or querying formats
19+
* Prepare data for statistical analysis, visualization, and machine learning
20+
* Present data in the form of effective visuals
21+
22+
23+
### Hardware Requirements
24+
For an optimal experience, we recommend the following hardware configuration:
25+
* Processor: Dual Core or better
26+
* Memory: 4 GB RAM
27+
* Storage: 10 GB available space
28+
29+
30+
### Software Requirements
31+
* Windows 7 SP1 32/64-bit
32+
* Windows 8.1 32/64-bit or Windows 10 32/64-bit
33+
* Ubuntu 14.04 or later
34+
* macOS Sierra or later
35+
* Browser: Google Chrome or Mozilla Firefox
36+
* Conda
37+
* JupyterLab
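
The README description above mentions aggregating data when the entire dataset does not fit in memory. As a rough illustration of that idea only, the sketch below computes a per-group mean in a single pass while keeping just running totals in memory. It uses the Python standard library rather than the Pandas, Dask, or Spark tooling the book covers, and the function name `streaming_group_mean`, the column names, and the sample data are all hypothetical.

```python
import csv
import io
from collections import defaultdict


def streaming_group_mean(rows, key_field, value_field):
    """Compute a per-group mean in one pass, keeping only running totals."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    # `rows` can be any iterator of dicts, such as csv.DictReader over a large
    # file, so the full dataset is never loaded into memory at once.
    for row in rows:
        key = row[key_field]
        totals[key] += float(row[value_field])
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


# Hypothetical in-memory sample standing in for a file too large to load.
sample = io.StringIO("city,temp\nA,10\nB,20\nA,30\n")
means = streaming_group_mean(csv.DictReader(sample), "city", "temp")
print(means)
```

With a real file, the `io.StringIO` object would be replaced by an open file handle. Memory use then grows with the number of distinct groups rather than with the number of rows.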
