hathawayj/io-pandas-tidyverse

Data parsing example

This example uses the marathon data from the New York Times article What Good Marathons and Bad Investments Have in Common.

The article links to the full data set, almost ten million records in CSV format, hosted on box.com. I have removed a few columns and provide the reduced data in two formats on Dropbox.

You can find the same data in .feather and .parquet formats in this repository's arrow folder.

R scripts

  • initial_setup.R is the script that drops columns from the original source.
  • create_arrow.R provides an example of converting a large file from .sas7bdat to .feather and .parquet. The results are in the arrow folder.
  • data_digest.R reports the file size and parsing time for each format.

Python scripts

  • create_arrow.py provides an example of converting a large file from .sas7bdat to .feather and .parquet. The results are in the py_arrow folder.
  • data_digest.R reports file sizes and parsing times for .sas7bdat and .parquet.
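
The two steps described above (convert, then compare size and parsing time) can be sketched roughly as follows. This is a minimal illustration and not the contents of create_arrow.py or data_digest.R: the function names `convert_sas` and `digest` and any file paths are hypothetical, and the conversion step assumes pandas and pyarrow are installed.

```python
import os
import time


def convert_sas(sas_path, out_dir):
    """Read a .sas7bdat file and write .feather and .parquet copies.

    Hypothetical helper; assumes pandas and pyarrow are installed.
    """
    import pandas as pd  # imported here so digest() works without pandas

    df = pd.read_sas(sas_path, format="sas7bdat")
    stem = os.path.splitext(os.path.basename(sas_path))[0]
    feather_path = os.path.join(out_dir, stem + ".feather")
    parquet_path = os.path.join(out_dir, stem + ".parquet")
    df.to_feather(feather_path)
    df.to_parquet(parquet_path)
    return feather_path, parquet_path


def digest(path, reader):
    """Return (size in MB, seconds taken) for parsing `path` with `reader`.

    `reader` is any callable that takes a file path, for example
    pandas.read_parquet or pandas.read_feather.
    """
    size_mb = os.path.getsize(path) / 1e6
    start = time.perf_counter()
    reader(path)  # the parsed result is discarded; only the time matters
    elapsed = time.perf_counter() - start
    return size_mb, elapsed
```

With pandas available, a call such as `digest("py_arrow/marathon.parquet", pd.read_parquet)` would return the size and parse time for one format (the file name here is a placeholder). A single timing is noisy, so repeating the call a few times and comparing the results gives a fairer picture.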

Data exploration example

The explore_bigdata.R file provides a short example.
