-
Notifications
You must be signed in to change notification settings - Fork 2
JackStrawFromWichita/all-the-grateful-dead
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
#----------------------------------------------------------------------------------------------------------------------- # AS A DEADHEAD, ARCHIVE.ORG IS AN INVALUABLE RESOURCE. WE ARE ALL FAMILIAR WITH SHOWS THAT ALLOW YOU TO DOWNLOAD FLACS OR MP3S, # AND FOR 'STREAM ONLY' SHOWS THERE IS OUR BELOVED GRATEFUL GRABBER. BUT WHAT IF YOU WANT EVERY SHOW? YES, EVERY SHOW. OR EVERY SHOW OF A PARTICULAR YEAR. # PARANOID BY NATURE, I BEGAN IMAGINING A DISASTER SCENARIO WHERE THE ZOMBIE (OR OTHER) APOCALYPSE TAKES DOWN THE INTERNET, # AND WITH IT, OUR BELOVED ARCHIVE. IN THAT BRAVE NEW LANDSCAPE I STILL WANT TO DISCOVER NEW SHOWS AND LISTEN TO THEM AS I PLEASE, # ON A GENERATOR POWERED STEREO, AS I WAIT FOR THE UNKNOWN. # AS OF 11/20/2019 THERE ARE EXACTLY 14000 SHOWS IN THE ARCHIVE'S GRATEFUL DEAD COLLECTION. # IF YOU HAVE A GOOD INTERNET CONNECTION AND PLENTY OF STORAGE SPACE, YOU CAN DOWNLOAD ALL 14000 SHOWS WITH 7 LINES OF CODE. # THOUGH SOME MAY FIND IT MORE MANAGEABLE TO SEGMENT AND DOWNLOAD SHOWS BY YEAR. # AFTER EXTENSIVE RESEARCH AND MUCH TRIAL AND ERROR I STUMBLED UPON THESE RESOURCES WHICH LED TO THE SOLUTION: # ARCHIVE HAS IT'S OWN PYTHON API FOR DOWNLOADING STUFF??? THIS IS WAY EASIER THAN THE BEAUTIFUL SOUP STUFF I WAS TRYING: # https://gareth.halfacree.co.uk/2013/04/bulk-downloading-collections-from-archive-org # HOW TO OBTAIN LISTS OF IDENTIFIERS USED TO DOWNLOAD SHOWS: # https://blog.archive.org/2012/04/26/downloading-in-bulk-using-wget/ # internetarchive API QUICK START GUIDE: # https://archive.org/services/docs/api/internetarchive/quickstart.html # PRE-REQUISITES: # DOWNLOAD AND INSTALL PYTHON: https://www.python.org/downloads/ # DOWNLOAD A PYTHON IDE (OPTIONAL): https://www.jetbrains.com/pycharm/ # GET internetarchive PACKAGE: (from command line) pip install internetarchive # CREATE CONFIG FILE WITH ARCHIVE.ORG CREDENTIALS: # (from command line) ia configure # Enter your archive.org credentials below to configure 'ia'. # # Email address: [email protected] # Password: # # Config saved to: /home/user/.config/ia.ini # OBTAIN CSV LIST OF ALL SHOWS: # GO TO ARCHIVE.ORG: UNDER THE SEARCH BAR CLICK 'Advanced Search'; IN THE UPPER 'ADVANCED SEARCH' BLOCK: SEARCH: # AND Collection: is GratefulDead <--must type exactly this # HIT SEARCH; THIS REDIRECTS TO SEARCH RESULTS WITH SYNTAX IN SEARCH FIELD: collection:(GratefulDead) # COPY SEARCH SYNTAX AND GO BACK TO ADVANCED SEARCH PAGE # # IN LOWER 'Advanced Search returning JSON, XML, and more' BLOCK: PASTE: collection:(GratefulDead) INTO 'Query' FIELD # SELECT 'identifier' IN 'Fields to return' list # CHANGE 'NUMBER OF RESULTS' TO 15000 # SELECT 'CSV FORMAT' # HIT SEARCH # HIT 'OK' ON POP-UP NOTE # OPEN CSV FILE, DELETE 'IDENTIFIER' COLUMN HEADER # SAVE, CLOSE # SET PATH TO CSV FILE (HARDCODE IN .PY FILE) # SET PATH TO LOCAL DIRECTORY TO SAVE FILES (HARDCODE IN .PY FILE) # RUN SELECTED .PY FILE FROM COMMAND LINE # OPTION 1: DOWNLOAD ALL 14000 GRATEFUL DEAD SHOWS AT ONCE: download_all_gd.py # OPTION 2: SEGMENT AND DOWNLOAD SHOWS BY YEAR: download_gd_by_year.py # NOTE TO SELF: COLLECTION NAME FOR DEAD & COMPANY: collection:(DeadAndCompany) # NOTE TO ALL: YOU CAN FIND THESE 'COLLECTION NAMES' ON ARCHIVE, YOU JUST GOTTA POKE AROUND # STATS FROM SEGMENTATION AND DOWNLOAD BY YEAR BASED ON 1975: # INTERNET CONNECTION DETAILS: # DOWNLOAD Mbps: 887.41 # UPLOAD Mbps: 839.64 # https://www.speedtest.net/ # 1975: 55 shows # time to download all 1975: 81.11 minutes # download time per show: 1.47 minutes # total size of 1975: 6.35 GB # average file size per show: 115.45 MB # Estimate regarding all 14000 shows based on 1975 stats: # Approx: 1.62 TB TOTAL # Approx: 344 Total hours to download (14 days!) # MILEAGE WILL VARY # BREAKDOWN: NUMBER OF SHOWS BY YEAR (as of 11/20/2019): # 65: 3 shows # 66: 69 shows # 67: 48 shows # 68: 113 shows # 69: 291 shows # 70: 340 shows # 71: 307 shows # 72: 325 shows # 73: 379 shows # 74: 270 shows # 75: 55 shows # 76: 251 shows # 77: 352 shows # 78: 462 shows # 79: 453 shows # 80: 613 shows # 81: 638 shows # 82: 524 shows # 83: 812 shows # 84: 688 shows # 85: 874 shows # 86: 497 shows # 87: 772 shows # 88: 684 shows # 89: 854 shows # 90: 933 shows # 91: 682 shows # 92: 406 shows # 93: 576 shows # 94: 412 shows # 95: 315 shows # gdnrps: 2 shows # Total: 14000 shows # TESTING SOME FUNCTIONALITY IN THE internetarchive API #----------------------------------------------------------------------------------------------------------------------- # TESTING: # DOWNLOADABLE SHOW #THIS IS THE IDENTIFIER #download('gd1977-11-05.aud.zimmerman.minches.81180.sbeok.flac16', verbose=True, glob_pattern='*.mp3') # SUCCESS #----------------------------------------------------------------------------------------------------------------------- # TESTING: # STREAM-ONLY SHOW #THIS IS THE IDENTIFIER #download('gd73-06-10.sbd.hollister.174.sbeok.shnf', verbose=True, glob_pattern='*.mp3', destdir=r"H:\gd") # SUCCESS #-----------------------------------------------------------------------------------------------------------------------
About
Download all 14000 Grateful Dead shows from Archive.org
Topics
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published