Update for latest ADX version #2
Fixes dsdk version and downloading sample data.
"- Expand the \"Data set overview\" box near the top to reveal the Data set ID.\n",
"\n",
"_**Replace the `dataset_id` parameter below with your particular ID then run the notebook.**_\n",
"The use license will be in your `PURESKILLGG_TOME_DS_COLLECTION_PATH` named `license.pdf`. You must agree to these terms to use the data.\n",
"The use license will be in your `PURESKILLGG_TOME_DS_COLLECTION_PATH` named `license.pdf`. You must agree to these terms to use the data.\n",
"The use license will be in your `PURESKILLGG_TOME_DS_COLLECTION_PATH` named `LICENSE.pdf`. You must agree to these terms to use the data.\n",
Same for README.md in the zip file.
Fix'd
@@ -89,23 +69,36 @@
"outputs": [],
"source": [
"import os\n",
"from pureskillgg_dsdk.exchange import download_dataexchange_dataset_revision"
"import requests\n",
Interesting, when I looked up how to do this I was pointed to `import urllib` and `urllib.request`. Is `requests` also in the std lib? Two ways to do the same thing in Python? 😲
my google search led me to this and i didn't look more into it haha https://www.codingem.com/python-download-file-from-url/
Requests is mentioned in the top yellow box here https://docs.python.org/3/library/urllib.request.html#module-urllib.request
Yea, apparently I used the legacy interface: https://docs.python.org/3/library/urllib.request.html#urllib.request.urlretrieve at https://github.com/pureskillgg/dsdk/blob/6f7dcc064e6a4778fc268e2eb0c191fbd5f6d965/pureskillgg_dsdk/exchange/dataset.py#L75
Anyway, we will need to add this with poetry yea?
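For reference, a minimal sketch of the non-legacy standard-library route mentioned above, using `urllib.request.urlopen` with `shutil.copyfileobj` instead of `urlretrieve`. The helper name `download_file` is hypothetical and is not part of dsdk or this PR. `requests` is a third-party package, which is why it has to be added as a dependency, while this version needs nothing beyond the standard library.

```python
import shutil
import urllib.request
from pathlib import Path


def download_file(url, output_filename):
    """Stream the body at `url` into `output_filename` and return its path.

    Hypothetical helper for illustration only; not part of dsdk.
    """
    target = Path(output_filename)
    # Create the destination directory if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    # urlopen returns a file-like response; copyfileobj copies it in chunks
    # so the whole payload never has to sit in memory at once.
    with urllib.request.urlopen(url) as response, open(target, "wb") as f:
        shutil.copyfileobj(response, f)
    return target
```

Whether this or `requests.get(url)` is preferable is a judgment call; for a download this small either should work.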
😮 yup. done.
"download_dataexchange_dataset_revision(output_path, dataset_id)"
" response = requests.get(url)\n",
" with open(output_filename, \"wb\") as f:\n",
" f.write(response.content)\n",
We can just skip the part where we save the zip file to disk: https://stackoverflow.com/a/2463819
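A rough sketch of what skipping the intermediate file could look like: wrap the downloaded bytes in `io.BytesIO` and hand that to `zipfile.ZipFile`. The function name `extract_zip_bytes` is made up for illustration, and it assumes the whole archive fits comfortably in memory, which seems reasonable for a sample this small.

```python
import io
import zipfile
from pathlib import Path


def extract_zip_bytes(content, output_path):
    """Extract an in-memory zip archive into `output_path`.

    Hypothetical helper for illustration; `content` is the raw bytes of
    the archive, e.g. `response.content` from `requests.get(url)`.
    Returns the list of member names in the archive.
    """
    target = Path(output_path)
    target.mkdir(parents=True, exist_ok=True)
    # BytesIO gives ZipFile the seekable file-like object it needs,
    # so nothing is written to disk except the extracted members.
    with zipfile.ZipFile(io.BytesIO(content)) as archive:
        archive.extractall(target)
        return archive.namelist()
```

In the notebook this would presumably be called as `extract_zip_bytes(requests.get(url).content, output_path)`, where `url` and `output_path` are whatever the cell already defines. It always re-extracts and overwrites, so there is no "already downloaded" check to get wrong.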
Nice. Though now I can't check if the zip exists to see if we've already downloaded it 🤔
This is such a small amount of data I'm not sure it matters
Also note that, the way you've written this, if there was an error extracting the zip file and they reran the loop, it would still assume the zip file was extracted correctly. So I recommend just not making this one "smart".
The old one took 20 minutes to download, this one takes less than 10 seconds so it is trivial to overwrite, i guess :)
Fix'd