Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: add unit test covering the case where worker streams are stopped early #2127

Merged
merged 4 commits into from
Feb 6, 2025

Conversation

tswast
Copy link
Contributor

@tswast tswast commented Feb 4, 2025

This PR is a follow-up to #2034 with the purpose of testing the error condition.

Note: I have confirmed that this test fails without the changes from #2034. Actually, it hangs forever because the child threads aren't shutdown but the assertions do fail when I step through with a debugger.

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes #2032 🦕

@tswast tswast requested review from a team as code owners February 4, 2025 17:09
@tswast tswast requested a review from PhongChuong February 4, 2025 17:09
@product-auto-label product-auto-label bot added the size: m Pull request size is medium. label Feb 4, 2025
@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery API. label Feb 4, 2025
@tswast
Copy link
Contributor Author

tswast commented Feb 4, 2025

Looking at the test failures, there's a chance that

FAILED samples/tests/test_download_public_data.py::test_download_public_data
FAILED samples/tests/test_download_public_data_sandbox.py::test_download_public_data_sandbox

are related to this change. I'll give them a try locally.

Edit: They are passing locally (and in other versions of python). Perhaps it was just a temporary issue with the bigquery-public-data.usa_names.usa_1910_current table?

@tswast
Copy link
Contributor Author

tswast commented Feb 4, 2025

I see test__download_table_bqstorage_shuts_down_workers fails on Python 3.7. I'll update the test to be compatible with older pyarrow versions.

Edit: I made the change and confirmed that Python 3.7 unit tests now pass locally.

(venv) ➜  python-bigquery git:(issue2032-unit-tests) nox -s unit-3.7
nox > Running session unit-3.7
nox > Creating virtual environment (virtualenv) using python3.7 in .nox/unit-3-7
...
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
2487 passed, 14 skipped, 249 warnings in 37.29s
Session ran in 79 seconds (0:01:19)
nox > Session unit-3.7 was successful.```

@tswast tswast requested a review from a team as a code owner February 5, 2025 15:47
@tswast tswast requested a review from engelke February 5, 2025 15:47
@tswast
Copy link
Contributor Author

tswast commented Feb 5, 2025

I took another look at the failing snippet tests and still can't figure out what might be causing these failures, as it is passing locally.

The test verifies that the data was downloaded but then fails at the assertion that is looking for this log:

"Started reading table '{}.{}.{}' with BQ Storage API session '{}'.".format(

Hypotheses:

  • Maybe we are hitting some "low latency query path" things on the GCP project associated with our Kokoro tests that aren't on the project I'm using locally?
  • Maybe something is different across the versions of pytest, where pytest.LogCaptureFixture.messages isn't working as it used to?

Either way, I think so long as the data is downloaded, I'm 80% comfortable removing this check in e3315a8 We have other tests for when the BQ Storage API kicks in.

@tswast tswast requested review from chalmerlowe and Linchin and removed request for engelke and PhongChuong February 5, 2025 16:38
Copy link
Contributor

@Linchin Linchin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for reviewing the PR and adding the tests! Since we are deleting the part verifying the log, I wonder if it would be useful to add something to check that a storage read stream was indeed created? (maybe it's not necessary because the sample already set create_bqstorage_client=True)

@tswast tswast merged commit 5e7d5ed into main Feb 6, 2025
19 of 25 checks passed
@tswast tswast deleted the issue2032-unit-tests branch February 6, 2025 18:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the googleapis/python-bigquery API. size: m Pull request size is medium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Deadlock due to race condition in download thread when using Bigquery Storage API
3 participants