Why is a transient error in http download_and_extract not retried? #23687

Open
sthornington opened this issue Sep 19, 2024 · 0 comments
Labels
team-Starlark-Interpreter Issues involving the Starlark interpreter used by Bazel type: bug untriaged

sthornington commented Sep 19, 2024

Description of the bug:

Downloading artifacts using http_archive, as rules_rust does when fetching cargo crates, sometimes fails because Bazel cannot delete a temporary directory that is not yet empty:

INFO: Repository crate_index__fastrand-2.1.1 instantiated at:
  /scratch/simont/src/quadcap/WORKSPACE:122:19: in <toplevel>
  /stuff/simont/bazel_base/4fd9a0903ffbf1e501ab97de5c381dcc/external/crate_index/defs.bzl:579:10: in crate_repositories
  /stuff/simont/bazel_base/4fd9a0903ffbf1e501ab97de5c381dcc/external/bazel_tools/tools/build_defs/repo/utils.bzl:268:18: in maybe
Repository rule http_archive defined at:
  /stuff/simont/bazel_base/4fd9a0903ffbf1e501ab97de5c381dcc/external/bazel_tools/tools/build_defs/repo/http.bzl:382:31: in <toplevel>
ERROR: An error occurred during the fetch of repository 'crate_index__fastrand-2.1.1':
   Traceback (most recent call last):
        File "/stuff/simont/bazel_base/4fd9a0903ffbf1e501ab97de5c381dcc/external/bazel_tools/tools/build_defs/repo/http.bzl", line 131, column 45, in _http_archive_impl
                download_info = ctx.download_and_extract(
Error in download_and_extract: java.io.IOException: Couldn't delete temporary directory (/stuff/simont/bazel_base/4fd9a0903ffbf1e501ab97de5c381dcc/external/crate_index__fastrand-2.1.1/temp1055349778882693560): /stuff/simont/bazel_base/4fd9a0903ffbf1e501ab97de5c381dcc/external/crate_index__fastrand-2.1.1/temp1055349778882693560 (Directory not empty)
ERROR: no such package '@@crate_index__fastrand-2.1.1//': java.io.IOException: Couldn't delete temporary directory (/stuff/simont/bazel_base/4fd9a0903ffbf1e501ab97de5c381dcc/external/crate_index__fastrand-2.1.1/temp1055349778882693560): /stuff/simont/bazel_base/4fd9a0903ffbf1e501ab97de5c381dcc/external/crate_index__fastrand-2.1.1/temp1055349778882693560 (Directory not empty)
ERROR: /stuff/simont/bazel_base/4fd9a0903ffbf1e501ab97de5c381dcc/external/crate_index__tempfile-3.12.0/BUILD.bazel:16:13: @@crate_index__tempfile-3.12.0//:tempfile depends on @@crate_index__fastrand-2.1.1//:fastrand in repository @@crate_index__fastrand-2.1.1 which failed to fetch. no such package '@@crate_index__fastrand-2.1.1//': java.io.IOException: Couldn't delete temporary directory (/stuff/simont/bazel_base/4fd9a0903ffbf1e501ab97de5c381dcc/external/crate_index__fastrand-2.1.1/temp1055349778882693560): /stuff/simont/bazel_base/4fd9a0903ffbf1e501ab97de5c381dcc/external/crate_index__fastrand-2.1.1/temp1055349778882693560 (Directory not empty)
Use --verbose_failures to see the command lines of failed build steps.
ERROR: Analysis of target '//rust/qcrs_link_demo:qcrs_link_demo' failed; build aborted: Analysis failed
INFO: Elapsed time: 28.348s, Critical Path: 0.91s
INFO: 269 processes: 108 remote cache hit, 161 internal.
ERROR: Build did NOT complete successfully

This error seems to originate from

and if one's filesystem does this, there does not seem to be any possible mitigation. Perhaps temporary downloads could be written to a real temporary directory, outside the output_base, before being moved into their final location?
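
To make the suggestion concrete, here is a minimal Python sketch of the "stage elsewhere, then emplace" idea. It is not Bazel's implementation; `stage_then_emplace` and the `populate` callback are hypothetical names used only for illustration. The point is that cleanup of the scratch directory becomes best-effort, so a lingering "Directory not empty" cannot fail the fetch.

```python
import shutil
import tempfile
from pathlib import Path


def stage_then_emplace(populate, final_dir):
    """Fill a scratch directory in the system temp location, then move it into place.

    `populate` is a hypothetical callback that writes the downloaded and
    extracted files into the directory it is given.
    """
    final_dir = Path(final_dir)
    if final_dir.exists():
        # shutil.move would nest the scratch directory inside an existing one.
        raise FileExistsError(f"{final_dir} already exists")

    scratch = Path(tempfile.mkdtemp(prefix="fetch-"))  # outside final_dir
    try:
        populate(scratch)
        final_dir.parent.mkdir(parents=True, exist_ok=True)
        # shutil.move renames when possible and falls back to copy + delete
        # when scratch and final_dir are on different filesystems.
        shutil.move(str(scratch), str(final_dir))
    finally:
        # Best-effort cleanup: leftovers in scratch must not fail the fetch.
        shutil.rmtree(scratch, ignore_errors=True)
    return final_dir
```

One trade-off of this design: if the system temp directory is on a different filesystem from the destination, the move is a copy rather than a rename, so it costs more I/O and is not atomic.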

Which category does this issue belong to?

Starlark Interpreter

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Build a Rust project with many crate dependencies, with output_user_root on a filesystem that does not guarantee atomic delete/unlink visibility.

Which operating system are you running Bazel on?

linux

What is the output of bazel info release?

release 7.3.1

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse HEAD ?

No response

If this is a regression, please try to identify the Bazel commit where the bug was introduced with bazelisk --bisect.

No response

Have you found anything relevant by searching the web?

#20013 seems to be the same problem. It seems to me that exceptions marked TRANSIENT that relate to cleaning up things like temporary scratch directories should be retried instead of killing the entire build.
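
As an illustration of the retry behaviour being asked for, here is a minimal Python sketch, again not Bazel's code. The function name, its parameters, and the choice of exponential backoff are assumptions made for the example. It retries a recursive delete only when the failure is "Directory not empty" (ENOTEMPTY) and re-raises anything else immediately.

```python
import errno
import shutil
import time


def delete_tree_with_retry(path, attempts=5, initial_delay=0.1, remove=shutil.rmtree):
    """Delete `path` recursively, retrying when the directory is not yet empty.

    `remove` is injectable so the retry logic can be exercised without a
    misbehaving filesystem; it defaults to shutil.rmtree.
    """
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            remove(path)
            return
        except FileNotFoundError:
            return  # already gone, nothing left to clean up
        except OSError as exc:
            # Only ENOTEMPTY is treated as transient; give up on the last attempt.
            if exc.errno != errno.ENOTEMPTY or attempt == attempts:
                raise
            time.sleep(delay)
            delay *= 2  # exponential backoff between attempts
```

Whether a bounded retry is enough depends on how long the filesystem takes to make the unlinks visible, so the attempt count and delay here are placeholders rather than recommended values.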

Any other information, logs, or outputs that you want to share?

No response
