Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cc_tool builds src for wrong platform #299

Open
anguslees opened this issue Jan 6, 2025 · 15 comments
Open

cc_tool builds src for wrong platform #299

anguslees opened this issue Jan 6, 2025 · 15 comments
Assignees

Comments

@anguslees
Copy link

anguslees commented Jan 6, 2025

cfg = "exec",

I think cc_tool(src) should be cfg=target, not cfg=exec.

Explanation: The cc_tool rule does not (itself) execute the tool as a build action (no ctx.actions.run). Presumably there will eventually be some other rule that uses the output from cc_tool as an attribute, and (importantly) also invokes the tool. That rule should consume the tool via an attribute with cfg=exec, so the tool transition can evaluate all the constraints correctly for that rule.

@pzembrod
Copy link
Collaborator

I don't think I quite understand the reasoning in the explanation. Why should a cc_tool() not run on the exec platform?

@matts1 fyi

@armandomontanez
Copy link
Collaborator

There were some aspects of the initial design that warranted putting cfg=exec there, but kind of agree with you. We should probably put cfg=exec at the cc_toolchain.tool_map attribute, and remove it from the cc_tool rule.

@matts1
Copy link
Contributor

matts1 commented Jan 13, 2025

tool_map already has a transition to cfg=exec (source).

I think it should be safe to just replace exec with target.

@armandomontanez armandomontanez self-assigned this Jan 24, 2025
@keith
Copy link
Member

keith commented Feb 11, 2025

I think I'm getting hit by this as well but none of the matrix of options discussed here seems to fix the issue immediately. Which I imagine means there's another issue in my setup with rules_foreign_cc. But one thing I noticed that surprised me, is when I have 3 execution platforms + 3 target platforms, all with a single toolchain configured, if I cquery that toolchain this is how I see it boiled down to a specific tool:

Image

In this example I'm on a macOS machine where the macOS + linux remote executors are registered as execution platforms, so 2 of these configurations are macOS (target vs exec) and 1 is linux, but I'm surprised to see here that there is only 1 configuration of the final llvm-ar binary, which is defined as expected with a select(). This makes it look like you will end up exec'ing the same tool regardless of exec platform, which is what ends up happening in my case, failing the build.

@matts1
Copy link
Contributor

matts1 commented Feb 11, 2025

My assumption was that your situation would end up with bazel configuring a seperate set of toolchains for each of your remote execution platforms, so you'd get:

  • macos exec ->macos target
  • linux exec -> macos target

I'm hoping someone knowledgable about platforms can educate me more, but is it possible that since the toolchain itself says it's compatible with all execution platforms, it sees macos exec, creates a toolchain for that, then sees linux exec and sees that the toolchain supports linux and attempts to map linux to the existing toolchain instead of creating another one? I wouldn't be surprised if the original authors of the toolchain logic never considered that someone might make one toolchain for all platforms using select.

@keith
Copy link
Member

keith commented Feb 11, 2025

it looks like you're right, so at least 1 way to make it work is to create 3 separate targets for all_tools that are platform specific, remove all selects from that entire tree, create 3 cc_toolchains and continue having 3 toolchain definitions with only exec_compatible_with set

@fmeum
Copy link
Contributor

fmeum commented Feb 11, 2025

I created an example project to demonstrate the effect that @keith is seeing: https://github.com/fmeum/toolchain-exec-example

This is caused by the toolchain transition, which forces the execution platform of the <lang>_toolchain target that the toolchain target resolves to be that of the target (more generally, of the exec group) for which the toolchain was selected. However, this effect does not extend to the dependencies of the <lang>_toolchain target, which means that you can only select on the exec platform if you immediately apply an exec transition. If you also want to select on the target platform, you have to split that logic into a separate cfg = "target" dependency.

Because of that I think that cc_toolchain may need to provide and somehow merge separate target_toolchain_config and exec_toolchain_config attributes if you want to be able to select on both exec and target platform.

CC @katre @comius

@matts1
Copy link
Contributor

matts1 commented Feb 12, 2025

Could this be fixed in bazel itself? I imagine a solution would be to just not deduplicate toolchains. Would there be a significant performance implication of this?

Or maybe we could simply say that toolchains with no exec_compatible_with / target_compatible_with are not deduplicated?

@fmeum
Copy link
Contributor

fmeum commented Feb 13, 2025

I'm not sure what you mean by deduplication, but I don't think anything like that is happening here.

It's also not obviously a "bug": If you want to select on the target platform in some places and on the exec platform in other places, there needs to be some way to specify that. Currently select applies to the exec platform of a toolchain in a toolchain dependency if and only if exactly the first edge on the path to that dependency has cfg = "exec", which seems somewhat reasonable. Whether that's flexible enough to implement rule-based C++ toolchains is a different story and definitely needs more thought.

As a starter, could you describe which parts of the toolchain config would want to select on the target platform and which ones on the exec platform of the toolchain?

@armandomontanez
Copy link
Collaborator

Hmmm my expectation based on the current implementation is:

cc_toolchain(
    name = "my_multiplatform_toolchain",
    tool_map = ":tool_map_shim",  # a select here is under the "target" config.
)

alias(
    name = "tool_map_shim",
    actual = select({  # Expectation: still selecting on "target" config.
        "@platforms//os:macos": ":tool_map_to_target_macos",
        "//conditions:default": ":default_:tool_map",
    }),
)

cc_tool_map(
    name = "tool_map_to_target_macos",
    tools = {  # select() not possible here for implementation reasons.
        "@rules_cc//cc/toolchains/actions:ar_actions": ":llvm-libtool-darwin",
        # ...
    },
)

cc_tool(
    name = ":llvm-libtool-darwin",
    src = select({  # Expectation: selecting on "exec" config.
        "@platforms//os:linux": "@llvm_linux//:bin/llvm-libtool-darwin",
        "@platforms//os:macos": "@llvm_linux//:bin/llvm-libtool-darwin",
        "@platforms//os:windows": "@llvm_linux//:bin/llvm-libtool-darwin",
    }),
)

I know that in this example the cc_tool's select() ends up under the exec configuration, I relied on that for Windows support in the Pico SDK: https://github.com/raspberrypi/pico-sdk/blob/95ea6acad131124694cda1c162c52cd30e0aece0/bazel/toolchain/gcc_arm_none_eabi.BUILD#L34. But also, I haven't tried that in an environment where there's a pool of remote executors with a variety of operating systems.

This may be a long shot, but try registering the cc_toolchain with multiple exec-specific toolchains:

toolchain(
    name = "cc_toolchain_linux",
    exec_compatible_with = [
        "@platforms//os:linux",
    ],
    toolchain = ":my_multiplatform_toolchain",
    toolchain_type = "@bazel_tools//tools/cpp:toolchain_type",
)

toolchain(
    name = "cc_toolchain_macos",
    exec_compatible_with = [
        "@platforms//os:macos",
    ],
    toolchain = ":my_multiplatform_toolchain",
    toolchain_type = "@bazel_tools//tools/cpp:toolchain_type",
)

I'm suspicious that having a single toolchain is allowing Bazel to evaluate the toolchain once and assume it evaluates the same for every exec platform.

@fmeum
Copy link
Contributor

fmeum commented Feb 13, 2025

This setup can't work correctly with multiple exec platforms since the exec platform of tool_map_to_target_macos is what is used in the exec transition, but that rule doesn't specify any toolchain requirements or exec constraints and thus uses the first registered exec platform, which may be entirely unrelated to the exec platform chosen for the toolchain.

The example I linked above shows how a single toolchain even without constraints can be selected with different exec platforms. But this requires the cfg = "exec" attribute to be right on the <lang>_toolchain target, not somewhere in its dependency closure.

@katre
Copy link
Member

katre commented Feb 14, 2025

Quick summary of the current situation with toolchains: when a target (let's say a cc_library, for convenience) requires a specific toolchain type (@rules_cc//cc:toolchain_type, in this case) for a specific exec group (let's imagine it's cc_compile, but the default exec group works the same), bazel goes through the steps in toolchain resolution, and determines the following:

  1. A single execution platform for the "cc_compile" exec group, that can execute all needed toolchain types.
  2. A toolchain implementation for each toolchain type, that can target the target platform (from the configuration) and the exec platform.

Bazel then creates a toolchain dependency from the target (cc_library) to the toolchain implementation (cc_toolchain), and populates the data structures, so that the rule implementation can find the toolchain implementation's ToolchainInfo provider (via ctx.exec_groups["cc_compile"].toolchains["@rules_cc//cc:toolchain_type"]) and get the implementation-specific provider (CcToolchainInfo, I think).

The toolchain dependency has a few special properties that normal dependencies don't:

  1. The toolchain implementation has the same configuration as the original target (no actual transition happens)
  2. The original exec group's execution platform is passed along, and the toolchain implementation is forced to use that exec platform
  3. The toolchain implementation should not create any actions, because things can blow up.

However, there is nothing special about the toolchain implemention's own dependencies: these do not keep the forced execution platform, and perform standard toolchain resolution.

These properties have the following consequences:

  1. Exec groups are less useful for toolchain implementations (like cc_toolchain), because every exec group will have the same exec platform
    1. But maybe they are still useful to hang exec properties on
  2. Toolchain implementations should not themselves require toolchain types
    1. Toolchains are useful to create actions, and toolchain implementations cannot create actions
  3. Because of the forced execution platform, a toolchain implementation can use cfg = "exec" to depend on a tool that the parent rule can safely execute on its execution platform
    1. Because that tool will still see standard toolchain resolution, and its target platform will be the exec platform for the rule that uses it, it is built following standard semantics
    2. If a toolchain implementation depends on a tool, and that tool's implementation uses cfg = "exec", that refers to the exec platform for building the tool, and not the exec platform for the original cc_library target that required the toolchain
  4. Even in a build with multiple execution platforms, each toolchain implementation will only see a single execution platform
    1. This is a limitation in the original design. Is it possible to envision a toolchain that uses multiple execution groups, and thus multiple execution platforms? Yes, but the API would be very complex to express which tool in the toolchain is for which exec group in the original rule, and so today it isn't supported.

Okay, that was a lot, finishing this comment and starting a new one.

@katre
Copy link
Member

katre commented Feb 14, 2025

Back to this specific request:

If I understand the setup correctly, we have the following targets in play:

# A hypothetical target we are trying to build that requires the CC toolchain type.
cc_library(name = "foo")

# Dropping all irrelevant details, pretend this is correct.
toolchain(
    name = "cc_toolchain",
    toolchain_type = "@rules_cc//cc:toolchain_type",
    target_compatible_with = ["@platforms//os:linux", "@platforms//cpu:x86_64",]
    exec_compatible_with = ["@platforms//os:linux", "@platforms//cpu:x86_64",]
    toolchain = ":clang",
)

cc_toolchain(
    name = "clang",
    tool_map = ":all_tools",
    ... other attributes ...
)

cc_tool_map(
    name = "all_tools",
    tools = {
        "@rules_cc//cc/toolchains/actions:c_compile": ":clang",
    },
    ... other attributes ...
)

cc_tool(
    name = "clang",
    srcs = ["@clang//:clang_binary"],
    ... other attributes ...    
)

(Is this correct: it looks like some of these are macros, which will confuse the analysis, but the principles are the same)

In this case, we have the following configured targets if we run bazel build --platforms=//platforms:linux, and if there are execution platforms //platforms:remote_linux and //platforms:remote_mac.

  1. //:foo: target platform is //platforms:linux
    1. The rule requires the toolchain type @rules_cc/cc:toolchain_type
    2. Bazel finds the //:cc_toolchain registration, so sets the execution platform to //platforms:remote_linux and chooses the //:clang toolchain implementation
    3. Bazel creates a toolchain dependency from //:foo to //:clang, keeping the configuration the same.
  2. //:clang: target platform is //platforms:linux, exec platform is forced to be //platforms:linux_remote
    1. Assuming that the tool_map attribute uses cfg = "exec", the dependency on //:all_tools uses the exec transition, which changes the target platform for the dependency to the exec platform of the parent.
  3. //:all_tools: target platform //platforms:linux_remote
    1. Toolchain resolution proceeds normally, there is no forced execution platform.
    2. The tools attribute should not use cfg = "exec", it should use cfg = "target" (or just not set cfg, "target" is the default).
    3. Why? Because we want the tools to target //platforms:linux_remote: that is where the actions to generate //:foo will be executed.
  4. Similarly, //:clang should also have target platform //platforms_linux_remote
    1. And, by the same analysis, the srcs attribute should not use the exec transition, because the tool needs to be able to execute on //platforms:linux_remote, which is the target platform for //:clang.

So, ending the analysis, the initial comment is correct, cc_tool should *not use cfg = "exec", and probably the entire cc_toolchain implementation needs to be checked to be sure that tools have the correct configuration.

@armandomontanez
Copy link
Collaborator

Thanks for the excellent writeup @katre, I'm going to bookmark this! 😁 I spent some time playing @fmeum's example and I think I better understand the nuances of this problem.

Long story short: the current design doesn't support cc_toolchain definitions shared by multiple exec platforms. It properly orders exec transitions if and only if all registered execution platforms evaluate the toolchain configuration identically.

Problem: no cfg = "exec" for any attributes that works with multiple exec platforms

The biggest fundamental problem is that rule-based toolchains are implemented in terms of native.cc_toolchain() which currently lives inside of Bazel. This means that when there are multiple registered exec platforms, there's no way to do a correct "exec" transition at any level because none of the attributes are under the "exec" config.

This is a very unfortunate oversight. 😔

This was missed because things work just fine when you only have a single registered execution platform at a time.

Problem: No way to do a delayed target->exec transition

The intended behavior of cc_toolchain.tool_map assumed the incoming exec configuration is inherited by transitive dependencies of the cc_toolchain rule, which is conceptually what would have allowed selection of the target-specific tool map to occur before a transition to the exec platform.

Conceptually, I think we'd need something like this:

cc_toolchain = rule(
    implementation = _cc_toolchain_impl,
    attrs = {
        "tool_map": attr.label(with_same_exec_requirements = True),
    },
)

Which would carry this toolchain property forward:

  1. The original exec group's execution platform is passed along, and the toolchain implementation is forced to use that exec platform

But I'm suspicious that implementing that would be a nightmare. This problem seems like it would be relevant for any rules that don't immediately use a tool that they specify as a cfg = "exec" dependency, but instead forward to be used by another rule.

There's probably less invasive paths to pursue, but I'd need to spend more time thinking about it.

Near-term solution

I suspect the only real path forward without significant changes to Bazel is to update the documentation to clarify that every cc_toolchain may target many platforms, but it may only execute on one platform. The example toolchain will also need to be updated so it matches this limitation.

I think right now the expectation for cfg = ??? would be that we'd need to expect cfg = "target" everywhere in the rule-based toolchain. This doesn't make a ton of sense for things like the toolchain binaries (which you'd expect to be behind a cfg = "exec" transition), but is what we're limited to for now.

@matts1
Copy link
Contributor

matts1 commented Feb 24, 2025

I was able to come up with a pretty reasonable solution for rules_toolchains that would be just as applicable here.

It's similar to the approach that @keith got working, except it supports selecting over the exec platform by manually specifying a transition to a specific platform which is the actual exec platform, rather than the one in "exec", which as we've determined, is not the exec platform.

To achieve this, we need to manually enumerate all supported exec platforms, but that's something we were doing anyway. I personally think it turned out pretty clean. If we're able to get bazelbuild/bazel#25363 approved, we could even just write toolchain(exec_platforms = ["//:remote_execution_platform1", ":remote_execution_platform2"])

See matts1/rules_toolchains@64e9ec8

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants