Add ThreadPoolExecutor for AssetDaemon run submission #25888

OwenKephart · 2024-11-12T22:48:42Z

Summary & Motivation

As title

How I Tested These Changes

Changelog

NOCHANGELOG

OwenKephart · 2024-11-12T22:48:59Z

Add ThreadPoolExecutor for AssetDaemon run submission #25888 👈
master


gibsondan

this all looks very reasonable to me. The one thing I am unsure about is whether we need some kind of lock on run_request_execution_data_cache, now that it can potentially be written to and read from by multiple threads simultaneously. Is there anything else shared between the threads that could run into the same issue?
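For reference, here is a minimal sketch of what "some kind of lock" on a shared cache could look like. `LockedCache`, `get_or_compute`, and `load` are hypothetical names for illustration only and are not part of Dagster or of this PR.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class LockedCache:
    """Hypothetical get-or-compute cache guarded by a single lock (illustration only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}
        self.compute_calls = 0  # counts how often the expensive lookup actually ran

    def get_or_compute(self, key, compute):
        # The check and the write happen under the same lock, so each key
        # is computed at most once even when many threads ask for it.
        with self._lock:
            if key not in self._data:
                self.compute_calls += 1
                self._data[key] = compute(key)
            return self._data[key]


def load(key):
    # Stand-in for an expensive query; assumed deterministic per key.
    return f"data-for-{key}"


locked_cache = LockedCache()
with ThreadPoolExecutor(max_workers=4) as pool:
    values = list(
        pool.map(lambda k: locked_cache.get_or_compute(k, load), ["a", "b", "a", "b", "a"])
    )
```

The trade-off in this sketch is that the lock is held while `compute` runs, so lookups for different keys are serialized; whether that cost is worth it depends on how expensive redundant queries are.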

gibsondan · 2024-11-13T19:09:20Z

python_modules/dagster/dagster/_daemon/asset_daemon.py

@@ -384,6 +389,14 @@ def core_loop(
                        thread_name_prefix="asset_daemon_worker",
                    )
                )
+                num_submit_workers = self._settings.get("num_submit_workers")


there will need to be a corresponding internal PR for this i imagine?

https://github.com/dagster-io/internal/pull/12675/files yep!
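As a rough illustration of how a setting like `num_submit_workers` could gate a dedicated submission pool, here is a sketch under stated assumptions. `make_submit_executor`, `submit_all`, and the `asset_daemon_submit_worker` prefix are hypothetical and are not taken from the actual diff; only `ThreadPoolExecutor` and its `max_workers` / `thread_name_prefix` parameters are real.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional


def make_submit_executor(settings: dict) -> Optional[ThreadPoolExecutor]:
    # Assumption: a missing or zero setting means "submit serially".
    num_submit_workers = settings.get("num_submit_workers")
    if not num_submit_workers:
        return None
    return ThreadPoolExecutor(
        max_workers=num_submit_workers,
        thread_name_prefix="asset_daemon_submit_worker",  # hypothetical prefix
    )


def submit_all(
    run_requests: List[str],
    submit_one: Callable[[str], str],
    executor: Optional[ThreadPoolExecutor],
) -> List[str]:
    if executor is None:
        # Serial fallback: same results, one request at a time.
        return [submit_one(r) for r in run_requests]
    futures = [executor.submit(submit_one, r) for r in run_requests]
    # Collect in submission order so results line up with run_requests.
    return [f.result() for f in futures]


requests = ["request_0", "request_1", "request_2"]
executor = make_submit_executor({"num_submit_workers": 4})
try:
    threaded_ids = submit_all(requests, lambda r: f"run-for-{r}", executor)
finally:
    if executor is not None:
        executor.shutdown()

serial_ids = submit_all(requests, lambda r: f"run-for-{r}", make_submit_executor({}))
```

Keeping the executor optional means callers that pass `None` keep the existing serial behavior.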

gibsondan · 2024-11-13T19:10:18Z

python_modules/dagster/dagster/_daemon/asset_daemon.py

+                run_request_index=i,
+                instance=instance,
+                workspace_process_context=workspace_process_context,
+                run_request_execution_data_cache=run_request_execution_data_cache,


is this cache thread safe?

my understanding here is that the risk with this cache is simply that multiple threads will end up making redundant queries for execution data.

in theory, the values returned by these separate queries should be identical, and there's no read -> modify -> write sort of pattern happening here, which is where the non-thread-safety of dictionaries would come into play. Given that all writes into the same key of the dictionary should theoretically contain the same value, that shouldn't be an issue.
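A small sketch of that argument, using hypothetical stand-ins (`shared_cache`, `fetch_execution_data`, `submit_run`) rather than the real daemon code: several threads may miss the cache at the same time and each run the query, but because the query is assumed to be deterministic per key, every write to a given key stores the same value. This relies on a single dict assignment being atomic in CPython.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

shared_cache = {}  # plain dict, deliberately not guarded by a lock
query_count = 0
count_lock = threading.Lock()  # protects only the counter, not the cache


def fetch_execution_data(job_name):
    """Stand-in for the expensive query; assumed deterministic per key."""
    global query_count
    with count_lock:
        query_count += 1
    return f"execution-data-for-{job_name}"


def submit_run(job_name):
    # Two threads can both see a miss here and both run the query.
    # That is redundant work, but both write the identical value.
    if job_name not in shared_cache:
        shared_cache[job_name] = fetch_execution_data(job_name)
    return shared_cache[job_name]


job_names = ["job_a"] * 8 + ["job_b"] * 8
with ThreadPoolExecutor(max_workers=4, thread_name_prefix="submit_worker") as pool:
    results = list(pool.map(submit_run, job_names))
```

After this runs, `query_count` can be anywhere from 2 to 16 depending on thread timing, while the cache contents and the returned values are the same in every case.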

gibsondan · 2024-11-13T19:14:32Z

...tests/definitions_tests/declarative_automation_tests/scenario_utils/asset_daemon_scenario.py

@@ -212,6 +212,7 @@ def _evaluate_tick_daemon(
                    threadpool_executor=self.threadpool_executor,
                    amp_tick_futures=amp_tick_futures,
                    debug_crash_flags={},
+                    submit_threadpool_executor=None,


i don't know that we need to be running all these scenarios with the submit threadpool executor at all times, but at least as a one-off i think it would be useful to do it once manually to suss out any potential problems

Is the concern here that the behavior of AMPs might differ in some way from the behavior of AutomationConditions?

Generally, I want to move away from this "scenario" pattern, as I've found that the test_e2e.py (and the new test file I added in this PR) are a better representation of "real world" conditions. So to that end, I did update one of the e2e tests to use a submit threadpool executor.

Down to add an e2e test that uses AMPs, just want to avoid extending the scenario framework at this point, as my eventual goal is to delete all of it

responded!

OwenKephart requested a review from gibsondan November 12, 2024 22:48

OwenKephart marked this pull request as ready for review November 12, 2024 23:58

OwenKephart force-pushed the 11-12-add_threadpoolexecutor_for_assetdaemon_run_submission branch 3 times, most recently from 16f35f2 to 4eaf79e Compare November 13, 2024 18:20

gibsondan previously requested changes Nov 13, 2024

Add ThreadPoolExecutor for AssetDaemon run submission

a468a34

OwenKephart force-pushed the 11-12-add_threadpoolexecutor_for_assetdaemon_run_submission branch from 4eaf79e to a468a34 Compare November 13, 2024 21:42

OwenKephart requested a review from gibsondan November 13, 2024 21:45
