Async Signals #1043

Open
wants to merge 3 commits into
base: master

TitanNano
Contributor

@TitanNano TitanNano commented Feb 9, 2025

This was developed last year in #261 and consists of two somewhat independent parts:

  • A Future for Signal: an implementation of the Future trait for Godot's signals.
  • Async runtime for Godot: a wrapper around Godot's deferred code execution that acts as an async runtime for Rust futures.

The SignalFuture does not depend on the async runtime and vice versa, but there is no point in having a future without a way to execute it.

For limitations see: #261 (comment)

Example

let node = Node::new_gd();

// spawn a new async task
godot_task(async move {
    // do something before waiting for a signal
    let children = node.get_children();
    
    // await a signal
    let _: () = Signal::from_object_signal(&node, "tree_entered").to_future().await;

    // do more after the signal
    children.iter_shared().for_each(|child| ... );
});

TODOs

  • Decide if we want to keep the GuaranteedSignalFuture. Should it be the default? (We keep it as TrySignalFuture, the plain signal is a wrapper that panics in the error case.)
  • Documentation
  • figure out async testing.
  • deal with async panics (in tests)

CC @jrb0001 because they provided very valuable feedback while refining the POC.
Closes #261

@GodotRust

API docs are being generated and will be shortly available at: https://godot-rust.github.io/docs/gdext/pr-1043

Member

@Bromeon Bromeon left a comment


Thanks a lot, this is very cool!

From the title I was first worried this might cause many conflicts with #1000, but it seems like it's mostly orthogonal, which is nice 🙂

I have only seen the first 1-2 files, will review more at a later point. Is there maybe an example, or should we just check tests?

@TitanNano TitanNano mentioned this pull request Feb 10, 2025
@TitanNano TitanNano force-pushed the jovan/async_rt branch 3 times, most recently from 2877010 to 9687f3b Compare February 10, 2025 21:19
@Bromeon Bromeon added the feature Adds functionality to the library label Feb 10, 2025
@jrb0001
Contributor

jrb0001 commented Feb 10, 2025

I am currently testing it with my project.

  • Executor from this PR and signals from my old implementation (based on async_channel) seem to work in-game.
  • My old executor (based on async_task running once per frame) and signals from this PR is the next step, hopefully tomorrow.
  • Both Executor and signals from this PR will come after that. I expect some issues with recursive signals but let's see.
  • I am getting a weird segfault on hotreloading with a completely useless backtrace which didn't happen with my executor implementation. I need to debug this more, but I suspect it is related to having a tool node spawn a future which listens on its signals and/or a signal>drop>signal>drop something else>signal chain.

@lilizoey
Member

I am getting a weird segfault on hotreloading with a completely useless backtrace which didn't happen with my executor implementation. I need to debug this more, but I suspect it is related to having a tool node spawn a future which listens on its signals and/or a signal>drop>signal>drop something else>signal chain.

i'd guess it's related to using thread_local here which we need to do some hacky stuff to support with hot-reloading enabled

@TitanNano
Contributor Author

i'd guess it's related to using thread_local here which we need to do some hacky stuff to support with hot-reloading enabled

Shouldn't the hot-reload hack only leak memory? 🤔

@jrb0001 does the segfault occur on every hot-reload?

@jrb0001
Contributor

jrb0001 commented Feb 11, 2025

i'd guess it's related to using thread_local here which we need to do some hacky stuff to support with hot-reloading enabled

Shouldn't the hot-reload hack only leak memory? 🤔

@jrb0001 does the segfault occur on every hot-reload?

I am not completely sure yet. It doesn't happen if there are no open scenes or if none of them contains a node which spawns a Future.

It also doesn't seem to happen every single time if I close all scenes and then open one with a Future before triggering the hot-reload. In this case it panics with some scenes:

ERROR: godot-rust function call failed: <Callable>::GodotWaker::wake()
    Reason: [panic]  Future no longer exists when waking it! This is a bug!
  at /home/jrb0001/.cargo/git/checkouts/gdext-3ec94bd991a90eb6/2877010/godot-core/src/builtin/async_runtime.rs:271

With another scene it segfaults in this scenario.

Simply reopening the editor (same scene gets opened automatically) and then triggering a hot-reload segfaults for both scenes.

With both executor + Future from this PR, the hot-reload issue doesn't happen at all?!? So the issue could also be in my code, let me debug it properly before you waste more time on it.

I will do some more debugging later this week (probably weekend).


I also finished testing the Future part of the PR and it works fine with both my old executor and your executor in my relatively simple usage.

Unfortunately all my complex usages (recursion, dropping, etc.) need a futures_lite::Stream which I can't implement on top of your GuaranteedSignalFuture without potentially missing (or duplicating?) some signals while reconnecting with a new Future instance.

The R: Debug bound on to_future()/to_guaranteed_future() was a bit annoying and doesn't seem to be used? Or did I miss something?

@TitanNano
Contributor Author

The R: Debug bound on to_future()/to_guaranteed_future() was a bit annoying and doesn't seem to be used? Or did I miss something?

Yeah, it's completely unnecessary now. Probably an old artifact. I removed the bound.


Unfortunately all my complex usages (recursion, dropping, etc.) need a futures_lite::Stream which I can't implement on top of your GuaranteedSignalFuture without potentially missing (or duplicating?) some signals while reconnecting with a new Future instance.

Can you elaborate what the issue here is?


I'm also curious what your use-case for the GuaranteedSignalFuture is. Currently, I'm still thinking of getting rid of it again. I have never come across a future that resolves when the underlying source disappears, and I wonder if it is really that useful for most users. But maybe you can share how it's important for you.

@TitanNano
Contributor Author

TitanNano commented Feb 12, 2025

ERROR: godot-rust function call failed: <Callable>::GodotWaker::wake()
    Reason: [panic]  Future no longer exists when waking it! This is a bug!
  at /home/jrb0001/.cargo/git/checkouts/gdext-3ec94bd991a90eb6/2877010/godot-core/src/builtin/async_runtime.rs:271

@jrb0001 Do you have an idea what could have triggered this? The only thing that I can think of is that a waker got cloned and reused after the future resolved. The panic probably doesn't make any sense, since the waker can technically be called an infinite number of times. 🤔

@TitanNano TitanNano force-pushed the jovan/async_rt branch 2 times, most recently from 071c97e to c58b657 Compare February 14, 2025 23:47
@TitanNano
Contributor Author

@Bromeon I now added a way to test async tasks. I still need to deal with panics inside a Future, though. Technically, we could unify the test execution of sync and async tasks, but I get the impression that it also would have some downsides. Keeping it separate adds a bit of duplication, but unifying it would force more complexity onto the execution of sync tasks.

Member

@Bromeon Bromeon left a comment


I've finally had some time to look more closely at this. Thanks so much for this great PR, outstanding work as always ❤️

Technically, we could unify the test execution of sync and async tasks, but I get the impression that it also would have some downsides. Keeping it separate adds a bit of duplication, but unifying it would force more complexity onto the execution of sync tasks.

I think you made the right choice here, it seems they're different enough to be treated differently. If it becomes bothersome in the future, we could always revise that decision; but I think keeping the sync tests simple is a good approach.

@TitanNano TitanNano force-pushed the jovan/async_rt branch 2 times, most recently from 20a53b7 to af7d58b Compare February 15, 2025 14:32
@jrb0001
Contributor

jrb0001 commented Feb 16, 2025

I'm also curious what your use-case for the GuaranteedSignalFuture is. Currently, I'm still thinking of getting rid of it again. I have never come across a future that resolves when the underlying source disappears, and I wonder if it is really that useful for most users. But maybe you can share how it's important for you.

My experience seems to be the exact opposite of yours. Usually things like sockets and channels return Err/None/panic when the other side disappears. I don't think I have ever encountered a Future that gets stuck intentionally.

With Godot this isn't only caused by intentionally disconnecting a signal, but also when a node is freed, which can happen at any time and on a large scale. I don't like the idea of having hundreds or maybe even thousands of stuck tasks after the player changed scenes a few times.

I also think we shouldn't compare it to gdscript, for two reasons:

  • gdscript doesn't need to store any additional state so it doesn't have a memory leak. Your runtime "leaks" memory through the thread-local if a task gets stuck.
  • Not sure how to explain this, but for me the direction behind them is different. gdscript (and rust with Callable) is "Godot should call this method when ..." (Godot is the owner / pushing) while Future is "My future should wait until ..." (Future/Runtime is the owner / pulling). The Callable approach can detect the disconnect through NOTIFICATION_PREDELETE (gdscript) or drop() (Rust Callable) while the latter completely depends on the behavior of the signal future.

Your SignalFuture is usually enough and more ergonomic than the GuaranteedSignalFuture but I would make it panic on disconnect and make the Runtime clear the task on panic. The GuaranteedSignalFuture is still helpful if you need to wait for some signal and detect when the source disappears at the same time, without combining multiple signals, relying on catch_unwind() or a custom Drop impl.

I unfortunately didn't get to do my debugging session due to sickness. I will let you know once I have some results, but that will most likely be towards the end of the week or even weekend.

@Bromeon
Member

Bromeon commented Feb 16, 2025

Thanks a lot for the detailed insights, @jrb0001 👍

I'm trying to see it from a user perspective. A user would then have to make a choice whether the basic future is enough or the guaranteed one is needed, which may be... not a great abstraction?

How would you advise a library user to choose correctly here, without needing to know all the details? Does the choice even make sense, or should we sacrifice a bit of ergonomics for correctness?

@TitanNano
Contributor Author

My experience seems to be the exact opposite of yours. Usually things like sockets and channels return Err/None/panic when the other side disappears. I don't think I have ever encountered a Future that gets stuck intentionally.

I get this point, but I wouldn't say the future gets stuck intentionally. If you create a Godot Object and don't free it, then it leaks memory. That is also not intentional. From my point of view, async tasks must be stored and canceled before freeing the Object; this is simply an inherited requirement from the manually managed Node / Object. We can put this into the documentation of the TaskHandle. Maybe we also want to make the TaskHandle #[must_use]?

I also think making the SignalFuture panic if its Callable gets dropped would be a good compromise. This would highlight that something unexpected is happening.

@TitanNano TitanNano force-pushed the jovan/async_rt branch 3 times, most recently from 43b167c to 766bc95 Compare February 16, 2025 23:03
@Dheatly23
Contributor

Dheatly23 commented Feb 17, 2025

I get this point, but I wouldn't say the future gets stuck intentionally. If you create a Godot Object and don't free it, then it leaks memory. That is also not intentional. From my point of view, async tasks must be stored and canceled before freeing the Object; this is simply an inherited requirement from the manually managed Node / Object. We can put this into the documentation of the TaskHandle. Maybe we also want to make the TaskHandle #[must_use]?

But isn't manually cancelling the TaskHandle too much of a chore? Consider this simple GDScript example:

extends Button

func _pressed():
    await get_tree().create_timer(1.0).timeout
    print("Pressed one second before!")

If the button got freed, the call simply drops without any cleanup code. But with your proposal we need to store all TaskHandles in the node and cancel them all on exit tree, am I right?

Small nitpick, but I disagree on naming it GuaranteedSignalFuture; it gives the impression that the future will resolve without errors. I suggest naming it TrySignalFuture to emphasize that the signal might never resolve (e.g. the node is removed). My potential use case is asynchronous task cleanup, like sending a final message or waiting/selecting on multiple signals.

@Bromeon
Member

Bromeon commented Feb 17, 2025

From the discussion, it's stated that the "guaranteed" future is less ergonomic to use than the regular one. At the same time, it seems like the regular one needs manual cleanup (thus being less ergonomic in its own way).

To be on the same page, could someone post similar usage examples for each of them? 🙂

@coder137

coder137 commented Feb 17, 2025

@TitanNano @Bromeon

Was going through the PR since it was posted in the discord channel.
Just sharing my thoughts

  1. Seems like we are creating a custom async runtime, could this be replaced by https://github.com/smol-rs/async-task?
  2. A simplified version of the to_future API could be something like:
fn to_future<R>() -> impl Future<Output = Option<R>> + 'static {
    // Since we have a FnMut requirement we cannot use oneshot channels here
    // tokio channels are just an example here (we can use any channel that gives us sync tx and async rx)
    let (tx, mut rx) = tokio::sync::mpsc::channel(1);
    // Sketch only: `callable` still has to be connected to the signal, which is omitted here.
    let callable = Callable::from_local_fn("SignalFuture::resolve", move |args| {
        let _ignore = tx.blocking_send(R::from_args(args));
        Ok(Variant::nil())
    });
    async move {
        // Resolves to None once the sender (owned by the callable) has been dropped.
        rx.recv().await
    }
}

I believe this might solve the problem with futures getting resolved or not, since the channels will get cleaned up even if the signal isn't fired. We won't have to worry about leaks as long as the executor/async runtime shuts down gracefully.
We won't need 2 different implementations (i.e. SignalFuture and GuaranteedSignalFuture).

Since we are using existing channel implementations, we have less technical debt in godot_core as well.

  3. Lastly, since to_future and the async runtime are 2 separate things, would it be possible to make 2 different PRs?
    The to_future API can come first since the implementation would be executor independent.

Please let me know what you all think.
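The cleanup argument in point 2 rests on a general channel property: once every sender has been dropped, the receiver observes a disconnect instead of waiting forever. Below is a minimal sketch of that property. It uses `std::sync::mpsc` rather than the tokio channel from the comment so that it runs without dependencies; the closures stand in for the `Callable`, and both function names are hypothetical.

```rust
use std::sync::mpsc::{self, RecvError};

/// "Signal fired": the callback sends one value before the receiver reads.
pub fn recv_after_emit(value: u32) -> Result<u32, RecvError> {
    let (tx, rx) = mpsc::channel();
    // Stand-in for the callable connected to the signal; it owns the sender.
    let callback = move || tx.send(value);
    callback().expect("receiver is still alive");
    rx.recv()
}

/// "Signal object freed": the callback is dropped without ever being called.
pub fn recv_without_emit() -> Result<u32, RecvError> {
    let (tx, rx) = mpsc::channel::<u32>();
    let callback = move || tx.send(0);
    // Dropping the closure drops the only sender it captured.
    drop(callback);
    // With no sender left, recv returns an error instead of blocking.
    rx.recv()
}
```

An async channel would surface the same disconnect as `None` or an error from its receive future, which is the behavior the comment relies on.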

@Dheatly23
Contributor

Dheatly23 commented Feb 18, 2025

After a bit of testing, I found a minor bug with to_guaranteed_future. I don't exactly know what happens, but it just hangs. Perhaps it somehow got polled twice?

Code
use std::future::Future;
use std::panic::AssertUnwindSafe;
use std::pin::Pin;

use futures_util::future::{select_all, FutureExt as _};
use godot::classes::{Control, IControl};
use godot::prelude::*;

struct TestAsync;

#[gdextension]
unsafe impl ExtensionLibrary for TestAsync {}

#[derive(GodotClass)]
#[class(init, base = Control)]
struct NodeTestAsync {
    base: Base<Control>,
}

#[godot_api]
impl IControl for NodeTestAsync {
    fn ready(&mut self) {
        let this = self.to_gd();

        let signals = (0..this.get_child_count())
            .filter_map(|i| Some(Signal::from_object_signal(&this.get_child(i)?, "pressed")))
            .collect::<Vec<_>>();

        // Waits for all child buttons and reports if they're being pressed.
        godot_task(AssertUnwindSafe(async move {
            fn wait_for_signal(
                i: usize,
                s: &Signal,
            ) -> Pin<Box<dyn '_ + Future<Output = Option<usize>>>> {
                // Without fuse, to_guaranteed_future hangs
                Box::pin(
                    async move {
                        println!("Wait: {i}");
                        s.to_guaranteed_future::<()>().await;
                        println!("Done: {i}");
                        Some(i)
                    }
                    .fuse(),
                )
            }

            let mut futs = signals
                .iter()
                .enumerate()
                .map(|(i, s)| wait_for_signal(i, s))
                .collect::<Vec<_>>();

            println!("Start");
            while !futs.is_empty() {
                let i;
                (i, _, futs) = select_all(futs).await;
                if let Some(i) = i {
                    println!("{i}");
                    futs.push(wait_for_signal(i, &signals[i]));
                }
            }
        }));
    }
}

You can't compare GDScript and Rust like this.

GDScript has a script runtime that directly integrates with the GDScriptFunctionState.

In Rust, the Future trait is completely opaque to the runtime, and the Waker is completely opaque to the Future. We usually also don't just have a SignalFuture directly as the async task, but nested futures or even a tree of Futures:

Future >
  - Future > SignalFuture
  - Future > Future > SignalFuture

In the GDScript runtime, they store the pending function states inside the owning script and cancel them when the script is destroyed. The closest we can get to something like that, is to store TaskHandles inside GodotClasses and cancel the tasks on Drop. But this would still require some manual effort, similar to the Base field.
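As a rough illustration of the "store handles and cancel on Drop" idea, here is a minimal sketch. `TaskHandle` below is a hypothetical stand-in that only offers a `cancel` method (it is not the type from this PR), and `TaskGuard` is an invented name for a field that a class could own.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical stand-in for a task handle; only `cancel` is assumed.
pub struct TaskHandle {
    canceled: Rc<Cell<bool>>,
}

impl TaskHandle {
    pub fn new(canceled: Rc<Cell<bool>>) -> Self {
        Self { canceled }
    }

    /// Consumes the handle and marks the task as canceled.
    pub fn cancel(self) {
        self.canceled.set(true);
    }
}

/// Owns task handles and cancels every one of them when it is dropped.
#[derive(Default)]
pub struct TaskGuard {
    handles: Vec<TaskHandle>,
}

impl TaskGuard {
    pub fn track(&mut self, handle: TaskHandle) {
        self.handles.push(handle);
    }
}

impl Drop for TaskGuard {
    fn drop(&mut self) {
        // Cancel all tracked tasks so none of them outlives the owner.
        for handle in self.handles.drain(..) {
            handle.cancel();
        }
    }
}
```

A class would still have to declare such a field and route every spawned task through `track`, which is the manual effort mentioned above.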

For ergonomic reasons, we should reflect GDScript's convention as much as possible. Manually managing handles is unusual even for other async runtimes like Tokio.

We should be able to spawn a new task and forget about it, similar to a daemon thread. The shutdown sequence can be done like in Tokio, using a cancellation token to signal every outstanding task that cleanup is needed. The runtime can then loop until all tasks have finished.

I like the name, but the impression you get from the current name is what it does: it always resolves, but it might resolve to Option::None.

My idea is to return Err(NeverResolve) to indicate that the signal will never resolve. Being an error type also makes it easier to ? it.
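To show why an error type composes well with `?`, here is a sketch under stated assumptions: `NeverResolve` is the name proposed above, `wait_for_signal` is a hypothetical stand-in that resolves immediately, and `run_ready` is a tiny helper for driving such futures, not a real executor.

```rust
use std::future::{ready, Future, Ready};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Error name taken from the proposal above; the definition is an assumption.
#[derive(Debug, PartialEq)]
pub struct NeverResolve;

/// Hypothetical stand-in for awaiting a signal: resolves at once, either with
/// a value or with the error if the signal's source is gone.
pub fn wait_for_signal(source_alive: bool) -> Ready<Result<u32, NeverResolve>> {
    ready(if source_alive { Ok(1) } else { Err(NeverResolve) })
}

/// Each `?` returns early as soon as one signal can never resolve.
pub async fn count_two_signals(
    first_alive: bool,
    second_alive: bool,
) -> Result<u32, NeverResolve> {
    let first = wait_for_signal(first_alive).await?;
    let second = wait_for_signal(second_alive).await?;
    Ok(first + second)
}

struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls a future once; enough for futures that are ready immediately.
pub fn run_ready<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    match fut.as_mut().poll(&mut cx) {
        Poll::Ready(output) => output,
        Poll::Pending => panic!("this sketch only drives futures that are ready on the first poll"),
    }
}
```

With an `Option` output, the same early return would need an explicit `ok_or(...)` or a `match` at every await point.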

@TitanNano
Contributor Author

@coder137 to address your comment:

  1. Seems like we are creating a custom async runtime, could this be replaced by https://github.com/smol-rs/async-task?

We could, but that would be more overhead than using the engine. It would also require an additional dependency, while Godot already provides all the necessary components.

  2. A simplified version of the to_future API could be something like:
    I believe this might solve the problem with futures getting resolved or not, since the channels will get cleaned up even if the signal isn't fired

This does exactly the same thing as the GuaranteedSignalFuture but requires an external dependency.

We won't have to worry about leaks as long as the executor/async runtime shuts down gracefully.

And the same applies to the current state of this PR. The problem is with the SignalFuture that has the potential to get stuck indefinitely. If we get rid of it and only provide the GuaranteedSignalFuture then we won't have stuck futures anymore but will have to handle disappearing objects inside the future.

Lastly, since to_future and the async runtime are 2 separate things, would it be possible to make 2 different PRs?
The to_future API can come first since the implementation would be executor-independent.

We can do that, but I would like to provide a way to execute futures without requiring users to include an external dependency. I also don't see what it would solve right now.


After a bit of testing, I found a minor bug with to_guaranteed_future. I don't exactly know what happens, but it just hangs. Perhaps it somehow got polled twice?

Thanks for the report, I will see what is going on there.

For ergonomic reasons, we should reflect GDScript's convention as much as possible.

Yes, and I'm all for that, as long as it's technically possible.

Manually managing handles is unusual even for other async runtimes like Tokio.

The issue we are discussing has nothing to do with the runtime. If you use the SignalFuture on the tokio runtime, you end up with the same issue. The reason there is no problem in GDScript is that the script runtime and async runtime are the same thing, and they can counteract the oddities of their SignalFuture / GDScriptFunctionState from inside the script runtime.

Side note: since GDScript only cleans up pending function states when a GDScript gets destroyed, you can also end up with a large number of pending states, should your script simply never (or rarely) get destroyed.

My idea is to return Err(NeverResolve) to indicate that the signal will never resolve. Being an error type also makes it easier to ? it.

I see, yes that is an alternative that would be more descriptive.

@TitanNano
Contributor Author

@Dheatly23 After some recent refactoring, the GuaranteedSignalFuture was holding the Mutex lock too long in its Drop implementation. After fixing that, your example works now. I also removed the UnwindSafe bound again, so you don't have to assert it in your user code.

async move {
    println!("Wait: {i}");
    s.to_guaranteed_future::<()>().await;
    println!("Done: {i}");
    Some(i)
}

This kind of code is quite reckless, as it treats "resolved because signal fired" and "resolved because signal object was freed" as one and the same thing.

@Dheatly23
Contributor

This kind of code is quite reckless, as it treats "resolved because signal fired" and "resolved because signal object was freed" as one and the same thing.

Oh, i forgot to add ? to the await. In debugging the root cause, i might have substituted to_guaranteed_future with to_future, then forgot to fully revert it.


With regards to my original complaint, I think it should be resolved by making SignalFuture panic when the signal never resolves. Then there is no need for manual cleanup since the task will be cleaned up on the next event loop (?) or engine shutdown.

I disagree on making TaskHandle a must_use though, since spawned async task should live independently. If user wants to access object, they can do something like this:

let this = self.to_gd();

godot_task(async move {
    // Do other tasks, wait for signal, etc.

    // Access the object; if it no longer exists, this should panic.
    this.bind();
})

I quite like the @coder137 suggestion of decoupling signal future and async runtime. So users can essentially bring their own async runtime (Tokio, async-std, etc.) with a default runtime if they so choose. For example: an HTTP server with an Axum/Tokio stack.

@TitanNano
Contributor Author

With regards to my original complaint, I think it should be resolved by making SignalFuture panic when the signal never resolves. Then there is no need for manual cleanup since the task will be cleaned up on the next event loop (?) or engine shutdown.

As I wrote in an earlier comment, I do agree with this and I think it could make sense to panic the future if the signal object is freed. The panic would be printed to stderr though (like all other panics), so while it shouldn't cause any problems, users probably would want to avoid having a lot of panicking tasks. But if you want to avoid the panics, you need to cancel the task before the signal object is freed. This could be hard to coordinate and people might prefer to have the signal future get stuck and then clean up whenever they are ready... What do you think?

I disagree on making TaskHandle a must_use though, since spawned async task should live independently. If user wants to access object, they can do something like this:

The #[must_use] has nothing to do with the class instance and would come down to this:

let this = self.to_gd();

let _ = godot_task(async move {
    // Do other tasks, wait for signal, etc.

    // Access the object; if it no longer exists, this should panic.
    this.bind();
});

The intention was that it would highlight that the task handle can be of importance. And hopefully, people would make an informed decision on what they want to do with it. But making it #[must_use] might be too annoying.

I quite like the @coder137 suggestion of decoupling signal future and async runtime.

They are already decoupled. You should be able to use the futures with any runtime you like.

@TitanNano TitanNano force-pushed the jovan/async_rt branch 3 times, most recently from 933d17b to ff96707 Compare February 18, 2025 22:21
@TitanNano
Contributor Author

After all the discussions about the futures in this PR, I have now made the following refactoring:

  • the GuaranteedSignalFuture is now called TrySignalFuture
  • the TrySignalFuture resolves now to a Result<T, TrySignalFutureError> (the error type itself should still be refined a bit).
  • the SignalFuture is now a wrapper around the TrySignalFuture that turns the Result::Err into a panic.
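The relationship between the two futures can be sketched with plain `std` types. This is only an illustration of the wrapper pattern described above, not the code from this PR: `SignalFreed`, `TryFuture`, `PanickingFuture` and `poll_once` are hypothetical names, and the fallible future is simplified to one that is already resolved.

```rust
use std::fmt::Debug;
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for the error of the fallible signal future.
#[derive(Debug, PartialEq)]
pub struct SignalFreed;

/// Simplified fallible future: already resolved with a value or an error.
pub struct TryFuture<T>(pub Option<Result<T, SignalFreed>>);

impl<T: Unpin> Future for TryFuture<T> {
    type Output = Result<T, SignalFreed>;

    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Self::Output> {
        Poll::Ready(self.0.take().expect("polled after completion"))
    }
}

/// Wrapper that forwards `Ok` values and turns `Err` into a panic.
pub struct PanickingFuture<F>(pub F);

impl<F, T, E> Future for PanickingFuture<F>
where
    F: Future<Output = Result<T, E>> + Unpin,
    E: Debug,
{
    type Output = T;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<T> {
        match Pin::new(&mut self.0).poll(cx) {
            Poll::Ready(Ok(value)) => Poll::Ready(value),
            Poll::Ready(Err(err)) => panic!("signal future failed: {err:?}"),
            Poll::Pending => Poll::Pending,
        }
    }
}

struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls a future exactly once with a waker that does nothing.
pub fn poll_once<F: Future + Unpin>(fut: &mut F) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    Pin::new(fut).poll(&mut cx)
}
```

Code that wants to react to a freed signal object awaits the fallible future and matches on the `Result`; code that treats that case as a bug awaits the wrapper and lets the panic surface.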

@coder137

coder137 commented Feb 19, 2025

the TrySignalFuture resolves now to a Result<T, TrySignalFutureError> (the error type itself should still be refined a bit).

To understand this better, under what conditions would we receive Err(TrySignalFutureError)?

In the PR I see it says if the Signal object is freed before the signal was emitted. Wouldn't the signal future also get freed in that case?

@TitanNano
Contributor Author

@coder137 See the GuaranteedSignalFuture example in this comment #1043 (comment), but now it resolves to a Result::Err instead of Option::None.

@coder137

coder137 commented Feb 20, 2025

SignalFuture can only get stuck if the signal is never emitted.
What if we remove the SignalFuture and only keep the TrySignalFuture and give users a way to gracefully shut down via a CancellationToken (or something similar)? In the example it seems as though TrySignalFuture returns err once the game shuts down.

The reason I say this is that users might accidentally use SignalFutures in their code without completely understanding their use case or their own logic as the game changes.
In the event that the game hangs, they would need to go through some effort in order to debug the problem. If there are multiple SignalFutures in the code, debugging the halts/freezes becomes harder.

@TitanNano
Contributor Author

@coder137 SignalFuture can no longer get stuck. It will produce a panic. So users who run into these panics can choose to either switch to the TrySignalFuture or work around the cause of the panic.

In the example it seems as though TrySignalFuture returns err once the game shuts down.

No, it happens as soon as the signal object is freed. Of course, if you wait for a signal until the game shutdown, this could also happen during shutdown. But during shutdown, all pending tasks are also being canceled and dropped before the engine cleans up the scene tree, so it's unlikely to happen.

@jrb0001
Contributor

jrb0001 commented Feb 22, 2025

I am not completely sure yet. It doesn't happen if there are no open scenes or if none of them contains a node which spawns a Future.

It also doesn't seem to happen every single time if I close all scenes and then open one with a Future before triggering the hot-reload. In this case it panics with some scenes:

ERROR: godot-rust function call failed: <Callable>::GodotWaker::wake()
    Reason: [panic]  Future no longer exists when waking it! This is a bug!
  at /home/jrb0001/.cargo/git/checkouts/gdext-3ec94bd991a90eb6/2877010/godot-core/src/builtin/async_runtime.rs:271

With another scene it segfaults in this scenario.

Simply reopening the editor (same scene gets opened automatically) and then triggering a hot-reload segfaults for both scenes.

With both executor + Future from this PR, the hot-reload issue doesn't happen at all?!? So the issue could also be in my code, let me debug it properly before you waste more time on it.

I will do some more debugging later this week (probably weekend).

This is still the same behavior with the current commit, both the printed panic and the segfault.

(lldb) bt
* thread #1, name = 'godot.linuxbsd.', stop reason = signal SIGSEGV: invalid permissions for mapped object (fault address: 0x7213a9b64980)
  * frame #0: 0x00007213a9b64980
    frame #1: 0x000057bde516208f godot.linuxbsd.editor.dev.x86_64`CallableCustomExtension::call(this=0x000057be33a2d150, p_arguments=0x0000000000000000, p_argcount=0, r_return_value=0x00007ffdeb3d9050, r_call_error=0x00007ffdeb3d9044) const at gdextension_interface.cpp:170:12
    frame #2: 0x000057bde4ddfb88 godot.linuxbsd.editor.dev.x86_64`Callable::callp(this=0x00007213a008a410, p_arguments=0x0000000000000000, p_argcount=0, r_return_value=0x00007ffdeb3d9050, r_call_error=0x00007ffdeb3d9044) const at callable.cpp:57:15
    frame #3: 0x000057bde51860d2 godot.linuxbsd.editor.dev.x86_64`CallQueue::_call_function(this=0x000057be174450f0, p_callable=0x00007213a008a410, p_args=0x00007213a008a428, p_argcount=0, p_show_error=true) at message_queue.cpp:220:18
    frame #4: 0x000057bde51864d5 godot.linuxbsd.editor.dev.x86_64`CallQueue::flush(this=0x000057be174450f0) at message_queue.cpp:268:20
    frame #5: 0x000057bde2aa6c73 godot.linuxbsd.editor.dev.x86_64`SceneTree::physics_process(this=0x000057be19f32eb0, p_time=0.016666666666666666) at scene_tree.cpp:492:38
    frame #6: 0x000057bde034fac7 godot.linuxbsd.editor.dev.x86_64`Main::iteration() at main.cpp:4070:60
    frame #7: 0x000057bde02866da godot.linuxbsd.editor.dev.x86_64`OS_LinuxBSD::run(this=0x00007ffdeb3d92e0) at os_linuxbsd.cpp:962:22
    frame #8: 0x000057bde027e555 godot.linuxbsd.editor.dev.x86_64`main(argc=5, argv=0x00007ffdeb3d9928) at godot_linuxbsd.cpp:85:9
    frame #9: 0x00007213bc0dbe08 libc.so.6`___lldb_unnamed_symbol3261 + 120
    frame #10: 0x00007213bc0dbecc libc.so.6`__libc_start_main + 140
    frame #11: 0x000057bde027e2f5 godot.linuxbsd.editor.dev.x86_64`_start + 37

image list before the reload, sorted by address:

[ 68] DED83DF7-4521-915A-7BD6-E8990F4F802E-13B43F04 0x00007213a7db3000 /usr/lib/libasyncns.so.0 
[ 17] 7826DD70-4046-27B0-AEBE-4E42CA605707-322A0608 0x00007213a8e00000 /home/jrb0001/GodotProjects/vn-test/rust/target/x86_64-unknown-linux-gnu/debug/libgame.so 
[ 42] DE7A5D36-BC86-167E-D7DE-538A6F63CDC5-26014ACE 0x00007213a9e68000 /usr/lib/libwayland-client.so.0 
[ 41] 81E599CC-EF35-6DCD-56EE-E45E615603EA-2D605B22 0x00007213a9e77000 /usr/lib/libxcb-dri3.so.0 

So I guess there is a Callable still alive during the reload? Is there a way to figure out which one?


Another issue I am currently running into is that godot_task() polls the future immediately. Is this intentional? All other wake-ups go through call_deferred(); only the first poll is different. It also didn't happen in an older iteration of this PR.

@TitanNano
Contributor Author

> So I guess there is a Callable still alive during the reload? Is there a way to figure out which one?

My gut feeling is that Godot does not reload the custom callable and then tries to access its now-invalid pointer after the old library has been unloaded.

I can look into this more.


> godot_task() polls the future immediately

Yes, this is intentional at the moment. Previously, signal futures inside an async block or function were only created once the deferred initial poll from godot_task had run, so any signal emitted right after calling godot_task was missed.
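The laziness behind this can be shown with a minimal std-only sketch. It is not code from this PR: `NoopWaker` and `subscription_timing` are hypothetical names, and the `Cell` flag merely stands in for "a signal future has been created and connected". The point it illustrates is that creating an async block runs none of its body, so nothing inside it exists until the first poll.

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Wake, Waker};

/// Waker that does nothing; enough for polling a future by hand.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Returns (subscribed before the first poll, subscribed after the first poll).
fn subscription_timing() -> (bool, bool) {
    // Hypothetical stand-in for "the signal future has been created".
    let subscribed = Rc::new(Cell::new(false));
    let flag = subscribed.clone();

    // Creating the async block does not execute any of its body.
    let task = async move {
        // In a real task, this is where a signal future would be constructed.
        flag.set(true);
    };

    let before = subscribed.get();

    // Poll once by hand, which is what an immediate first poll amounts to.
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut task = pin!(task);
    let _ = task.as_mut().poll(&mut cx);

    (before, subscribed.get())
}

fn main() {
    let (before, after) = subscription_timing();
    println!("subscribed before first poll: {before}, after first poll: {after}");
}
```

If the first poll were deferred, anything emitted between creating the task and that poll would find no subscriber, which matches the problem described above.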

What exact issue is it causing?

@jrb0001
Contributor

jrb0001 commented Feb 22, 2025

I am calling godot_task() from IButton::pressed() and the future ends up calling Gd<MyButton>::set_focusmode() which triggers a notification and it can't create a second mut borrow to call IButton::on_notification. The first await comes somewhere after that. I need to defer somewhere, but I can't defer the call to godot_task() itself because I need to store the TaskHandle. So I guess awaiting a future which simply wakes itself is the only solution.

With my old executor, I also abused async blocks without any awaits as a generic "call deferred" solution, simply because it was easier than messing around with Callable and its Send bound. We can ignore this use case in my opinion.
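A future that simply wakes itself, as mentioned above, could look roughly like the following std-only sketch. `YieldOnce`, `CountingWaker` and `run_demo` are hypothetical names and not part of this PR; the manual polling stands in for an executor, which in the PR's runtime would presumably schedule the second poll through a deferred call.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Future that returns `Pending` exactly once and immediately asks to be polled again.
struct YieldOnce {
    yielded: bool,
}

impl Future for YieldOnce {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.yielded {
            return Poll::Ready(());
        }
        self.yielded = true;
        // Wake ourselves; the executor decides when the next poll actually happens.
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

/// Counts wake-ups so the example can observe them.
struct CountingWaker(AtomicUsize);

impl Wake for CountingWaker {
    fn wake(self: Arc<Self>) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}

/// Returns (steps done after the first poll, wake-ups requested, steps done after the second poll).
fn run_demo() -> (usize, usize, usize) {
    let steps = Arc::new(AtomicUsize::new(0));
    let counter = steps.clone();

    let task = async move {
        YieldOnce { yielded: false }.await;
        // Hypothetical stand-in for work that must not run inside the caller's borrow.
        counter.fetch_add(1, Ordering::SeqCst);
    };

    let wakes = Arc::new(CountingWaker(AtomicUsize::new(0)));
    let waker = Waker::from(wakes.clone());
    let mut cx = Context::from_waker(&waker);
    let mut task = pin!(task);

    // First poll: the task stops at the yield point before doing any work.
    let _ = task.as_mut().poll(&mut cx);
    let after_first = steps.load(Ordering::SeqCst);

    // Second poll: the yield future is ready, so the work runs now.
    let _ = task.as_mut().poll(&mut cx);
    let after_second = steps.load(Ordering::SeqCst);

    (after_first, wakes.0.load(Ordering::SeqCst), after_second)
}

fn main() {
    let (after_first, wakes, after_second) = run_demo();
    println!("after first poll: {after_first}, wake-ups: {wakes}, after second poll: {after_second}");
}
```

Awaiting such a future as the first statement of a task keeps the immediate first poll cheap and pushes the real work to the next wake-up.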

@TitanNano
Copy link
Contributor Author

TitanNano commented Feb 23, 2025

> I am calling godot_task() from IButton::pressed() and the future ends up calling Gd<MyButton>::set_focusmode() which triggers a notification and it can't create a second mut borrow to call IButton::on_notification. The first await comes somewhere after that. I need to defer somewhere, but I can't defer the call to godot_task() itself because I need to store the TaskHandle. So I guess awaiting a future which simply wakes itself is the only solution.

Can't you do something like this?

let base = self.base_mut();
let handle = godot_task(...); 
drop(base);
self.task_handles.push(handle);

Unfortunately, I wasn't able to replicate the hot-reload issue on macOS yet. It might be Linux-specific.

EDIT: can't replicate it in CI either.

@jrb0001
Contributor

jrb0001 commented Feb 24, 2025

> Can't you do something like this?
>
> let base = self.base_mut();
> let handle = godot_task(...);
> drop(base);
> self.task_handles.push(handle);

Yes, that worked. Thanks for the idea!

> Unfortunately, I wasn't able to replicate the hot-reload issue on macOS yet. It might be Linux-specific.
>
> EDIT: can't replicate it in CI either.

I think it is more likely something specific to my setup. I can't reproduce it with all my scenes; some just print panics without segfaulting. But for the affected scenes, it is 100% reproducible. I will try it in a clean project.

@jrb0001
Contributor

jrb0001 commented Feb 25, 2025

Minimalistic repro project: executor-segfault.zip

cd rust
cargo build --target x86_64-unknown-linux-gnu
$GODOT4_BIN --editor --path ../project/ &
# Wait until project is fully loaded.
touch game/src/lib.rs
cargo build --target x86_64-unknown-linux-gnu
# Focus editor window --> segfault.

If you are not on Linux, just change x86_64-unknown-linux-gnu to your target. The .gdextension file is configured for anything supported by the official export templates.

So it looks like this is caused by calling the Waker inside drop() of a (tool) Node.

@TitanNano TitanNano force-pushed the jovan/async_rt branch 2 times, most recently from e4d5aca to 1f9069b on February 25, 2025 22:25
@TitanNano
Contributor Author

@jrb0001 it looks like your issue comes down to calling call_deferred on a callable during GDExtension deinitialization (the Rust struct of the Node is dropped while the extension is deinitialized during hot reload). Replicating your setup causes a panic for me on macOS:

unsafe precondition(s) violated: hint::unreachable_unchecked must never be reached
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread caused non-unwinding panic. aborting.

But on Linux in CI it works perfectly fine (perhaps the timing is slightly different?). I think you can create a new issue for this edge case of deferring calls to callables during deinitialization.

@TitanNano TitanNano changed the title [WIP] Async Signals Async Signals Feb 26, 2025
@TitanNano TitanNano marked this pull request as ready for review February 26, 2025 22:41
Labels
feature Adds functionality to the library
Development

Successfully merging this pull request may close these issues.

Async/Await for Signals
7 participants