[DNM] Add documentation for hypothetical pre-timeout scripts #2018

**Open** · wants to merge 13 commits into `main`
site/mkdocs.yml: 5 changes (4 additions & 1 deletion)

```diff
@@ -112,7 +112,10 @@ nav:
   - "Mutual exclusion and rate-limiting": docs/configuration/test-groups.md
   - "Environment variables": docs/configuration/env-vars.md
   - "Extra arguments": docs/configuration/extra-args.md
-  - docs/configuration/setup-scripts.md
+  - "Scripts":
+      - "Overview": docs/scripts/index.md
+      - docs/scripts/setup.md
+      - docs/scripts/pre-timeout.md
   - Machine-readable output:
       - "About output formats": docs/machine-readable/index.md
       - "JUnit support": docs/machine-readable/junit.md
```
site/src/docs/scripts/index.md: 125 additions & 0 deletions (new file)
---
icon: material/script-text
---

# Scripts

Nextest supports running _scripts_ when certain events occur during a test run. Scripts can be scoped to particular tests via [filtersets](../filtersets/index.md).

Nextest currently recognizes two types of scripts:

* [_Setup scripts_](setup.md), which execute at the start of a test run.
* [_Pre-timeout scripts_](pre-timeout.md), which execute before nextest terminates a test that has exceeded its timeout.

Scripts are configured in two parts: _defining scripts_, and _specifying rules_ for when they should be executed.

## Defining scripts

Scripts are defined using the top-level `script.<type>` configuration.

For example, to define a setup script named "my-script", which runs `my-script.sh`:

```toml title="Setup script definition in <code>.config/nextest.toml</code>"
[script.setup.my-script]
command = 'my-script.sh'
# Additional options...
```

See [_Defining setup scripts_](setup.md#defining-setup-scripts) for the additional options available for configuring setup scripts.

To instead define a pre-timeout script named "my-script", which runs `my-script.sh`:

```toml title="Pre-timeout script definition in <code>.config/nextest.toml</code>"
[script.pre-timeout.my-script]
command = 'my-script.sh'
# Additional options...
```

See [_Defining pre-timeout scripts_](pre-timeout.md#defining-pre-timeout-scripts) for the additional options available for configuring pre-timeout scripts.

### Command specification

All script types support the `command` option, which specifies how to invoke the script. Commands can either be specified using Unix shell rules, or as a list of arguments. In the following example, `script1` and `script2` are equivalent.

```toml
[script.<type>.script1]
command = 'script.sh -c "Hello, world!"'

[script.<type>.script2]
command = ['script.sh', '-c', 'Hello, world!']
```

### Timeouts

All script types support the following timeout options:

- **`slow-timeout`**: Mark a script [as slow](../features/slow-tests.md) or [terminate it](../features/slow-tests.md#terminating-tests-after-a-timeout), using the same configuration as for tests. By default, scripts are not marked as slow or terminated (this is different from the slow timeout for tests).
- **`leak-timeout`**: Mark scripts [leaky](../features/leaky-tests.md) after a timeout, using the same configuration as for tests. By default, the leak timeout is 100ms.


```toml title="Script definition with timeouts"
[script.<type>.my-script]
command = 'script.sh'
slow-timeout = { period = "60s", terminate-after = 2 }
leak-timeout = "1s"
```

### Namespacing

Script names must be unique across all script types.

This means that you cannot use the same name for a setup script and a pre-timeout script:

```toml title="Pre-timeout script definition in <code>.config/nextest.toml</code>"
[script.setup.my-script]
command = 'setup.sh'

# Reusing the `my-script` name for a pre-timeout script is NOT permitted.
[script.pre-timeout.my-script]
command = 'pre-timeout.sh'
```

## Specifying rules

In configuration, you can create rules for when to use scripts on a per-profile basis. This is done via the `profile.<profile-name>.scripts` array. For example, you can configure a setup script that generates a database if tests from the `db-tests` package, or any packages that depend on it, are run.

```toml title="Basic rules"
[[profile.default.scripts]]
filter = 'rdeps(db-tests)'
setup = 'db-generate'
```

(This example uses the `rdeps` [filterset](../filtersets/index.md) predicate.)

Scripts can also filter based on platform, using the rules listed in [_Specifying platforms_](../configuration/specifying-platforms.md):

```toml title="Platform-specific rules"
[[profile.default.scripts]]
platform = { host = "cfg(unix)" }
setup = 'script1'
```

A set of scripts can also be specified. All scripts in the set will be executed.

```toml title="Multiple setup scripts"
[[profile.default.scripts]]
filter = 'test(/^script_tests::/)'
setup = ['script1', 'script2']
```

Executing pre-timeout scripts follows the same pattern. For example, you can configure a pre-timeout script for every test that contains `slow` in its name.

```toml title="Basic pre-timeout rules"
[[profile.default.scripts]]
filter = 'test(slow)'
pre-timeout = 'capture-backtrace'
```

A single rule can specify any number of setup scripts and any number of pre-timeout scripts.

```toml title="Combination rules"
[[profile.default.scripts]]
filter = 'test(slow)'
setup = ['setup-1', 'setup-2']
pre-timeout = ['pre-timeout-1', 'pre-timeout-2']
```
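Putting the two parts together, a complete configuration sketch (all script names, paths, and filters here are illustrative):

```toml title="Hypothetical <code>.config/nextest.toml</code> combining definitions and rules"
experimental = ["setup-scripts", "pre-timeout-scripts"]

# Definitions.
[script.setup.db-generate]
command = 'cargo run -p db-generate'

[script.pre-timeout.capture-backtrace]
command = './scripts/capture-backtrace.sh'

# Rules.
[[profile.default.scripts]]
filter = 'rdeps(db-tests)'
setup = 'db-generate'

[[profile.default.scripts]]
filter = 'test(slow)'
pre-timeout = 'capture-backtrace'
```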
site/src/docs/scripts/pre-timeout.md: 69 additions & 0 deletions (new file)
---
icon: material/timer-sand-empty
status: experimental
---

<!-- md:version 0.9.59 -->

# Pre-timeout scripts

!!! experimental "Experimental: This feature is not yet stable"

- **Enable with:** Add `experimental = ["pre-timeout-scripts"]` to `.config/nextest.toml`
- **Tracking issue:** [#TODO](https://github.com/nextest-rs/nextest/issues/TODO)


Nextest runs *pre-timeout scripts* before terminating a test that has exceeded
its timeout.

Pre-timeout scripts are useful for automatically collecting backtraces, logs, etc. that can assist in debugging why a test is slow or hung.

## Defining pre-timeout scripts

Pre-timeout scripts are defined using the top-level `script.pre-timeout` configuration. For example, to define a script named "my-script", which runs `my-script.sh`:

```toml title="Script definition in <code>.config/nextest.toml</code>"
[script.pre-timeout.my-script]
command = 'my-script.sh'
```

See [_Defining scripts_](index.md#defining-scripts) for options that are common to all scripts.

Pre-timeout scripts do not support additional configuration options.

Notably, pre-timeout scripts always capture stdout and stderr. Support for not capturing stdout and stderr may be added in the future in order to support use cases like interactive debugging of a hung test.
**Author:**

Starting to address this:

> How should information about the script being run be communicated to the user?
>
> The states presented to the user blow up a bit here.

One of the things that's worth thinking about, I think, is passing through stdout and stderr -- unlike setup scripts, pre-timeout scripts aren't executed serially, so capturing stdout/stderr generally makes a lot of sense.

But! There's also a compelling use case for pre-timeout scripts: to put you into an interactive debugger, effectively acting as a breakpoint. In that case, you do want to pass through stdout and stderr, and you want to keep processing existing tests in the background, but (I think) not start new ones. Even if we don't solve it here, it's worth keeping that case in mind.

I was thinking to keep things simple for the initial implementation and force that stdout/stderr are captured. @sunshowers does that make sense to you?

> How should information about the script being run be communicated to the user?

Flagging that I haven't forgotten about this! I think I'll need to get a little further along in the implementation before I can answer this question, though. Happy to take any advice you have in the meantime, @sunshowers, if there's a particular UX you're imagining!

**Member:**

> I was thinking to keep things simple for the initial implementation and force that stdout/stderr are captured. @sunshowers does that make sense to you?

Yeah, this is ok, keep it simple for now.

> Flagging that I haven't forgotten about this! I think I'll need to get a little further along in the implementation before I can answer this question, though. Happy to take any advice you have in the meantime, @sunshowers, if there's a particular UX you're imagining!

So there are several points at which to surface information to the user here.

At the moment the test has timed out:

Once the script is done executing:

We should print the test's output, as well as the script's output, with the two being clearly distinguished from each other.

In information queries (i.e. `t`, or Ctrl-T/SIGINFO where available):

We should say something like "terminating due to timeout" -> "running pre-timeout script foo", and carry around all the details for foo just like we would for any other unit.

To do this we'd need to expand the terminating state to carry this information, I think:

```rust
/// The current terminating state of a test or script process.
///
/// Part of [`UnitState::Terminating`].
#[derive(Clone, Debug)]
pub struct UnitTerminatingState {
    /// The process ID.
    pub pid: u32,
    /// The amount of time the unit ran for.
    pub time_taken: Duration,
    /// The reason for the termination.
    pub reason: UnitTerminateReason,
    /// The method by which the process is being terminated.
    pub method: UnitTerminateMethod,
    /// How long has been spent waiting for the process to exit.
    pub waiting_duration: Duration,
    /// How much longer nextest will wait until a kill command is sent to the process.
    pub remaining: Duration,
}
```

Maybe in an `Option<Box<T>>` field.

**Author:**

Makes sense! I ended up with a slightly different approach than you proposed. Instead of adding an optional pre-timeout field to the `Terminating` state, I added a new top-level `UnitState::PreTimeout` state, which itself contains a `script_state: UnitState` field. This allows the pre-timeout script itself to transition between the standard unit states (running, slow, terminating, exited) with minimal code duplication.
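The shape described above might be sketched like this (a minimal, self-contained approximation: nextest's real states carry timing and process data, and all variant contents here are assumed):

```rust
/// Minimal sketch of the described approach. Only the shape is modeled;
/// nextest's actual `UnitState` variants carry much more information.
#[derive(Clone, Debug)]
pub enum UnitState {
    Running,
    Slow,
    Terminating,
    Exited,
    /// The test hit its timeout; the pre-timeout script is in flight and
    /// itself moves through the ordinary unit states.
    PreTimeout { script_state: Box<UnitState> },
}

fn main() {
    let state = UnitState::PreTimeout {
        script_state: Box::new(UnitState::Slow),
    };
    // Debug output: PreTimeout { script_state: Slow }
    println!("{state:?}");
}
```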

Here's how the output looks in this initial implementation.

Basic statuses:

```text
$ ~/Sites/nextest/target/debug/cargo-nextest nextest run --no-fail-fast
info: experimental features enabled: setup-scripts, pre-timeout-scripts
    Finished `test` profile [unoptimized + debuginfo] target(s) in 0.00s
────────────
 Nextest run ID d0f8b05a-5a82-4fa7-9f3c-5cccd3cbccbb with nextest profile: default
    Starting 2 tests across 1 binary
             [ 00:00:00] 0/2:  SETUP [ 1/1] my-script: ./setup.sh
setting up!
  SETUP PASS [   0.015s] my-script: ./setup.sh
        SLOW [>  1.000s] nextest-play tests::it_no_works
        SLOW [>  1.000s] nextest-play tests::it_works
 TERMINATING [>  2.000s] nextest-play tests::it_no_works
      PRETMT             nextest-play tests::it_works
     TIMEOUT [   2.007s] nextest-play tests::it_no_works
──── STDOUT:             nextest-play tests::it_no_works

running 1 test

 PRETMT SLOW [>  1.000s] nextest-play tests::it_works
 PRETMT PASS [   1.233s] nextest-play tests::it_works
──── STDOUT:             gdb-dump: ./pre-timeout.sh
pre-timeout sample stdout
──── STDERR:             gdb-dump: ./pre-timeout.sh
pre-timeout sample stderr

 TERMINATING [>  2.000s] nextest-play tests::it_works
     TIMEOUT [   3.240s] nextest-play tests::it_works
──── STDOUT:             nextest-play tests::it_works

running 1 test
test sample stdout
──── STDERR:             nextest-play tests::it_works
test sample stderr

────────────
     Summary [   3.257s] 2 tests run: 0 passed, 2 timed out, 0 skipped
     TIMEOUT [   2.007s] nextest-play tests::it_no_works
     TIMEOUT [   3.240s] nextest-play tests::it_works
error: test run failed
```

A test with an activated pre-timeout script transitions like so:

- Running (not printed)
- Slow (printed as `SLOW`)
- Pre-timeout script starting (printed as `PRETMT`)
- Pre-timeout script itself slow (printed as `PRETMT SLOW`)
- Pre-timeout script pass/fail (printed as `PRETMT PASS`/`PRETMT FAIL`, followed by the pre-timeout script's stdout/stderr)
- Terminating (printed as `TERMINATING`)
- Exit (printed as `TIMEOUT`, followed by the test's stdout/stderr)

If you request status while the pre-timeout script is executing, you see something like this:

```text
* 1/1:    nextest-play tests::it_works
  status: test running for 2.493s as PID 41610 (marked slow after 1.000s)
  note:   test has reached timeout, pre-timeout script is running:
      status: script running for 0.485s as PID 41641
      stdout:
        pre-timeout sample stdout
      stderr:
        pre-timeout sample stderr
```

I'm by no means married to any of this. It's just a result of my initial attempt to balance "clear to end users" and "straightforward to implement."


### Example

To invoke GDB to dump backtraces before a hanging test is terminated:

```toml title="Advanced pre-timeout script definition"
[script.pre-timeout.gdb-dump]
command = ['sh', '-c', 'gdb -p $NEXTEST_PRE_TIMEOUT_TEST_PID -batch -ex "thread apply all backtrace"']
# TODO options
```

## Specifying pre-timeout script rules

See [_Specifying rules_](index.md#specifying-rules).

## Pre-timeout script execution

A given pre-timeout script _S_ is executed when the current profile has at least one rule where the `platform` predicates match the current execution environment, the script _S_ is listed in `pre-timeout`, and a test matching the `filter` has reached its configured timeout.

Pre-timeout scripts are executed serially, in the order they are defined (_not_ the order they're specified in the rules). If any pre-timeout script exits with a non-zero exit code, an error is logged but the test run continues.

Nextest will proceed with graceful termination of the test only once the pre-timeout script terminates. See [_Terminating tests after a timeout_](../features/slow-tests.md#terminating-tests-after-a-timeout). If the pre-timeout script itself is slow, nextest will apply the same termination protocol to the pre-timeout script.

The pre-timeout script is not responsible for terminating the test process, but it is permissible for it to do so.
**Author:**

Beginning to address this question:

> Should the script be responsible for killing the test, or should nextest do so (after waiting for the script to exit)? I'd lean towards the latter.

Covering this would be useful -- I'm still leaning towards killing the process group anyway, but saying it explicitly would be nice.

and this one:

> Do we still want to allow a grace period (SIGTERM before SIGKILL) in this case?

This is worth mentioning as well. Pre-timeout scripts add a bunch of states to the state machine, because each unit of work now has a sidecar process involved as well. One approach is to add the pre-timeout script to the process group (Unix) / job object (Windows) -- that would provide fairly clear semantics, I think. The script and the test then live and die together -- the code that waits for the child process to exit can now wait on both processes.

**Author:**

I think I may need to get further into the code to really understand the complexity here, but it's not immediately obvious to me why we need to couple the script and test lifetimes together or put the processes in the same pgrp. Naively it seems straightforward to just apply the graceful termination logic to the pre-timeout script and then move on to apply the graceful termination logic to the test.

**Member:**

It's definitely possible to do that, but my concern is just that it blows up the state machine even further. The state machine for an individual test already has on the order of 40-50 states today once you consider all the combinations of select branches that are or aren't done at any moment.

Adding a "start script -> read stdout/stderr from script -> wait on script + test to exit -> check for the script leaking handles" already blows it up quite a bit, though the benefits outweigh the complexity.

Adding "SIGTERM to script -> maybe SIGKILL -> script done -> SIGTERM to test -> maybe SIGKILL -> test done" adds even more states to the state machine, and I'm not sure this pays its way compared to killing the script and the test at the same time.

But, I agree that this decision can be made based on how complex the implementation gets.

**Author:**

It wasn't terribly difficult to add a `run_pre_timeout_script` that handles slow timeouts/graceful termination/leak checking just like running tests and running setup scripts. Two open questions though:


Nextest executes pre-timeout scripts with the same working directory as the test and sets the following variables in the script's environment:
**Author:**

Added a clause here specifying the CWD:

> What should the cwd of the script process be?

This should just be the same as the cwd of the test, I'm quite sure.


* **`NEXTEST_PRE_TIMEOUT_TEST_PID`**: the ID of the process running the test.
* **`NEXTEST_PRE_TIMEOUT_TEST_NAME`**: the name of the running test.
* **`NEXTEST_PRE_TIMEOUT_TEST_BINARY_ID`**: the ID of the binary in which the test is located.
* **`NEXTEST_PRE_TIMEOUT_TEST_BINARY_ID_PACKAGE_NAME`**: the package name component of the binary ID.
* **`NEXTEST_PRE_TIMEOUT_TEST_BINARY_ID_NAME`**: the name component of the binary ID, if known.
* **`NEXTEST_PRE_TIMEOUT_TEST_BINARY_ID_KIND`**: the kind component of the binary ID, if known.
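A minimal sketch of a script that consumes these variables (the message format and `unknown` fallbacks are assumptions, not nextest behavior):

```shell
#!/bin/sh
# Hypothetical pre-timeout script that logs the context nextest provides
# via the NEXTEST_PRE_TIMEOUT_* environment variables listed above.
echo "hung test: ${NEXTEST_PRE_TIMEOUT_TEST_NAME:-unknown}"
echo "binary ID: ${NEXTEST_PRE_TIMEOUT_TEST_BINARY_ID:-unknown}"
echo "test PID:  ${NEXTEST_PRE_TIMEOUT_TEST_PID:-unknown}"
```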
**Author:**

@sunshowers would love your input on whether these are the right names for the _BINARY_ID_* environment variables.

**Member (@sunshowers, Jan 3, 2025):**

I think I'd match the filterset DSL. I'd change:

- `NEXTEST_PRE_TIMEOUT_TEST_BINARY_ID_PACKAGE_NAME` -> `NEXTEST_PRE_TIMEOUT_TEST_PACKAGE`
- `NEXTEST_PRE_TIMEOUT_TEST_BINARY_ID_NAME` -> `NEXTEST_PRE_TIMEOUT_TEST_BINARY`
- `NEXTEST_PRE_TIMEOUT_TEST_BINARY_ID_KIND` -> `NEXTEST_PRE_TIMEOUT_TEST_BINARY_KIND`. I think `BINARY_KIND` here rather than `KIND` because the latter is a bit confusing, though I'm not certain of this. What do you think?

**Author:**

> I think I'd match the filterset DSL. I'd change:
>
> - `NEXTEST_PRE_TIMEOUT_TEST_BINARY_ID_PACKAGE_NAME` -> `NEXTEST_PRE_TIMEOUT_TEST_PACKAGE`
> - `NEXTEST_PRE_TIMEOUT_TEST_BINARY_ID_NAME` -> `NEXTEST_PRE_TIMEOUT_TEST_BINARY`
> - `NEXTEST_PRE_TIMEOUT_TEST_BINARY_ID_KIND` -> `NEXTEST_PRE_TIMEOUT_TEST_BINARY_KIND`.

Makes sense to me!


<!-- TODO: a protocol for writing script logs to a file and telling nextest to attach them to JUnit reports? -->
**Author:**

Does nextest need to create a directory for the script to write logs and such out to? Even if not necessary, is it a good to have?

I really would like this, personally. That information can then be attached in all sorts of ways, e.g. to JUnit reports.

I think it would also be useful to have a protocol for the pre-timeout script to give information back to nextest -- for example, the list of files written out and/or to include. This can just be a file the script writes to, similar to `NEXTEST_ENV` for setup scripts.

The exact details for much of this can be outlined in an architecture doc (this is more usage-oriented), but a quick summary of a lot of this would be useful.

@sunshowers are you comfortable deferring this to future work? I'd like to try to keep the initial implementation as focused as possible, and this bit seems a hunk of work that's easy to split off.

**Member:**

Yeah, fine to defer to the future. I just want to make sure we don't box ourselves into making this impossible in the future.

site/src/docs/scripts/setup.md

````diff
@@ -12,77 +12,38 @@ status: experimental
 - **Enable with:** Add `experimental = ["setup-scripts"]` to `.config/nextest.toml`
 - **Tracking issue:** [#978](https://github.com/nextest-rs/nextest/issues/978)
 
-Nextest supports running _setup scripts_ before tests are run. Setup scripts can be scoped to
-particular tests via [filtersets](../filtersets/index.md).
+Nextest runs *setup scripts* before tests are run.
 
-Setup scripts are configured in two parts: _defining scripts_, and _setting up rules_ for when they should be executed.
+## Defining setup scripts
 
-## Defining scripts
-
-Setup scripts are defined using the top-level `script` configuration. For example, to define a script named "my-script", which runs `my-script.sh`:
+Setup scripts are defined using the top-level `script.setup` configuration. For example, to define a script named "my-script", which runs `my-script.sh`:
 
-```toml title="Setup script definition in <code>.config/nextest.toml</code>"
-[script.my-script]
+```toml title="Script definition in <code>.config/nextest.toml</code>"
+[script.setup.my-script]
 command = 'my-script.sh'
 ```
 
-Commands can either be specified using Unix shell rules, or as a list of arguments. In the following example, `script1` and `script2` are equivalent.
+See [_Defining scripts_](index.md#defining-scripts) for options that are common to all scripts.
 
-```toml
-[script.script1]
-command = 'script.sh -c "Hello, world!"'
-
-[script.script2]
-command = ['script.sh', '-c', 'Hello, world!']
-```
-
-Setup scripts can have the following configuration options attached to them:
+Setup scripts support the following additional configuration options:
 
-- **`slow-timeout`**: Mark a setup script [as slow](../features/slow-tests.md) or [terminate it](../features/slow-tests.md#terminating-tests-after-a-timeout), using the same configuration as for tests. By default, setup scripts are not marked as slow or terminated (this is different from the slow timeout for tests).
-- **`leak-timeout`**: Mark setup scripts [leaky](../features/leaky-tests.md) after a timeout, using the same configuration as for tests. By default, the leak timeout is 100ms.
 - **`capture-stdout`**: `true` if the script's standard output should be captured, `false` if not. By default, this is `false`.
 - **`capture-stderr`**: `true` if the script's standard error should be captured, `false` if not. By default, this is `false`.
 
 ### Example
 
 ```toml title="Advanced setup script definition"
-[script.db-generate]
+[script.setup.db-generate]
 command = 'cargo run -p db-generate'
 slow-timeout = { period = "60s", terminate-after = 2 }
 leak-timeout = "1s"
 capture-stdout = true
 capture-stderr = false
 ```
 
-## Setting up rules
-
-In configuration, you can create rules for when to use scripts on a per-profile basis. This is done via the `profile.<profile-name>.scripts` array. For example, you can set up a script that generates a database if tests from the `db-tests` package, or any packages that depend on it, are run.
-
-```toml title="Basic rules"
-[[profile.default.scripts]]
-filter = 'rdeps(db-tests)'
-setup = 'db-generate'
-```
-
-(This example uses the `rdeps` [filterset](../filtersets/index.md) predicate.)
-
-Setup scripts can also filter based on platform, using the rules listed in [_Specifying platforms_](../configuration/specifying-platforms.md):
-
-```toml title="Platform-specific rules"
-[[profile.default.scripts]]
-platform = { host = "cfg(unix)" }
-setup = 'script1'
-```
-
-A set of scripts can also be specified. All scripts in the set will be executed.
-
-```toml title="Multiple setup scripts"
-[[profile.default.scripts]]
-filter = 'test(/^script_tests::/)'
-setup = ['script1', 'script2']
-```
+## Specifying setup script rules
+
+See [_Specifying rules_](index.md#specifying-rules).
 
-## Script execution
+## Setup script execution
 
 A given setup script _S_ is only executed if the current profile has at least one rule where the `filter` and `platform` predicates match the current execution environment, and the setup script _S_ is listed in `setup`.
````