---
title: "Intro"
engine: knitr
---
## Context
Large-scale numerical experiments are central to much of contemporary scientific and mathematical research. Performing these experiments in a valid, reproducible, and scalable fashion is not easy.
Even a typical small project may need to run thousands of executions, and more than 10,000 is not uncommon.
It is crucial to have good tools to coordinate and organize these experiments.
## `nf-nest`
This page documents `nf-nest`, a collection of small but powerful utilities built on Nextflow to
help accomplish this. [Link to the GitHub repository](https://github.com/UBC-Stat-ML/nf-nest).
Aspects taken into account:
- Automating the cross-product of input parameters in experiments (see the sketch after this list).
- Automating the creation of submission scripts and the ordering of jobs (taking care of moving files across the file system, dynamic memory requirements, etc.).
- Automating the gathering of results from many runs.
- Caching of already-completed jobs.
- Robustness to failure.
- Reproducibility via Apptainer and Docker, supporting both x86 and Apple silicon.
- Support for GPU programming.
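To give a flavour of the first item, here is a minimal sketch in plain Nextflow (not `nf-nest`'s own API; the parameter names and values are hypothetical) showing how a cross-product of input parameters can be built with the standard `combine` operator:

```groovy
// Hypothetical parameter lists; `combine` forms their Cartesian product,
// emitting one tuple per (seed, algorithm) configuration to run.
workflow {
    seeds      = Channel.fromList([1, 2, 3])
    algorithms = Channel.fromList(['mala', 'nuts'])
    seeds.combine(algorithms).view()   // [1, mala], [1, nuts], [2, mala], ...
}
```

`combine` is a standard Nextflow operator; `nf-nest` builds on primitives like this to automate the bookkeeping.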
## Technology stack
`nf-nest` uses the following open-source projects:
- Nextflow: can be thought of as an "operating system" for coordinating numerical experiments.
- Julia: a programming language that unlocks full access to high-performance computation on both CPUs and GPUs.
While some features of `nf-nest` are Julia-specific, others are language-agnostic.
## Background
### Scientific workflow
A scientific workflow is a directed acyclic graph where each node is a process and an edge
from node $n$ to node $n'$
denotes that at least one output of process $n$ is fed as an input to process $n'$.
Here is an example from [a workflow covered later in this tutorial](07_combine.qmd):
```{mermaid}
flowchart TB
subgraph " "
v0["julia_env"]
v1["toml"]
v4["Channel.fromList"]
v7["julia_env"]
v8["toml"]
v12["plot_script"]
end
v2([instantiate_process])
v3([precompile])
v5([run_julia])
subgraph combine_workflow
v9([instantiate_process])
v10([precompile])
v11([combine_process])
end
v13([plot])
subgraph " "
v14["combined_csvs_folder"]
v15[" "]
end
v6(( ))
v0 --> v2
v1 --> v2
v2 --> v3
v3 --> v5
v3 --> v13
v4 --> v5
v5 --> v6
v7 --> v9
v8 --> v9
v9 --> v10
v10 --> v11
v6 --> v11
v11 --> v13
v12 --> v13
v13 --> v15
v13 --> v14
```
In a nutshell, Nextflow will submit one or several SLURM jobs for
each node in this graph, gather the results, and produce some [nice](report.html) [reports](timeline.html).
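For readers new to Nextflow, the sketch below (a hypothetical two-process example, not taken from `nf-nest`) shows how an edge of such a graph is expressed: the output channel of one process is connected to the input of the next.

```groovy
// Hypothetical two-node workflow: the file produced by `simulate`
// is fed as the input of `summarize`, forming one edge of the DAG.
process simulate {
    output:
        path 'samples.csv'
    """
    echo "iteration,value" > samples.csv
    """
}

process summarize {
    input:
        path samples
    output:
        stdout
    """
    wc -l $samples
    """
}

workflow {
    simulate()
    summarize(simulate.out)
    summarize.out.view()
}
```

Each rounded node in the diagram above plays the role of `simulate` or `summarize`, and each arrow is one such channel connection.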
## More information
Both Nextflow and Julia have excellent and extensive documentation.
- [Nextflow documentation](https://www.nextflow.io/docs/latest/overview.html)
- [Julia documentation](https://docs.julialang.org/en/v1/)
See also [this nextflow tutorial](https://carpentries-incubator.github.io/Pipeline_Training_with_Nextflow/01-Workflows_Pipelines/index.html).