Commit 9e4d9c0

Fix capitalization in genomics docs
+ renumber the placeholder for the config section to avoid confusion
1 parent 03b187d commit 9e4d9c0

4 files changed: +92 -3 lines changed

docs/nf4_science/genomics/03_configuration.md

+1 -1
@@ -1,4 +1,4 @@
-# Part 3: Resource profiling and optimization
+# Part 5: Resource profiling and optimization

 THIS IS A PLACEHOLDER


docs/nf4_science/genomics/03_modules.md

+1 -1
@@ -1,4 +1,4 @@
-# Part 3: moving code into modules
+# Part 3: Moving code into modules

 In the first part of this course, you built a variant calling pipeline that was completely linear and processed each sample's data independently of the others.


docs/nf4_science/genomics/04_testing.md

+1 -1
@@ -1,4 +1,4 @@
-# Part 4: adding tests
+# Part 4: Adding tests

 In the first part of this course, you built a variant calling pipeline that was completely linear and processed each sample's data independently of the others.


@@ -0,0 +1,89 @@
# Part 3: Resource profiling and optimization

THIS IS A PLACEHOLDER

!!!note

    This training module is under redevelopment.

---

TODO

### 4.3. Run the workflow to generate a resource utilization report

To have Nextflow generate the report automatically, simply add `-with-report <filename>.html` to your command line.

```bash
nextflow run main.nf -profile my_laptop -with-report report-config-1.html
```

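If you'd rather not pass the flag on every run, the same report can be requested from the configuration file instead. The following is a minimal sketch, assuming you add it to the `nextflow.config` used in this course; `report` is a standard Nextflow configuration scope, and the filename simply mirrors the command above.

```groovy title="nextflow.config (sketch)"
// Assumption: this replaces `-with-report report-config-1.html` on the command line
report {
    enabled   = true
    file      = 'report-config-1.html'
    overwrite = true   // replace a report left over from a previous run
}
```

With this in place, a plain `nextflow run main.nf -profile my_laptop` should produce the report; the command-line flag remains handy when you want a differently named report for a one-off run.
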
The report is an HTML file, which you can download and open in your browser. You can also right-click it in the file explorer on the left and click on `Show preview` to view it in VS Code.

Take a few minutes to look through the report and see if you can identify some opportunities for adjusting resources.
Make sure to click on the tabs that show the utilization results as a percentage of what was allocated.
The Nextflow [documentation](https://www.nextflow.io/docs/latest/reports.html) describes all the available features.

<!-- TODO: insert images -->

One observation is that the `GATK_JOINTGENOTYPING` process seems to be very hungry for CPU, which makes sense since it performs a lot of complex calculations.
So we could try boosting its CPU allocation and see if that cuts down on runtime.

However, we seem to have overshot the mark with the memory allocations; all processes are only using a fraction of what we're giving them.
We should dial that back down and save some resources.

### 4.4. Adjust resource allocations for a specific process

We can specify resource allocations for a given process using the `withName` process selector.
The syntax looks like this when it's by itself in a process block:

```groovy title="Syntax"
process {
    withName: 'GATK_JOINTGENOTYPING' {
        cpus = 4
    }
}
```

Let's add that to the existing process block in the `nextflow.config` file.

```groovy title="nextflow.config" linenums="11"
process {
    // defaults for all processes
    cpus = 2
    memory = 2.GB
    // allocations for a specific process
    withName: 'GATK_JOINTGENOTYPING' {
        cpus = 4
    }
}
```

With that specified, the default settings apply to all processes, **except** that the `GATK_JOINTGENOTYPING` process, a special snowflake, gets twice as many CPUs (its memory allocation still comes from the defaults).
Hopefully that will have an effect.

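The memory over-allocation noted in the report could be addressed in the same block. Here is a sketch of what that might look like; the memory values are hypothetical and only illustrate the pattern, so base yours on what your own report shows.

```groovy title="nextflow.config (sketch)"
process {
    // defaults for all processes
    cpus = 2
    memory = 1.GB        // hypothetical: dialed back from 2.GB
    // allocations for a specific process
    withName: 'GATK_JOINTGENOTYPING' {
        cpus = 4
        memory = 2.GB    // hypothetical: keep extra headroom for the heaviest step
    }
}
```

Any directive set inside the `withName` block overrides the default for that process only; directives you leave out fall back to the defaults above.
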
### 4.5. Run again with the modified configuration

Let's run the workflow again with the modified configuration and with the reporting flag turned on. Note that we're giving the report a different name so we can tell the two reports apart.

```bash
nextflow run main.nf -profile my_laptop -with-report report-config-2.html
```

Once again, you probably won't notice a substantial difference in runtime, because this is such a small workload and the tools spend more time on ancillary tasks than on performing the 'real' work.

However, the second report shows that our resource utilization is more balanced now.

<!-- **TODO: screenshots?** -->

As you can see, this approach is useful when your processes have different resource requirements. It empowers you to right-size the resource allocations you set up for each process based on actual data, not guesswork.

!!!note

    This is just a tiny taster of what you can do to optimize your use of resources.
    Nextflow itself has some really neat [dynamic retry logic](https://training.nextflow.io/basic_training/debugging/#dynamic-resources-allocation) built in to retry jobs that fail due to resource limitations.
    The Seqera Platform also offers AI-driven tooling for optimizing your resource allocations automatically.

    We'll cover both of those approaches in an upcoming part of this training course.

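To give a flavor of the dynamic retry logic mentioned in the note, here is a rough sketch of how it might be configured for a single process. The retry count and the base memory value are hypothetical; `errorStrategy`, `maxRetries` and `task.attempt` are standard Nextflow features.

```groovy title="nextflow.config (sketch)"
process {
    withName: 'GATK_JOINTGENOTYPING' {
        errorStrategy = 'retry'                  // resubmit the task if it fails
        maxRetries    = 2                        // hypothetical: give up after two retries
        memory        = { 2.GB * task.attempt }  // 2 GB on the first attempt, 4 GB on the second, and so on
    }
}
```

Note that this simple form retries on any failure, not only on resource-related ones; a more careful setup would inspect the task's exit status before deciding to retry.
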
That being said, there may be some constraints on what you can (or must) allocate depending on which executor and compute infrastructure you're using. For example, your cluster may require you to stay within certain limits that don't apply when you're running elsewhere.
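
One way such limits might be expressed is with the `resourceLimits` process directive, which caps whatever the rest of the configuration requests. This is only a sketch: the profile name `my_cluster` and the limit values are hypothetical, and the directive requires a reasonably recent Nextflow release (24.04 or later).

```groovy title="nextflow.config (sketch)"
profiles {
    my_cluster {   // hypothetical profile name, alongside the existing `my_laptop` profile
        // requests above these values are reduced to fit the cap
        process.resourceLimits = [ cpus: 8, memory: 16.GB, time: 4.h ]
    }
}
```

Keeping the cap in a profile means it only applies when you select that environment with `-profile`, so the same allocations can be used unchanged elsewhere.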
