| 1 | +# Part 3: Resource profiling and optimization |
| 2 | + |
| 3 | +THIS IS A PLACEHOLDER |
| 4 | + |
| 5 | +!!!note |
| 6 | + |
| 7 | +    This training module is under redevelopment.
| 8 | + |
| 9 | +--- |
| 10 | + |
| 11 | +TODO |
| 12 | + |
| 13 | +### 4.3. Run the workflow to generate a resource utilization report |
| 14 | + |
| 15 | +To have Nextflow generate the report automatically, add `-with-report <filename>.html` to your command line.
| 16 | + |
| 17 | +```bash |
| 18 | +nextflow run main.nf -profile my_laptop -with-report report-config-1.html |
| 19 | +``` |
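
If you would rather not type the flag on every run, the report can also be switched on from the configuration file. This is a minimal sketch, assuming the `report` configuration scope described in the Nextflow documentation; the filename simply mirrors the one used on the command line above, and `overwrite` is included on the assumption that you want a rerun to replace an existing report.

```groovy title="nextflow.config"
report {
    // equivalent to passing -with-report on the command line
    enabled = true
    file = 'report-config-1.html'
    // assumption: replace an existing report of the same name on rerun
    overwrite = true
}
```

The command-line flag is handy for one-off comparisons like the ones in this section, whereas the config setting suits pipelines where you always want a report.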
| 20 | + |
| 21 | +The report is an HTML file, which you can download and open in your browser. You can also right-click it in the file explorer on the left and select `Show preview` to view it in VS Code.
| 22 | + |
| 23 | +Take a few minutes to look through the report and see if you can identify some opportunities for adjusting resources. |
| 24 | +Make sure to click on the tabs that show the utilization results as a percentage of what was allocated. |
| 25 | +The Nextflow [documentation](https://www.nextflow.io/docs/latest/reports.html) describes all the available features of the report.
| 26 | + |
| 27 | +<!-- TODO: insert images --> |
| 28 | + |
| 29 | +One observation is that the `GATK_JOINTGENOTYPING` process seems to be very hungry for CPU, which makes sense since it performs a lot of complex calculations.
| 30 | +So we could try boosting its CPU allocation and see if that cuts down on runtime.
| 31 | + |
| 32 | +However, we seem to have overshot the mark with the memory allocations; all processes are only using a fraction of what we're giving them. |
| 33 | +We should dial that back down and save some resources. |
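
As a rough sketch of what dialing back could look like, assuming the process defaults in `nextflow.config` are 2 CPUs and 2 GB of memory: the lowered value below is hypothetical, and the right number should come from the peak memory usage your own report shows.

```groovy title="nextflow.config"
process {
    // defaults for all processes
    cpus = 2
    // hypothetical value, lowered from 2.GB; pick yours from the report
    memory = 1.GB
}
```

Leave some headroom above the observed peak, since a task that exceeds its memory allocation may be killed by the executor.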
| 34 | + |
| 35 | +### 4.4. Adjust resource allocations for a specific process |
| 36 | + |
| 37 | +We can specify resource allocations for a given process using the `withName` process selector. |
| 38 | +The syntax looks like this when it's by itself in a process block: |
| 39 | + |
| 40 | +```groovy title="Syntax" |
| 41 | +process {
| 42 | +    withName: 'GATK_JOINTGENOTYPING' {
| 43 | +        cpus = 4
| 44 | +    }
| 45 | +}
| 46 | +``` |
| 47 | + |
| 48 | +Let's add that to the existing process block in the `nextflow.config` file. |
| 49 | + |
| 50 | +```groovy title="nextflow.config" linenums="11" |
| 51 | +process {
| 52 | +    // defaults for all processes
| 53 | +    cpus = 2
| 54 | +    memory = 2.GB
| 55 | +    // allocations for a specific process
| 56 | +    withName: 'GATK_JOINTGENOTYPING' {
| 57 | +        cpus = 4
| 58 | +    }
| 59 | +}
| 60 | +``` |
| 61 | + |
| 62 | +With that specified, the default settings apply to all processes **except** `GATK_JOINTGENOTYPING`, which is a special snowflake that gets twice as many CPUs (4 instead of the default 2).
| 63 | +Hopefully that will reduce its runtime.
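
The same selector block can hold more than one directive, so CPU and memory for a process can be tuned together. This is a minimal sketch; the `memory` value is purely illustrative and is not taken from the report.

```groovy title="Syntax"
process {
    withName: 'GATK_JOINTGENOTYPING' {
        cpus = 4
        // illustrative value only; base yours on the usage the report shows
        memory = 1.GB
    }
}
```

Any directive not set inside the selector falls back to the process-wide default.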
| 64 | + |
| 65 | +### 4.5. Run again with the modified configuration |
| 66 | + |
| 67 | +Let's run the workflow again with the modified configuration and with the reporting flag turned on. Note that we're giving the report a different name so we can tell the two reports apart.
| 68 | + |
| 69 | +```bash |
| 70 | +nextflow run main.nf -profile my_laptop -with-report report-config-2.html |
| 71 | +``` |
| 72 | + |
| 73 | +Once again, you probably won't notice a substantial difference in runtime, because this is such a small workload and the tools spend more time in ancillary tasks than in performing the 'real' work. |
| 74 | + |
| 75 | +However, the second report shows that our resource utilization is more balanced now. |
| 76 | + |
| 77 | +<!-- **TODO: screenshots?** --> |
| 78 | + |
| 79 | +As you can see, this approach is useful when your processes have different resource requirements. It empowers you to right-size the resource allocations you set up for each process based on actual data, not guesswork. |
| 80 | + |
| 81 | +!!!note |
| 82 | + |
| 83 | +    This is just a tiny taster of what you can do to optimize your use of resources.
| 84 | +    Nextflow itself has some really neat [dynamic retry logic](https://training.nextflow.io/basic_training/debugging/#dynamic-resources-allocation) built in for rerunning jobs that fail due to resource limitations.
| 85 | +    Additionally, the Seqera Platform offers AI-driven tooling for optimizing your resource allocations automatically.
| 86 | +
| 87 | +    We'll cover both of those approaches in an upcoming part of this training course.
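
To give a flavour of the retry logic mentioned in the note, here is a minimal sketch using the `errorStrategy` and `maxRetries` directives together with a closure that reads `task.attempt`. The base memory and retry count are assumptions chosen for illustration, and this simple form retries on any failure, not only on out-of-memory errors.

```groovy title="Syntax"
process {
    withName: 'GATK_JOINTGENOTYPING' {
        // resubmit a failed task instead of stopping the run
        errorStrategy = 'retry'
        // assumption: allow up to two retries, so three attempts in total
        maxRetries = 2
        // re-evaluated for each attempt: 2 GB, then 4 GB, then 6 GB
        memory = { 2.GB * task.attempt }
    }
}
```

Starting low and scaling up on failure lets most tasks run with a small allocation while still accommodating the occasional heavy one.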
| 88 | + |
| 89 | +That being said, there may be constraints on what you can (or must) allocate, depending on which executor and compute infrastructure you're using. For example, your cluster may require you to stay within certain limits that don't apply when you're running elsewhere.
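
One way to express such limits in configuration is the `resourceLimits` process directive, which caps whatever a process requests. This is a sketch that assumes a recent Nextflow version supporting the directive; the numbers are hypothetical and should be replaced with the limits of your own system.

```groovy title="nextflow.config"
process {
    // hypothetical caps; requests above these are reduced to fit
    resourceLimits = [ cpus: 8, memory: 16.GB, time: 4.h ]
}
```

Placing this inside a profile keeps the caps tied to the environment they describe, so they do not apply when you run elsewhere.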