Modified single cell tutorial and initiated bulk tutorial #368
base: tutorials
Warning: A newer version of the nf-core template is available. Your pipeline is using an old version of the nf-core template: 3.0.2. For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation.
## Running the airrflow pipeline from two different input formats
The airrflow pipeline accepts two input formats for single-cell AIRR-seq data: AIRR rearrangement tables or FASTQ files.
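
Here is a minimal sketch of how the two input formats map onto the pipeline's `--mode` parameter. It assumes `assembled` is the value for AIRR rearrangement tables and `fastq` is the value for raw reads. Please confirm this against `docs/usage.md` for the release you run, because single-cell FASTQ input may need further parameters. The samplesheet and output names are hypothetical. The function only prints the command, so it can be reviewed before launching.

```bash
#!/usr/bin/env bash
# Sketch only: maps an input format to an airrflow --mode value.
# Assumption: "assembled" = AIRR rearrangement TSV input, "fastq" = raw reads.
set -euo pipefail

airrflow_cmd() {
  local format="$1" samplesheet="$2" outdir="$3"
  local mode
  case "$format" in
    airr)  mode="assembled" ;;
    fastq) mode="fastq" ;;
    *)
      echo "unknown input format: ${format} (expected airr or fastq)" >&2
      return 1
      ;;
  esac
  # Print rather than execute, so the command can be checked first.
  echo "nextflow run nf-core/airrflow -r 4.2.0 -profile docker --mode ${mode} --input ${samplesheet} --outdir ${outdir} -resume"
}

# Example usage with hypothetical file names.
airrflow_cmd airr samplesheet_airr.tsv results_airr
airrflow_cmd fastq samplesheet_fastq.tsv results_fastq
```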
@Vivian0105 it would be great to show here the output message that appears on the console once the tests pass successfully.
- If the automatic threshold is unsatisfactory, you can set the threshold manually and re-run the pipeline (tip: add `-resume` whenever you re-run a Nextflow pipeline, to avoid repeating work that has already completed).
- For TCR data, where somatic hypermutation does not occur, set `--clonal_threshold` to 0 when running the airrflow pipeline.
- Once the threshold is established, clones are assigned to the sequences. A variety of tables and plots from the clonal analysis are written to the folder `clonal_analysis/define_clones`. These include sequences_per_locus_table, sequences_per_c_call_table, sequences_per_constant_region_table, num_clones_table, clone_sizes_table, the clone size distribution plot, the clonal abundance plot and the diversity plot.
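
The threshold rules above could be captured in a small helper like the one below. This is a sketch based on several assumptions. First, `auto` is taken to be the value that requests the automatic threshold. Second, the locus strings follow the `IG`/`TR` convention of the `pcr_target_locus` column. Third, `0.12` is a hypothetical manual value read off the find_threshold report.

```bash
#!/usr/bin/env bash
# Sketch: choose the value passed to --clonal_threshold.
# Assumptions: "auto" requests the automatic threshold, TCR loci (TR*)
# always use 0, and a manual value overrides "auto" for BCR data.
set -euo pipefail

clonal_threshold_for() {
  local locus="$1" manual="${2:-}"
  if [[ "$locus" == TR* ]]; then
    # No somatic hypermutation in TCRs: only identical junctions form a clone.
    echo 0
  elif [[ -n "$manual" ]]; then
    echo "$manual"
  else
    echo auto
  fi
}

# After inspecting the find_threshold report (0.12 is a hypothetical value):
threshold=$(clonal_threshold_for IG 0.12)
echo "Re-run your original command with: --clonal_threshold ${threshold} -resume"
```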
I've added more information on the clonal analysis in a dedicated section now. Could you rewrite the section here to just point to the respective find_threshold report, clonal_analysis report and lineage_threshold reports?
6. Other reporting
- Additional reports are also generated, including a MultiQC report that summarizes QC metrics across all samples, the pipeline_info reports and the report_file_size reports.
Could you rewrite this part a bit, focusing on the different HTML reports and giving a general description of what is inside them?
@Vivian0105 I've reviewed and edited the single-cell tutorial for now and also added some comments to this review on how it can still be improved. Let me know if you have any questions!
```bash
nextflow run nf-core/airrflow -r 4.2.0 -profile test,docker --outdir test_results
```
## Running the airrflow pipeline
@Vivian0105 could you also add the output message here?
> [Tip]
> When launching a Nextflow pipeline with the `-resume` option, any processes that have already been run with exactly the same code, settings and inputs are taken from the cache, and the pipeline resumes from the last step that changed or failed with an error. Using `-resume` avoids duplicating previous work and saves time when re-running a pipeline.
> We include `-resume` in our Nextflow command as a precaution in case anything goes wrong during execution. After fixing the issue, you can relaunch the pipeline with the same command and it will resume from the point of failure, significantly reducing runtime and resource usage.
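
`-resume` can only reuse results if the launch directory still contains Nextflow's cache metadata (`.nextflow/cache`) and the task directories under `work/`. The hypothetical helper below checks for both before relaunching. It assumes the default work directory, so adapt the path if you launched with `-w`.

```bash
#!/usr/bin/env bash
# Sketch: check whether a launch directory still holds what -resume needs:
# the cache metadata in .nextflow/cache and the task directories in work/.
# Assumption: the default work directory was used (no -w/-work-dir option).
set -euo pipefail

can_resume() {
  local launch_dir="${1:-.}"
  [[ -d "${launch_dir}/.nextflow/cache" && -d "${launch_dir}/work" ]]
}

if can_resume .; then
  echo "Previous run found: relaunch the same command with -resume."
else
  echo "No cached run here: -resume will run every step from scratch."
fi
```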
@Vivian0105 could you add this here as well?
- `pipeline_info` reports: various reports on the execution of the pipeline.
- `report_file_size` report: a summary of the number of sequences remaining after each of the main pipeline steps.
## Understanding error messages
@Vivian0105 I just thought of a new section to explain how to understand error messages and facilitate debugging. What do you think?
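
This could serve as a starting point for that section. When a task fails, Nextflow's error message includes a `Work dir:` path. That directory holds the executed script (`.command.sh`), its standard error (`.command.err`) and its exit status (`.exitcode`). The hypothetical helper below prints all three in one go. It is a sketch and not part of the pipeline.

```bash
#!/usr/bin/env bash
# Sketch: summarise a failed task from the "Work dir:" path that Nextflow
# prints in its error message. Each task directory holds the executed script
# (.command.sh), its stderr (.command.err) and the exit status (.exitcode,
# absent if the task never finished).
set -euo pipefail

inspect_task() {
  local workdir="$1" lines="${2:-20}"
  if [[ ! -d "$workdir" ]]; then
    echo "not a directory: $workdir" >&2
    return 1
  fi
  echo "== exit code"
  cat "${workdir}/.exitcode" 2>/dev/null || echo "(no .exitcode file)"
  echo
  echo "== last ${lines} lines of .command.err"
  tail -n "$lines" "${workdir}/.command.err" 2>/dev/null || echo "(no .command.err file)"
  echo
  echo "== command that was run (.command.sh)"
  cat "${workdir}/.command.sh" 2>/dev/null || echo "(no .command.sh file)"
}

# Usage (hypothetical path copied from an error message):
#   inspect_task work/ab/1234abcd
```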
- A configuration file specifying memory, CPU and time limits. Before writing the configuration file, we recommend checking the memory and CPUs available on your system, because requesting more than the system provides may result in unexpected errors.
- Information on the bulk library generation method (protocol).
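
The following sketch shows how the configuration file could be derived from the resources the machine actually has. It assumes Linux, since it reads `/proc/meminfo`. It also assumes a Nextflow version that supports `process.resourceLimits` (24.04 or later) and uses a hypothetical file name, `resources.config`. The 10% memory headroom is an arbitrary choice.

```bash
#!/usr/bin/env bash
# Sketch: write a Nextflow config that caps resources at what this machine has.
# Assumptions: Linux (/proc/meminfo), Nextflow >= 24.04 (process.resourceLimits),
# hypothetical output file name.
set -euo pipefail

write_resource_config() {
  local out="$1" time_h="${2:-24}"
  local cpus mem_gb
  cpus=$(nproc 2>/dev/null || echo 2)
  # MemTotal is reported in kB; keep ~10% headroom for the operating system.
  mem_gb=$(awk '/^MemTotal:/ {printf "%d", $2 * 0.9 / 1024 / 1024}' /proc/meminfo 2>/dev/null || echo 4)
  [[ -n "$mem_gb" && "$mem_gb" -ge 1 ]] || mem_gb=1
  cat > "$out" <<EOF
process {
    resourceLimits = [
        cpus: ${cpus},
        memory: ${mem_gb}.GB,
        time: ${time_h}.h
    ]
}
EOF
}

write_resource_config resources.config
cat resources.config
```

The file would then be passed to the run with `-c resources.config`.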
@Vivian0105 it would be great to also provide the samplesheet and configuration file example for this tutorial for download.
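
Until the downloadable files are added, a minimal samplesheet for bulk FASTQ input could look like the one generated below. The column names reflect our understanding of the airrflow FASTQ samplesheet; please check `docs/usage.md` for the release used in the tutorial. Every sample value is a hypothetical placeholder, and so is the use of `NA` for unknown metadata.

```bash
#!/usr/bin/env bash
# Sketch: write a tab-separated samplesheet for bulk FASTQ input.
# Assumption: column names follow the airrflow FASTQ samplesheet as we
# understand it; all sample values below are hypothetical placeholders.
set -euo pipefail

write_bulk_samplesheet() {
  local out="$1"
  {
    # Header: every column but the last is followed by a tab.
    printf '%s\t' sample_id filename_R1 filename_R2 subject_id species \
      pcr_target_locus single_cell sex tissue biomaterial_provider
    printf '%s\n' age
    # One bulk (single_cell = FALSE) BCR sample.
    printf '%s\t' sample01 sample01_R1.fastq.gz sample01_R2.fastq.gz subject01 \
      human IG FALSE NA blood NA
    printf '%s\n' NA
  } > "$out"
}

write_bulk_samplesheet samplesheet_bulk.tsv
cat samplesheet_bulk.tsv
```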
After launching the pipeline, the following will be printed to the console:

```bash
```
@Vivian0105 Could you provide here the console output examples for the tutorial?
After running the pipeline, several reports are generated in the results folder.
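
This follows the suggestion to keep the output description as a plain-text list. The hypothetical helper below prints each top-level folder of the results directory together with the HTML reports inside it. It assumes `results` was passed as `--outdir`, and it only looks at file names, not at report contents.

```bash
#!/usr/bin/env bash
# Sketch: list each top-level results folder and the HTML reports it contains.
# Assumption: "results" is the --outdir used when launching the pipeline.
set -euo pipefail

list_reports() {
  local outdir="${1:-results}"
  local dir
  for dir in "$outdir"/*/; do
    [[ -d "$dir" ]] || continue   # the glob matched nothing
    dir="${dir%/}"                # drop the trailing slash before printing
    echo "- $(basename "$dir")"
    find "$dir" -type f -name '*.html' | sort | while read -r report; do
      echo "    ${report#"$outdir"/}"
    done
  done
}

list_reports results
```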
@Vivian0105 Here as well, to make it easier to maintain, I would just list the output directories as a text list.
The analysis steps and the folders where their results are stored are listed below.
1. QC
@Vivian0105 I would focus in this section on pointing to the reports for each of these analysis steps that you are mentioning here and what they contain.
PR checklist
- `nf-core pipelines lint`
- `nextflow run . -profile test,docker --outdir <OUTDIR>`
- `nextflow run . -profile debug,test,docker --outdir <OUTDIR>`
- `docs/usage.md` is updated.
- `docs/output.md` is updated.
- `CHANGELOG.md` is updated.
- `README.md` is updated (including new tool citations and authors/contributors).