Modified single cell tutorial and initiated bulk tutorial #368

Open: wants to merge 24 commits into base: tutorials from

Vivian0105

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • If necessary, also make a PR on the nf-core/airrflow branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core pipelines lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@nf-core-bot
Member

Warning

Newer version of the nf-core template is available.

Your pipeline is using an old version of the nf-core template: 3.0.2.
Please update your pipeline to the latest version.

For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation.

@ggabernet self-requested a review on March 8, 2025 00:16


## Running airrflow pipeline from two different input formats
The airrflow single-cell AIRR-seq pipeline accepts two input formats: AIRR rearrangement format or fastq format.
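
A minimal sketch of how the two input formats could translate into launch commands, assuming the pipeline's `--mode` parameter selects between them (`assembled` for AIRR rearrangement tables, `fastq` for raw reads). The helper `airrflow_cmd`, the samplesheet names and the output directory names are hypothetical, and fastq mode needs further protocol-specific parameters from docs/usage.md that are left out here. The helper only prints the command unless `RUN=1` is set.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print (and optionally run) an airrflow command for one
# input format. Assumption: --mode assembled = AIRR rearrangement tables,
# --mode fastq = raw reads. Fastq mode needs more parameters (see docs/usage.md).
airrflow_cmd() {
  local mode="$1" samplesheet="$2" outdir="$3"
  case "$mode" in
    assembled | fastq) ;;
    *)
      echo "unknown mode: $mode" >&2
      return 1
      ;;
  esac
  local cmd=(nextflow run nf-core/airrflow -r 4.2.0 -profile docker
    --mode "$mode" --input "$samplesheet" --outdir "$outdir" -resume)
  echo "${cmd[*]}"
  # Only launch the pipeline when explicitly requested with RUN=1.
  if [[ "${RUN:-0}" == "1" ]]; then
    "${cmd[@]}"
  fi
}

# Placeholder samplesheet and output names.
airrflow_cmd assembled samplesheet_assembled.tsv results_assembled
airrflow_cmd fastq samplesheet_fastq.tsv results_fastq
```

Keeping the command in an array avoids quoting problems with paths that contain spaces, and printing it first makes it easy to check before spending compute on a run.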
Member

@Vivian0105 it would be great to show here what is the output message that appears on the console once the tests pass successfully

- If the automatic threshold is unsatisfactory, you can set the threshold manually and re-run the pipeline.
(Tip: use -resume whenever running the Nextflow pipeline to avoid duplicating previous work).
- For TCR data, where somatic hypermutation does not occur, set the clonal_threshold to 0 when running the Airrflow pipeline.
- Once the threshold is established, clones are assigned to the sequences. A variety of tables and plots associated with the clonal analysis are written to the folder 'clonal_analysis/define_clones', such as sequences_per_locus_table, sequences_per_c_call_table, sequences_per_constant_region_table, num_clones_table, clone_sizes_table, the clone size distribution plot, the clonal abundance plot and the diversity plot.
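
The manual re-run described in this list can be sketched as follows, assuming `--clonal_threshold` accepts either `auto` or a numeric distance. The helper names, the samplesheet and output paths, and the example value `0.1` are hypothetical; the real value should come from the find_threshold report. For `-resume` to reuse the cached steps, every other parameter of the original launch must stay identical; they are omitted here for brevity.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: pick the value for --clonal_threshold.
# TCRs do not undergo somatic hypermutation, so the threshold is forced to 0.
# Otherwise accept "auto" or a non-negative number (e.g. read off the
# find_threshold report).
clonal_threshold_for() {
  local receptor="$1" manual="${2:-auto}"
  local num_re='^[0-9]+([.][0-9]+)?$'
  if [[ "$receptor" == "TCR" ]]; then
    echo 0
  elif [[ "$manual" == "auto" || "$manual" =~ $num_re ]]; then
    echo "$manual"
  else
    echo "invalid threshold: $manual" >&2
    return 1
  fi
}

# Hypothetical helper: print (and, with RUN=1, launch) the re-run command.
# -resume reuses cached steps only if all other parameters are unchanged;
# samplesheet.tsv and results are placeholders for your original values.
rerun_with_threshold() {
  local threshold
  threshold="$(clonal_threshold_for "$@")" || return 1
  local cmd=(nextflow run nf-core/airrflow -r 4.2.0 -profile docker
    --input samplesheet.tsv --outdir results
    --clonal_threshold "$threshold" -resume)
  echo "${cmd[*]}"
  if [[ "${RUN:-0}" == "1" ]]; then
    "${cmd[@]}"
  fi
}

rerun_with_threshold BCR 0.1
rerun_with_threshold TCR
```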
Member

I've added more information on the clonal analysis in a dedicated section now, could you rewrite the section here to just point out to the respective find_threshold report, clonal_analysis report and lineage_threshold reports?


6. Other reporting.
- Additional reports are also generated, including a MultiQC report that summarizes QC metrics across all samples, pipeline_info reports and report_file_size reports.
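
To get an overview of these reports in one go, a small sketch that lists every `.html` file under the results directory. The function name `list_html_reports` is hypothetical, and the directory argument is whatever was passed to `--outdir`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: list every HTML report under an airrflow results
# directory, sorted by path so reports from the same step stay together.
list_html_reports() {
  local outdir="$1"
  if [[ ! -d "$outdir" ]]; then
    echo "no such directory: $outdir" >&2
    return 1
  fi
  find "$outdir" -type f -name '*.html' | sort
}
```

For example, `list_html_reports results` would print the path of the MultiQC report together with any other HTML reports the run produced.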

Member

Could you rewrite this part a bit focusing on the different html reports and a general description on what is inside them?

Member

@ggabernet left a comment

@Vivian0105 I've reviewed and edited the single-cell tutorial for now and added also some comments to this review on how it can still be improved. Let me know if you have any questions!

@ggabernet self-assigned this on Mar 10, 2025
```bash
nextflow run nf-core/airrflow -r 4.2.0 -profile test,docker --outdir test_results
```

## Running airrflow pipeline
Member

@Vivian0105 could you also add the output message here?


> [!TIP]
> When launching a Nextflow pipeline with the `-resume` option, any processes that have already been run with the exact same code, settings and inputs will be cached, and the pipeline will resume from the last step that changed or failed with an error. Using `-resume` avoids duplicating previous work and saves time when re-running a pipeline.
> We include `-resume` in our Nextflow command as a precaution in case anything goes wrong during execution. After fixing the issue, you can relaunch the pipeline with the same command; it will resume from the point of failure, significantly reducing runtime and resource usage.
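
To check how much of a relaunched run was actually taken from the cache, one option is to count task statuses in the Nextflow execution trace, where tasks reused by `-resume` are reported as `CACHED`. A minimal sketch, assuming the tab-separated trace with a header row that nf-core pipelines write to `<outdir>/pipeline_info/execution_trace_<timestamp>.txt`; the function name `count_task_status` is hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: count tasks per status in a Nextflow execution trace
# (tab-separated, first row is the header). The "status" column is found by
# its header name instead of a fixed position. Fails if there is no such column.
count_task_status() {
  local trace="$1"
  awk -F'\t' '
    NR == 1 {
      for (i = 1; i <= NF; i++) { if ($i == "status") col = i }
      next
    }
    col { n[$col]++ }
    END {
      if (!col) exit 1
      for (s in n) print s "\t" n[s]
    }
  ' "$trace" | sort
}
```

For example, running `count_task_status` on the newest trace after a resumed run would typically show mostly `CACHED` tasks when only the last steps had to be recomputed.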
Member

@Vivian0105 could you add this here as well?

- Pipeline_info reports: various reports on the running and execution of the pipeline.
- Report_file_size report: a summary of the number of sequences left after each of the most important pipeline steps.

## Understanding error messages
Member

@Vivian0105 I just thought of a new section to explain how to understand error messages and facilitate debugging. What do you think?
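
A possible starting point for such a section: when a run fails, Nextflow's console error names the failing process and its work directory, and the full log is kept in `.nextflow.log` in the launch directory. The sketch below collects the same information for every failed task from the execution trace, assuming the tab-separated trace with a header row that nf-core pipelines write to `<outdir>/pipeline_info/execution_trace_<timestamp>.txt` and the default `work` directory. The function name `failed_tasks` is hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print name, exit status and work-directory glob of
# every FAILED task in a Nextflow execution trace. Columns are looked up by
# header name. Assumption: the trace "hash" (e.g. "ab/123456") is the start
# of the task's directory inside the work directory (default: ./work).
failed_tasks() {
  local trace="$1" workdir="${2:-work}"
  awk -F'\t' -v wd="$workdir" '
    NR == 1 { for (i = 1; i <= NF; i++) h[$i] = i; next }
    $(h["status"]) == "FAILED" {
      print $(h["name"]) "\texit=" $(h["exit"]) "\t" wd "/" $(h["hash"]) "*"
    }
  ' "$trace"
}
```

Each task directory contains `.command.sh` (the exact command that was run), `.command.err` and `.command.log`, so `cat <printed path>/.command.err` is usually the quickest way to see the underlying tool's error. An exit status of 137 typically means the task was killed, often for exceeding the available memory.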


- A configuration file specifying memory, CPU and time limits. Before writing it, we recommend checking the memory and CPUs available on your system, since requesting more than the system's capacity may cause unexpected errors.

- Information on the bulk library generation method (protocol).
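
A minimal sketch of the resource check recommended in this list and of a matching configuration file, assuming a Linux machine (`nproc`, `/proc/meminfo`) and Nextflow 24.04 or later, where `process.resourceLimits` caps the resources any task may request. The function names and the `resources.config` file name are hypothetical; limits should be chosen somewhat below what the machine reports.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: report CPUs and total memory (whole GB) on this Linux
# machine. nproc is from GNU coreutils; /proc/meminfo reports MemTotal in kB.
available_resources() {
  local cpus mem_kb
  cpus="$(nproc)"
  mem_kb="$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)"
  echo "cpus=${cpus} memory_gb=$((mem_kb / 1024 / 1024))"
}

# Hypothetical helper: write a Nextflow config capping every task.
# Assumption: Nextflow >= 24.04 (process.resourceLimits).
write_resource_config() {
  local cpus="$1" mem_gb="$2" hours="$3" out="${4:-resources.config}"
  cat > "$out" <<EOF
process {
    resourceLimits = [ cpus: ${cpus}, memory: ${mem_gb}.GB, time: ${hours}.h ]
}
EOF
}
```

For example, after `available_resources` reports the machine's capacity, `write_resource_config 8 30 24` writes a config that can be passed to the pipeline with `-c resources.config`.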
Member

@Vivian0105 it would be great to also provide the samplesheet and configuration file example for this tutorial for download.


After launching the pipeline, the following will be printed to the console output:

```bash
```
Member

@Vivian0105 Could you provide here the console output examples for the tutorial?


After running the pipeline, several reports are generated in the results folder.

![example of result folder](bulk_tutorial_images/AIRRFLOW_BULK_RESULT.png)
Member

@Vivian0105 Here as well to make it easier to maintain I would just list the output directories as a text list here.

The analysis steps and their corresponding folders, where the results are stored, are listed below.


1. QC
Member

@Vivian0105 I would focus in this section on pointing to the reports for each of these analysis steps that you are mentioning here and what they contain.
