Potentially serious problem with read group definitions #308
Comments
Additionally, for my |
This is a good point; with some discussion, we'll want to add something like:
|
Yes, this is what dataset registration is trying to mitigate (only internally). There are many different FASTQ formats and it's hard to accommodate everything but it would be useful to add some checks/automation. Also, this pipeline adds |
Generally, yes, the error align-DNA ends up running into is a filename collision since the initial alignments are named based on a combination of library and lane |
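A minimal sketch of the kind of pre-flight check that could surface this collision before alignment starts. The record layout (the `library`, `lane`, and `fastq` keys) is hypothetical and only stands in for whatever the input sheet provides; it is not the pipeline's actual code or naming logic.

```python
from collections import defaultdict


def find_name_collisions(records):
    """Group inputs by (library, lane) and return the groups with more than one entry.

    `records` is an iterable of dicts with the hypothetical keys
    'library', 'lane' and 'fastq'.
    """
    by_key = defaultdict(list)
    for rec in records:
        by_key[(rec["library"], rec["lane"])].append(rec["fastq"])
    # Any key seen more than once would map to the same intermediate file name.
    return {key: paths for key, paths in by_key.items() if len(paths) > 1}


if __name__ == "__main__":
    inputs = [
        {"library": "lib1", "lane": "2", "fastq": "run_A/lib1_L002_R1.fastq.gz"},
        {"library": "lib1", "lane": "2", "fastq": "run_B/lib1_L002_R1.fastq.gz"},
        {"library": "lib1", "lane": "3", "fastq": "run_A/lib1_L003_R1.fastq.gz"},
    ]
    for (library, lane), paths in find_name_collisions(inputs).items():
        print(f"collision for library={library} lane={lane}: {paths}")
```

Reporting the colliding inputs up front, with a hint that they likely come from different runs, would give the user something better to act on than renaming the library.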
I think you meant this line -
Then, yes, this should be updated. Internally registered FASTQs should be fine as they should have unique lane info for each library. |
pipeline-align-DNA/main.nf
Line 71 in 76c63a1
Library, sample and lane are not sufficient to identify unique read groups. The same sample from the same library can be sequenced on two different machine runs and have the same lane. In this case the pipeline yields an error, but the user is likely to try to get around it by renaming the libraries to distinct names. Depending on the background duplication rate, this can lead to massive numbers of false positives, because duplicates are no longer marked across the two runs.
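One possible direction, sketched here under assumptions: include the flowcell in the read group ID so that two runs of the same library on the same lane stay distinct while still sharing one library name. The sketch assumes Illumina-style (CASAVA 1.8+) FASTQ headers; the function names and the `library.flowcell.lane` ID layout are illustrative and not what the pipeline currently does.

```python
import re

# Illumina (CASAVA 1.8+) header layout, assumed for this sketch:
# @<instrument>:<run>:<flowcell>:<lane>:<tile>:<x>:<y> <read>:<filtered>:<control>:<index>
HEADER_RE = re.compile(
    r"^@(?P<instrument>[^:\s]+):(?P<run>\d+):(?P<flowcell>[^:\s]+):(?P<lane>\d+):"
)


def parse_header(header):
    """Extract instrument, run, flowcell and lane from the first FASTQ header line."""
    match = HEADER_RE.match(header)
    if match is None:
        raise ValueError(f"unrecognised FASTQ header: {header!r}")
    return match.groupdict()


def read_group_fields(header, sample, library):
    """Build @RG fields whose ID stays unique across runs (hypothetical layout)."""
    fields = parse_header(header)
    platform_unit = f"{fields['flowcell']}.{fields['lane']}"
    return {
        "ID": f"{library}.{platform_unit}",  # unique per library, flowcell and lane
        "SM": sample,
        "LB": library,  # unchanged, so both runs still count as one library
        "PU": platform_unit,
    }


if __name__ == "__main__":
    run_a = "@A00123:45:HXXXXXXXX:2:1101:1000:2000 1:N:0:ACGT"
    run_b = "@A00123:46:HYYYYYYYY:2:1101:1000:2000 1:N:0:ACGT"
    print(read_group_fields(run_a, "sample1", "lib1"))
    print(read_group_fields(run_b, "sample1", "lib1"))
```

With IDs built this way, the two runs no longer collide, and the user has no reason to rename the library to work around the error.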