Skip to content

Commit 7f3b758

Browse files
jkbonfieldwhitwham
authored andcommitted
Convert U to T instead of U to N when sam_parsing.
As this changes seq_nt16_table it changes it for all other uses of this lookup table too, which will be widespread. However this feels like a reasonable thing to do given it only has an impact on data which is currently out of bounds of what is expected. Fixes samtools/samtools#2131
1 parent 3c90481 commit 7f3b758

File tree

4 files changed

+12
-2
lines changed

4 files changed

+12
-2
lines changed

hts.c

+5-2
Original file line numberDiff line numberDiff line change
@@ -232,16 +232,19 @@ const char *hts_feature_string(void) {
232232
}
233233

234234

235+
// Converts ASCII to BAM nibble encoding.
236+
// Note 0123 is treated as ACGT (ABI colourspace encoding) and
237+
// U is treated as T.
235238
HTSLIB_EXPORT
236239
const unsigned char seq_nt16_table[256] = {
237240
15,15,15,15, 15,15,15,15, 15,15,15,15, 15,15,15,15,
238241
15,15,15,15, 15,15,15,15, 15,15,15,15, 15,15,15,15,
239242
15,15,15,15, 15,15,15,15, 15,15,15,15, 15,15,15,15,
240243
1, 2, 4, 8, 15,15,15,15, 15,15,15,15, 15, 0 /*=*/,15,15,
241244
15, 1,14, 2, 13,15,15, 4, 11,15,15,12, 15, 3,15,15,
242-
15,15, 5, 6, 8,15, 7, 9, 15,10,15,15, 15,15,15,15,
245+
15,15, 5, 6, 8, 8, 7, 9, 15,10,15,15, 15,15,15,15,
243246
15, 1,14, 2, 13,15,15, 4, 11,15,15,12, 15, 3,15,15,
244-
15,15, 5, 6, 8,15, 7, 9, 15,10,15,15, 15,15,15,15,
247+
15,15, 5, 6, 8, 8, 7, 9, 15,10,15,15, 15,15,15,15,
245248

246249
15,15,15,15, 15,15,15,15, 15,15,15,15, 15,15,15,15,
247250
15,15,15,15, 15,15,15,15, 15,15,15,15, 15,15,15,15,

htslib/hts.h

+1
Original file line numberDiff line numberDiff line change
@@ -455,6 +455,7 @@ int hts_parse_opt_list(htsFormat *opt, const char *str);
455455
The input character may be either an IUPAC ambiguity code, '=' for 0, or
456456
'0'/'1'/'2'/'3' for a result of 1/2/4/8. The result is encoded as 1/2/4/8
457457
for A/C/G/T or combinations of these bits for ambiguous bases.
458+
Additionally RNA U is treated as a T (8).
458459
*/
459460
HTSLIB_EXPORT
460461
extern const unsigned char seq_nt16_table[256];

test/compare_sam.pl

+3
Original file line numberDiff line numberDiff line change
@@ -163,6 +163,9 @@
163163
$ln1[9] = uc($ln1[9]);
164164
$ln2[9] = uc($ln2[9]);
165165

166+
# RNA U to T is an expected change
167+
$ln1[9] =~ s/U/T/g;
168+
166169
# Cram will populate a sequence string that starts as "*"
167170
$ln2[9] = "*" if ($ln1[9] eq "*");
168171

test/xx#u.sam

+3
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,3 @@
1+
@SQ SN:xx LN:20
2+
a1 99 xx 1 1 16M = 11 20 =ACMGRSVTWYHKDBN ****************
3+
b1 99 xx 1 1 16M = 11 20 =ACMGRSVUWYHKDBN ****************

0 commit comments

Comments
 (0)