
1.12

@daviesrob released this on 17 Mar at 16:20.

Download the source code here: htslib-1.12.tar.bz2.
(The "Source code" downloads are generated by GitHub and are incomplete as they are missing some generated files.)

Features and Updates

  • Added experimental CRAM 3.1 and 4.0 support. (#929)
    These should not be used for long-term data storage, as the specification still needs to be ratified by GA4GH and may be subject to format changes (this is highly likely for 4.0). However, they may be tested using:

    test/test_view -t ref.fa -C -o version=3.1 in.bam -p out31.cram
    

    For smaller but slower files, try varying the compression profile with an additional -o small. Profile choices are fast, normal, small and archive, and can be applied to all CRAM versions.

  • Added a general filtering syntax for alignment records in SAM/BAM/CRAM readers. (#1181, #1203)
    For example, to find read pairs whose mates map to different chromosomes, with high mapping quality: 'mapq >= 30 && mrname != rname'
    To find sizeable deletions: 'cigar =~ "[0-9]{2}D"' or 'rlen - qlen > 10'.
    To report duplicates that aren't part of a "proper pair": 'flag.dup && !flag.proper_pair'
    More details are in the samtools.1 man page under "FILTER EXPRESSIONS".
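    To make the three example expressions concrete, here is a plain C sketch of the predicates they describe. It does not use HTSlib: struct aln and all helper names are hypothetical stand-ins for the relevant alignment fields. The flag values come from the SAM specification, and rlen/qlen are computed from a packed CIGAR using the SAM rules for which operations consume the reference or the query, assumed here to match what the filter's rlen and qlen report.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* SAM FLAG bits used below (values from the SAM specification). */
#define FLAG_PROPER_PAIR 0x2u
#define FLAG_DUP         0x400u

/* CIGAR op codes in BAM order "MIDNSHP=X"; each op is packed as (len << 4) | op. */
enum { OP_M, OP_I, OP_D, OP_N, OP_S, OP_H, OP_P, OP_EQ, OP_X };

/* Hypothetical, simplified alignment record: NOT an HTSlib type. */
struct aln {
    uint16_t flag;
    uint8_t mapq;
    int32_t tid;              /* reference id behind rname */
    int32_t mtid;             /* reference id behind mrname */
    size_t n_cigar;
    const uint32_t *cigar;
};

/* Sum the lengths of CIGAR ops that consume the reference (ref == true)
 * or the query (ref == false). */
static int64_t cigar_span(const struct aln *a, bool ref)
{
    int64_t total = 0;
    for (size_t i = 0; i < a->n_cigar; i++) {
        uint32_t op = a->cigar[i] & 0xfu, len = a->cigar[i] >> 4;
        bool both = (op == OP_M || op == OP_EQ || op == OP_X);
        bool ref_only = (op == OP_D || op == OP_N);
        bool query_only = (op == OP_I || op == OP_S);
        if (both || (ref ? ref_only : query_only))
            total += len;
    }
    return total;
}

/* 'mapq >= 30 && mrname != rname' */
bool spans_chromosomes(const struct aln *a)
{
    return a->mapq >= 30 && a->mtid != a->tid;
}

/* 'rlen - qlen > 10' */
bool large_deletion(const struct aln *a)
{
    return cigar_span(a, true) - cigar_span(a, false) > 10;
}

/* 'flag.dup && !flag.proper_pair' */
bool improper_duplicate(const struct aln *a)
{
    return (a->flag & FLAG_DUP) && !(a->flag & FLAG_PROPER_PAIR);
}
```

    For a 10M15D10M alignment, rlen is 35 and qlen is 20, so 'rlen - qlen > 10' holds.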

  • The knet networking code has been removed. It only supported the http and ftp protocols, and a better and safer alternative using libcurl has been available since release 1.3. If you need access to ftp:// and http:// URLs, HTSlib should be built with libcurl support. (#1200)

  • The old htslib/knetfile.h interfaces have been marked as deprecated. Any code still using them should be updated to use hFILE instead. (#1200)

  • Added an introspection API for checking some of the capabilities provided by HTSlib. (#1170) Thanks also to John Marshall for contributions. (#1222)

    • hfile_list_schemes: returns the number of schemes found
    • hfile_list_plugins: returns the number of plugins found
    • hfile_has_plugin: checks if a specific plugin is available
    • hts_features: returns a bit mask with all available features
    • hts_test_feature: tests if a feature is available
    • hts_feature_string: returns a string summary of enabled features
  • Made performance improvements to the probaln_glocal function, which speed up mpileup BAQ calculations. (#1188)

    • Caching of reused loop variables and removal of loop invariants.
    • Code reordering to reduce stalls from instruction latency.
    • Other refactoring and tidy-ups.
  • Added a public method for constructing a BAM record from the component pieces. Thanks to Anders Kaplan. (#1159, #1164)

  • Added two public methods, sam_parse_cigar and bam_parse_cigar, as part of a small CIGAR API (#1169, #1182). Thanks to Daniel Cameron for input. (#1147)

  • HTSlib, and the included htsfile program, will now recognise the old RAZF compressed file format. Note that while the format is detected, HTSlib is unable to read it. It is recommended that RAZF files be decompressed with gunzip before using them with HTSlib. Thanks to John Marshall (#1244), and to Matthew J. Oldach, who reported problems with decompressing some RAZF files (samtools/samtools#1387).

  • The S3 plugin now has options to force the address style. It will recognise the addressing_style and host_bucket entries in the respective AWS .credentials and s3cmd .s3cfg files. There is also a new HTS_S3_ADDRESS_STYLE environment variable. Details are in the htslib-s3-plugin.7 man page. (#1249)

Build changes

These are compiler, configuration and makefile based changes.

  • Added new Makefile targets for applications that embed HTSlib and want to run its test suite or clean its generated artefacts. (#1230, #1238)

  • The CRAM codecs are now obtained via the htscodecs submodule, hence when cloning it is now best to use git clone --recursive. In an existing clone, you may use git submodule update --init to obtain the htscodecs submodule checkout.

  • Updated CI test configuration to recurse HTSlib submodules. (#1359)

  • Added Cirrus-CI integration as a replacement for Travis, which was phased out. (#1175; #1212)

  • Updated the Windows image used by Appveyor to 'Visual Studio 2019'. (#1172; fixed #1166)

  • Fixed a buglet in configure.ac, exposed by release 2.70 of autoconf. Thanks to John Marshall. (#1198)

  • Fixed plugin linking on macOS, to prevent symbol conflict when linking with a static HTSlib. Thanks to John Marshall. (#1184)

  • Fixed a clang++ 9 error in cram_io.h. Thanks to Pjotr Prins. (#1190)

  • Introduced $(ALL_CPPFLAGS) to allow for more flexibility in setting the compiler flags. Thanks to John Marshall. (#1187)

  • Added 'fall through' comments to prevent the warnings Clang issues on intentional fall-through case statements when building with the -Wextra flag. Thanks to John Marshall. (#1163)

  • Non-configure builds now define _XOPEN_SOURCE=600 to allow them to work when the gcc -std=c99 option is used. Thanks to John Marshall. (#1246)

Bug fixes

  • Fixed VCF #CHROM header parsing to only separate columns at tab characters. Thanks to Sam Morris for reporting the issue. (#1237; fixed samtools/bcftools#1408)
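    A minimal sketch of the tab-only rule, not HTSlib's own parser: count_header_columns and header_column are hypothetical helpers that split a #CHROM line only at tab characters, so a sample name containing a space, such as "Sample A", stays in one column.

```c
#include <stddef.h>
#include <string.h>

/* Count tab-separated columns in a header line, ignoring any trailing "\r\n". */
size_t count_header_columns(const char *line)
{
    size_t len = strcspn(line, "\r\n");
    if (len == 0)
        return 0;
    size_t cols = 1;
    for (size_t i = 0; i < len; i++)
        if (line[i] == '\t')        /* spaces are NOT separators */
            cols++;
    return cols;
}

/* Copy the n-th (0-based) column into buf, which holds bufsz bytes.
 * Returns 0 on success, -1 if the column is missing or does not fit. */
int header_column(const char *line, size_t n, char *buf, size_t bufsz)
{
    const char *p = line;
    const char *end = line + strcspn(line, "\r\n");

    for (; n > 0; n--) {            /* skip n tab-delimited fields */
        const char *tab = memchr(p, '\t', (size_t)(end - p));
        if (!tab)
            return -1;
        p = tab + 1;
    }
    const char *tab = memchr(p, '\t', (size_t)(end - p));
    size_t width = (size_t)((tab ? tab : end) - p);
    if (width + 1 > bufsz)
        return -1;
    memcpy(buf, p, width);
    buf[width] = '\0';
    return 0;
}
```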

  • Fixed a crash reported in bcf_sr_sort_set, which expects REF to be present. (#1204; fixed samtools/bcftools#1361)

  • Fixed a bcf synced reader bug when filtering with a region list, and the first record for a chromosome had the same position as the last record for the previous chromosome. (#1254; fixed samtools/bcftools#1441)

  • Fixed a bug in mpileup's read-overlap logic when iterating over CIGAR segments. Thanks to @wulj2 for the analysis. (#1202; fixed #1196)

  • Fixed a tabix bug that prevented setting the correct number of lines to be skipped in a region file. Thanks to Jim Robinson for reporting it. (#1189; fixed #1186)

  • Made bam_itr_next an alias for sam_itr_next, to prevent it from crashing when working with htsFile pointers. Thanks to Torbjörn Klatt for reporting it. (#1180; fixed #1179)

  • Fixed the bgzf_idx_flush assertion that expected one flush per outgoing multi-threaded block, to accommodate situations where a single record spans multiple blocks. Thanks to @lacek. (#1168; fixed samtools/samtools#1328)

  • Removed the assumption that pthread_t is not a structure; POSIX permits it to be one. Thanks also to John Marshall and Anders Kaplan. (#1167, #1153)
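    The general POSIX rule behind this fix can be sketched in plain C, independently of HTSlib's actual change. Because pthread_t is opaque and may be a structure, thread IDs are compared with pthread_equal() rather than == or integer casts, and 0 cannot serve as a "no thread" value. record_owner, called_by_owner and owner_check_in_new_thread are hypothetical helpers. Build with -pthread.

```c
#include <pthread.h>
#include <stdbool.h>

/* No portable "no thread" value exists for pthread_t,
 * so a separate flag records whether owner is valid. */
static pthread_t owner;
static bool owner_set = false;

/* Remember the calling thread as the owner. */
void record_owner(void)
{
    owner = pthread_self();
    owner_set = true;
}

/* Compare thread IDs with pthread_equal(), never with == or casts. */
bool called_by_owner(void)
{
    return owner_set && pthread_equal(owner, pthread_self()) != 0;
}

static void *check_owner_thread(void *result)
{
    *(bool *)result = called_by_owner();
    return NULL;
}

/* Run called_by_owner() in a new thread: returns 1 if that thread is the
 * owner, 0 if not, -1 if the thread could not be created or joined. */
int owner_check_in_new_thread(void)
{
    pthread_t t;
    bool result = true;
    if (pthread_create(&t, NULL, check_owner_thread, &result) != 0)
        return -1;
    if (pthread_join(t, NULL) != 0)
        return -1;
    return result ? 1 : 0;
}
```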

  • Fixed the minimum offset of a BAI index bin, to account for unmapped reads. Thanks to John Marshall for spotting the issue. (#1158; fixed #1142)

  • Fixed the CRLF handling in the sam_parse_worker function. Thanks to Anders Kaplan. (#1149; fixed #1148)

  • Included unistd.h and errno.h directly in HTSlib files, as opposed to including them indirectly, via third party code. Thanks to Andrew Patterson (#1143) and John Marshall (#1145).