Commit 4b84de8

Merge pull request #69 from sbslee/0.37.0-dev
0.37.0 dev
2 parents 40627bc + 6e022af commit 4b84de8

9 files changed (+422, -48 lines changed)

.readthedocs.yaml

+6-1
@@ -5,6 +5,12 @@
 # Required
 version: 2

+# Set the OS, Python version and other tools you might need
+build:
+  os: ubuntu-22.04
+  tools:
+    python: "3.7"
+
 # Build documentation in the docs/ directory with Sphinx
 sphinx:
   configuration: docs/conf.py
@@ -15,6 +21,5 @@ sphinx:

 # Optionally set the version of Python and requirements required to build your docs
 python:
-  version: 3.7
   install:
     - requirements: docs/requirements.txt

CHANGELOG.rst

+9
@@ -1,6 +1,15 @@
 Changelog
 *********

+0.37.0 (2023-09-09)
+-------------------
+
+* :issue:`67`: Fix bug in :meth:`pymaf.MafFrame.plot_waterfall` method where ``count=1`` was causing color mismatch.
+* Add new submodule ``pychip``.
+* Add new method :meth:`common.reverse_complement`.
+* Fix bug in :meth:`common.extract_sequence` method where a long DNA sequence output was truncated.
+* :issue:`68`: Refresh the variant consequences database from Ensembl VEP. The database's latest update was on May 31, 2021.
+
 0.36.0 (2022-08-12)
 -------------------

README.rst

+6-3
@@ -20,9 +20,6 @@ README
 .. image:: https://anaconda.org/bioconda/fuc/badges/downloads.svg
    :target: https://anaconda.org/bioconda/fuc/files

-.. image:: https://anaconda.org/bioconda/fuc/badges/installer/conda.svg
-   :target: https://conda.anaconda.org/bioconda
-
 Introduction
 ============

@@ -65,6 +62,11 @@ and cite the following article:

 Lee et al., 2022. `ClinPharmSeq: A targeted sequencing panel for clinical pharmacogenetics implementation <https://doi.org/10.1371/journal.pone.0272129>`__. PLOS ONE.

+Support fuc
+===========
+
+If you find my work useful, please consider becoming a `sponsor <https://github.com/sponsors/sbslee>`__.
+
 Installation
 ============

@@ -183,6 +185,7 @@ Below is the list of submodules available in the fuc API:
 - **common** : The common submodule is used by other fuc submodules such as pyvcf and pybed. It also provides many day-to-day actions used in the field of bioinformatics.
 - **pybam** : The pybam submodule is designed for working with sequence alignment files (SAM/BAM/CRAM). It essentially wraps the `pysam <https://pysam.readthedocs.io/en/latest/api.html>`_ package to allow fast computation and easy manipulation. If you are mainly interested in working with depth of coverage data, please check out the pycov submodule which is specifically designed for the task.
 - **pybed** : The pybed submodule is designed for working with BED files. It implements ``pybed.BedFrame`` which stores BED data as ``pandas.DataFrame`` via the `pyranges <https://github.com/biocore-ntnu/pyranges>`_ package to allow fast computation and easy manipulation. The submodule strictly adheres to the standard `BED specification <https://genome.ucsc.edu/FAQ/FAQformat.html>`_.
+- **pychip** : The pychip submodule is designed for working with annotation or manifest files from the Axiom (Thermo Fisher Scientific) and Infinium (Illumina) array platforms.
 - **pycov** : The pycov submodule is designed for working with depth of coverage data from sequence alignment files (SAM/BAM/CRAM). It implements ``pycov.CovFrame`` which stores read depth data as ``pandas.DataFrame`` via the `pysam <https://pysam.readthedocs.io/en/latest/api.html>`_ package to allow fast computation and easy manipulation. The ``pycov.CovFrame`` class also contains many useful plotting methods such as ``CovFrame.plot_region`` and ``CovFrame.plot_uniformity``.
 - **pyfq** : The pyfq submodule is designed for working with FASTQ files. It implements ``pyfq.FqFrame`` which stores FASTQ data as ``pandas.DataFrame`` to allow fast computation and easy manipulation.
 - **pygff** : The pygff submodule is designed for working with GFF/GTF files. It implements ``pygff.GffFrame`` which stores GFF/GTF data as ``pandas.DataFrame`` to allow fast computation and easy manipulation. The submodule strictly adheres to the standard `GFF specification <https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md>`_.

docs/api.rst

+7
@@ -14,6 +14,7 @@ Below is the list of submodules available in the fuc API:
 - **common** : The common submodule is used by other fuc submodules such as pyvcf and pybed. It also provides many day-to-day actions used in the field of bioinformatics.
 - **pybam** : The pybam submodule is designed for working with sequence alignment files (SAM/BAM/CRAM). It essentially wraps the `pysam <https://pysam.readthedocs.io/en/latest/api.html>`_ package to allow fast computation and easy manipulation. If you are mainly interested in working with depth of coverage data, please check out the pycov submodule which is specifically designed for the task.
 - **pybed** : The pybed submodule is designed for working with BED files. It implements ``pybed.BedFrame`` which stores BED data as ``pandas.DataFrame`` via the `pyranges <https://github.com/biocore-ntnu/pyranges>`_ package to allow fast computation and easy manipulation. The submodule strictly adheres to the standard `BED specification <https://genome.ucsc.edu/FAQ/FAQformat.html>`_.
+- **pychip** : The pychip submodule is designed for working with annotation or manifest files from the Axiom (Thermo Fisher Scientific) and Infinium (Illumina) array platforms.
 - **pycov** : The pycov submodule is designed for working with depth of coverage data from sequence alignment files (SAM/BAM/CRAM). It implements ``pycov.CovFrame`` which stores read depth data as ``pandas.DataFrame`` via the `pysam <https://pysam.readthedocs.io/en/latest/api.html>`_ package to allow fast computation and easy manipulation. The ``pycov.CovFrame`` class also contains many useful plotting methods such as ``CovFrame.plot_region`` and ``CovFrame.plot_uniformity``.
 - **pyfq** : The pyfq submodule is designed for working with FASTQ files. It implements ``pyfq.FqFrame`` which stores FASTQ data as ``pandas.DataFrame`` to allow fast computation and easy manipulation.
 - **pygff** : The pygff submodule is designed for working with GFF/GTF files. It implements ``pygff.GffFrame`` which stores GFF/GTF data as ``pandas.DataFrame`` to allow fast computation and easy manipulation. The submodule strictly adheres to the standard `GFF specification <https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md>`_.
@@ -48,6 +49,12 @@ fuc.pybed
 .. automodule:: fuc.api.pybed
    :members:

+fuc.pychip
+==========
+
+.. automodule:: fuc.api.pychip
+   :members:
+
 fuc.pycov
 =========

docs/create.py

+5-3
@@ -48,9 +48,6 @@
 .. image:: https://anaconda.org/bioconda/fuc/badges/downloads.svg
    :target: https://anaconda.org/bioconda/fuc/files

-.. image:: https://anaconda.org/bioconda/fuc/badges/installer/conda.svg
-   :target: https://conda.anaconda.org/bioconda
-
 Introduction
 ============

@@ -93,6 +90,11 @@

 Lee et al., 2022. `ClinPharmSeq: A targeted sequencing panel for clinical pharmacogenetics implementation <https://doi.org/10.1371/journal.pone.0272129>`__. PLOS ONE.

+Support fuc
+===========
+
+If you find my work useful, please consider becoming a `sponsor <https://github.com/sponsors/sbslee>`__.
+
 Installation
 ============

fuc/api/common.py

+59-2
@@ -804,7 +804,12 @@ def parse_variant(variant):

 def extract_sequence(fasta, region):
     """
-    Extract the region's DNA sequence from the FASTA file.
+    Extract the DNA sequence corresponding to a selected region from a FASTA
+    file.
+
+    The method also allows users to retrieve the reference allele of a
+    variant in a genomic coordinate format, instead of providing a genomic
+    region.

     Parameters
     ----------
@@ -817,9 +822,20 @@ def extract_sequence(fasta, region):
     -------
     str
         DNA sequence. Empty string if there is no matching sequence.
+
+    Examples
+    --------
+
+    >>> from fuc import common
+    >>> fasta = 'resources_broad_hg38_v0_Homo_sapiens_assembly38.fasta'
+    >>> common.extract_sequence(fasta, 'chr1:15000-15005')
+    'GATCCG'
+    >>> # rs1423852 is chr16-80874864-C-T
+    >>> common.extract_sequence(fasta, 'chr16:80874864-80874864')
+    'C'
     """
     try:
-        sequence = pysam.faidx(fasta, region).split('\n')[1]
+        sequence = ''.join(pysam.faidx(fasta, region).split('\n')[1:])
     except pysam.SamtoolsError as e:
         warnings.warn(str(e))
         sequence = ''
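
The switch from `[1]` to `[1:]` matters because FASTA output is line-wrapped: `samtools faidx` wraps sequence lines at 60 characters by default, so a longer region spans several lines and keeping only the first one truncates it. The sketch below reproduces the effect on a hand-written string rather than calling pysam. The helper name `join_faidx_output` and the sample text are illustrative assumptions, not part of fuc.

```python
def join_faidx_output(faidx_text):
    """Return the sequence from FASTA text that has a single header line."""
    lines = faidx_text.split('\n')
    # lines[0] is the '>region' header. The remaining items are the wrapped
    # sequence lines, plus an empty string if the text ends with a newline.
    return ''.join(lines[1:])


# Hypothetical faidx-style output: 70 bases wrapped at 60 columns.
wrapped = '>chr1:1-70\n' + 'A' * 60 + '\n' + 'C' * 10 + '\n'

first_line_only = wrapped.split('\n')[1]    # behaviour before the fix
full_sequence = join_faidx_output(wrapped)  # behaviour after the fix

print(len(first_line_only), len(full_sequence))
```

With this input the first approach keeps 60 bases while the joined version keeps all 70.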
@@ -1434,3 +1450,44 @@ def parse_list_or_file(obj, extensions=['txt', 'tsv', 'csv', 'list']):
14341450
return convert_file2list(obj[0])
14351451

14361452
return obj
+
+def reverse_complement(seq, complement=True, reverse=False):
+    """
+    Given a DNA sequence, generate its reverse, complement, or
+    reverse-complement.
+
+    Parameters
+    ----------
+    seq : str
+        DNA sequence.
+    complement : bool, default: True
+        Whether to return the complement.
+    reverse : bool, default: False
+        Whether to return the reverse.
+
+    Returns
+    -------
+    str
+        Updated sequence.
+
+    Examples
+    --------
+
+    >>> from fuc import common
+    >>> common.reverse_complement('AGC')
+    'TCG'
+    >>> common.reverse_complement('AGC', reverse=True)
+    'GCT'
+    >>> common.reverse_complement('AGC', reverse=True, complement=False)
+    'CGA'
+    >>> common.reverse_complement('agC', reverse=True)
+    'Gct'
+    """
+    new_seq = seq[:]
+    complement_map = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A',
+                      'a': 't', 'c': 'g', 'g': 'c', 't': 'a'}
+    if complement:
+        new_seq = ''.join([complement_map[x] for x in new_seq])
+    if reverse:
+        new_seq = new_seq[::-1]
+    return new_seq
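
For reference, the same behaviour can be sketched with `str.translate`, keeping the lookup table under a different name from the boolean `complement` flag. If the table were also called `complement`, it would shadow the flag and `if complement:` would always be true. The names `reverse_complement_sketch` and `_COMPLEMENT_TABLE` are hypothetical, and this is a minimal standalone illustration rather than the fuc implementation.

```python
# Translation table for upper- and lower-case DNA bases. It has its own name
# so that it cannot shadow the ``complement`` argument below.
_COMPLEMENT_TABLE = str.maketrans('ACGTacgt', 'TGCAtgca')


def reverse_complement_sketch(seq, complement=True, reverse=False):
    """Return the complement, reverse, or reverse-complement of ``seq``."""
    new_seq = seq
    if complement:
        new_seq = new_seq.translate(_COMPLEMENT_TABLE)
    if reverse:
        new_seq = new_seq[::-1]
    return new_seq


print(reverse_complement_sketch('AGC'))
print(reverse_complement_sketch('AGC', reverse=True))
print(reverse_complement_sketch('AGC', reverse=True, complement=False))
print(reverse_complement_sketch('agC', reverse=True))
```

One design difference to be aware of: `str.translate` leaves characters outside the table (for example `N`) unchanged, whereas a plain dictionary lookup raises `KeyError` for them.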
