-
Notifications
You must be signed in to change notification settings - Fork 2
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
1 parent
df42d93
commit 74e05b8
Showing
1 changed file
with
167 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,167 @@ | ||
|
||
## Original Mytax v1.0 | ||
|
||
|
||
This is the repository for mytax, a tool for building custom taxonomies, which can aid nucleotide sequence classification. | ||
|
||
# Installation | ||
|
||
Clone this repo with: | ||
|
||
`git clone https://github.com/jhuapl-bio/mytax` | ||
|
||
Symbolically link all shell scripts into your path, for example with: | ||
|
||
`find v1 -name "*.sh" | while read fn; do sudo ln -s $PWD/$fn /usr/local/bin; done` | ||
|
||
# Dependencies | ||
|
||
- jellyfish (version 1) - https://www.cbcb.umd.edu/software/jellyfish/ | ||
- kraken (version 1) - https://ccb.jhu.edu/software/kraken/ | ||
- gawk - https://www.gnu.org/software/gawk/manual/html_node/Installation.html | ||
- perl | ||
- GNU CoreUtils | ||
|
||
- 16 GB of RAM is needed to build the provided influenza kraken database | ||
|
||
# Usage | ||
|
||
## Building example | ||
|
||
This pipeline is built from a central set of scripts located in the `v1` directory | ||
|
||
Build flu-kraken example with: | ||
|
||
`build_flukraken.sh -k flukraken-$(date +"%F")` | ||
|
||
The single script `build_flukraken.sh` functions as an outer wrapper for the influenza classification example using the Kraken classifier published in the mytax paper. | ||
|
||
|
||
`build_flukraken.sh` can also be used as a model to build modified pipelines as desired. It is built from four main sub-modules: | ||
``` | ||
download_IVR.sh -> download references and taxonomy from IVR | ||
build_IVR_metadata.sh -> build tab-delimited metadata table in format for mytax | ||
build_taxonomy.sh -> build custom taxonomy from tab-delimited table | ||
build_krakendb.sh -> add new taxonomic IDs to reference FASTA, build kraken database, post-process database for visualization pipeline | ||
``` | ||
|
||
`build_krakendb.sh` currently references three helper scripts, which also need to be in the PATH: | ||
``` | ||
fix_references.sh -> adds new taxonomic IDs to reference FASTA | ||
kraken-build -> builds kraken database | ||
process_krakendb.sh -> post-processes database for visualization pipeline (not included in this repo yet) | ||
``` | ||
|
||
|
||
|
||
## Running process script on kraken/kraken2 report and outfiles | ||
|
||
### If running from Docker | ||
|
||
docker build . -t jhuaplbio/mytax | ||
|
||
Unix | ||
|
||
`docker container run -it --rm -v $PWD:/data jhuaplbio/mytax bash` | ||
|
||
Windows Powershell | ||
|
||
`docker container run -it --rm -v $pwd:/data jhuaplbio/mytax bash` | ||
|
||
|
||
|
||
## Run the installation script | ||
|
||
|
||
# Activate the env, this will contain kraken2 and centrifuge scripts to build the database if needed as well as kraken2 and centrifuge dependencies | ||
|
||
`conda activate mytax` | ||
|
||
## Lets make a sample.fastq from test-data | ||
|
||
### First, download ncbi taxdump | ||
|
||
``` | ||
python3 src/generate_hierarchy.py -o $PWD/taxdump --report test-data/sample.report -download | ||
rm taxdump.tar.gz | ||
``` | ||
|
||
|
||
### DEPRECATED Kraken1 | ||
|
||
``` | ||
mkdir -p databases/minikraken1 | ||
wget https://ccb.jhu.edu/software/kraken/dl/minikraken_20171019_4GB.tgz -O databases/minikraken1.tgz | ||
tar -xvzf databases/minikraken1.tgz --directory databases/ | ||
export kraken1db=databases/minikraken_20171013_4GB && \ | ||
kraken --db $kraken1db --output test-data/sample.out test-data/sample.fastq && \ | ||
kraken-report --db $kraken1db test-data/sample.out | tee test-data/sample.report | ||
``` | ||
|
||
|
||
### Kraken2 | ||
|
||
### IF you've made flukraken2 in tmp or.... | ||
|
||
`export KRAKEN2_DEFAULT_DB="tmp/flukraken2` | ||
|
||
### IF you have a pre-made minikraken/other kraken db ready | ||
|
||
``` | ||
kraken2 --report output/sample_metagenome.first.report --output output/sample_metagenome.first.out --memory-mapping --db ~/Desktop/mytax/minikraken2 example-data/sample_metagenome.first.fastq | ||
``` | ||
|
||
### Download minikraken2 | ||
|
||
``` | ||
mkdir -p databases/ | ||
wget ftp://ftp.ccb.jhu.edu/pub/data/kraken2_dbs/old/minikraken2_v2_8GB_201904.tgz -O databases/minikraken2.tgz | ||
tar -xvzf databases/minikraken2.tgz --directory databases/ | ||
``` | ||
|
||
### Centrifuge | ||
|
||
#### Install | ||
|
||
`bash install.sh` | ||
|
||
#### Set up centrifuge env | ||
|
||
``` | ||
mkdir -p databases/centrifuge | ||
wget https://genome-idx.s3.amazonaws.com/centrifuge/p_compressed%2Bh%2Bv.tar.gz -O databases/centrifuge.tgz | ||
tar -xvzf databases/centrifuge.tgz --directory databases/centrifuge/ | ||
``` | ||
|
||
|
||
|
||
#### Run Centrifuge classify | ||
|
||
``` | ||
## If you need to make a new database, see here: $CONDA_PREFIX/lib/centrifuge/centrifuge-build --taxonomy-tree taxonomy/nodes.dmp --name-table taxonomy/names.dmp sample.fastq sample | ||
$CONDA_PREFIX/lib/centrifuge/centrifuge -f -x databases/centrifuge/p_compressed+h+v -q test-data/sample.fastq --report test-data/sample.centrifuge.report > test-data/sample.out | ||
$CONDA_PREFIX/lib/centrifuge/centrifuge-kreport -x databases/centrifuge/p_compressed+h+v test-data/sample.centrifuge.report > test-data/sample.report | ||
``` | ||
|
||
#### Next, generate the hierarchy json file | ||
|
||
``` | ||
python3 server/src/generate_hierarchy.py \ | ||
-o output/sample_metagenome.first.fullstring \ | ||
--report output/sample_metagenome.first.report \ | ||
-taxdump taxonomy/nodes.dmp | ||
``` | ||
|
||
#### Get the json for mytax sunburst plot | ||
``` | ||
bash server/src/krakenreport2json.sh -i output/sample_metagenome.first.fullstring -o output/sample_metagenome.first.json | ||
``` | ||
|
||
The resulting file can then imported into the sunburst plot at `server/src/sunburst/index.html` rendered with a simple `http.server` protocol like `python3 -m http.server 8080` |