|
| 1 | + |
| 2 | +## Original Mytax v1.0 |
| 3 | + |
| 4 | + |
| 5 | +This is the repository for mytax, a tool for building custom taxonomies, which can aid nucleotide sequence classification. |
| 6 | + |
| 7 | +# Installation |
| 8 | + |
| 9 | +Clone this repo with: |
| 10 | + |
| 11 | +`git clone https://github.com/jhuapl-bio/mytax` |
| 12 | + |
| 13 | +Symbolically link all shell scripts into your path, for example with: |
| 14 | + |
| 15 | +`find v1 -name "*.sh" | while read -r fn; do sudo ln -s "$PWD/$fn" /usr/local/bin; done` |
| 16 | + |
| 17 | +# Dependencies |
| 18 | + |
| 19 | + - jellyfish (version 1) - https://www.cbcb.umd.edu/software/jellyfish/ |
| 20 | + - kraken (version 1) - https://ccb.jhu.edu/software/kraken/ |
| 21 | + - gawk - https://www.gnu.org/software/gawk/manual/html_node/Installation.html |
| 22 | + - perl |
| 23 | + - GNU CoreUtils |
| 24 | + |
| 25 | + - 16 GB of RAM is needed to build the provided influenza kraken database |
| 26 | + |
| 27 | +# Usage |
| 28 | + |
| 29 | +## Building example |
| 30 | + |
| 31 | +This pipeline is built from a central set of scripts located in the `v1` directory. |
| 32 | + |
| 33 | +Build flu-kraken example with: |
| 34 | + |
| 35 | +`build_flukraken.sh -k flukraken-$(date +"%F")` |
| 36 | + |
| 37 | +The single script `build_flukraken.sh` functions as an outer wrapper for the influenza classification example using the Kraken classifier published in the mytax paper. |
| 38 | + |
| 39 | + |
| 40 | +`build_flukraken.sh` can also be used as a model to build modified pipelines as desired. It is built from four main sub-modules: |
| 41 | +``` |
| 42 | + download_IVR.sh -> download references and taxonomy from IVR |
| 43 | +
|
| 44 | + build_IVR_metadata.sh -> build tab-delimited metadata table in format for mytax |
| 45 | +
|
| 46 | + build_taxonomy.sh -> build custom taxonomy from tab-delimited table |
| 47 | +
|
| 48 | + build_krakendb.sh -> add new taxonomic IDs to reference FASTA, build kraken database, post-process database for visualization pipeline |
| 49 | +``` |
| 50 | + |
| 51 | +`build_krakendb.sh` currently references three helper scripts, which also need to be in the PATH: |
| 52 | +``` |
| 53 | + fix_references.sh -> adds new taxonomic IDs to reference FASTA |
| 54 | +
|
| 55 | + kraken-build -> builds kraken database |
| 56 | +
|
| 57 | + process_krakendb.sh -> post-processes database for visualization pipeline (not included in this repo yet) |
| 58 | +``` |
| 59 | + |
| 60 | + |
| 61 | + |
| 62 | +## Running the processing script on kraken/kraken2 report and output files |
| 63 | + |
| 64 | +### If running from Docker |
| 65 | + |
| 66 | +`docker build . -t jhuaplbio/mytax` |
| 67 | + |
| 68 | +Unix |
| 69 | + |
| 70 | +`docker container run -it --rm -v $PWD:/data jhuaplbio/mytax bash` |
| 71 | + |
| 72 | +Windows PowerShell |
| 73 | + |
| 74 | +`docker container run -it --rm -v $pwd:/data jhuaplbio/mytax bash` |
| 75 | + |
| 76 | + |
| 77 | + |
| 78 | +## Run the installation script |
| 79 | + |
| 80 | + |
| 81 | +Activate the environment. It contains the kraken2 and centrifuge scripts needed to build a database, as well as the kraken2 and centrifuge dependencies. |
| 82 | + |
| 83 | +`conda activate mytax` |
| 84 | + |
| 85 | +## Let's make a sample.fastq from test-data |
| 86 | + |
| 87 | +### First, download the NCBI taxdump |
| 88 | + |
| 89 | +``` |
| 90 | +python3 src/generate_hierarchy.py -o $PWD/taxdump --report test-data/sample.report -download |
| 91 | +rm taxdump.tar.gz |
| 92 | +``` |
| 93 | + |
| 94 | + |
| 95 | +### DEPRECATED Kraken1 |
| 96 | + |
| 97 | +``` |
| 98 | +mkdir -p databases/minikraken1 |
| 99 | +wget https://ccb.jhu.edu/software/kraken/dl/minikraken_20171019_4GB.tgz -O databases/minikraken1.tgz |
| 100 | +tar -xvzf databases/minikraken1.tgz --directory databases/ |
| 101 | +
|
| 102 | +export kraken1db=databases/minikraken_20171013_4GB && \ |
| 103 | +kraken --db $kraken1db --output test-data/sample.out test-data/sample.fastq && \ |
| 104 | +kraken-report --db $kraken1db test-data/sample.out | tee test-data/sample.report |
| 105 | +``` |
| 106 | + |
| 107 | + |
| 108 | +### Kraken2 |
| 109 | + |
| 110 | +### If you've built flukraken2 in `tmp` |
| 111 | + |
| 112 | +`export KRAKEN2_DEFAULT_DB="tmp/flukraken2"` |
| 113 | + |
| 114 | +### If you have a pre-made minikraken or other kraken database ready |
| 115 | + |
| 116 | +``` |
| 117 | +kraken2 --report output/sample_metagenome.first.report --output output/sample_metagenome.first.out --memory-mapping --db ~/Desktop/mytax/minikraken2 example-data/sample_metagenome.first.fastq |
| 118 | +``` |
| 119 | + |
| 120 | +### Download minikraken2 |
| 121 | + |
| 122 | +``` |
| 123 | +mkdir -p databases/ |
| 124 | +wget ftp://ftp.ccb.jhu.edu/pub/data/kraken2_dbs/old/minikraken2_v2_8GB_201904.tgz -O databases/minikraken2.tgz |
| 125 | +tar -xvzf databases/minikraken2.tgz --directory databases/ |
| 126 | +``` |
| 127 | + |
| 128 | +### Centrifuge |
| 129 | + |
| 130 | +#### Install |
| 131 | + |
| 132 | +`bash install.sh` |
| 133 | + |
| 134 | +#### Set up centrifuge env |
| 135 | + |
| 136 | +``` |
| 137 | +mkdir -p databases/centrifuge |
| 138 | +wget https://genome-idx.s3.amazonaws.com/centrifuge/p_compressed%2Bh%2Bv.tar.gz -O databases/centrifuge.tgz |
| 139 | +tar -xvzf databases/centrifuge.tgz --directory databases/centrifuge/ |
| 140 | +``` |
| 141 | + |
| 142 | + |
| 143 | + |
| 144 | +#### Run Centrifuge classify |
| 145 | + |
| 146 | +``` |
| 147 | +# To build a new database instead, run: $CONDA_PREFIX/lib/centrifuge/centrifuge-build --taxonomy-tree taxonomy/nodes.dmp --name-table taxonomy/names.dmp sample.fastq sample
| 148 | +
|
| 149 | +$CONDA_PREFIX/lib/centrifuge/centrifuge -f -x databases/centrifuge/p_compressed+h+v -q test-data/sample.fastq --report test-data/sample.centrifuge.report > test-data/sample.out |
| 150 | +$CONDA_PREFIX/lib/centrifuge/centrifuge-kreport -x databases/centrifuge/p_compressed+h+v test-data/sample.centrifuge.report > test-data/sample.report |
| 151 | +``` |
| 152 | + |
| 153 | +#### Next, generate the hierarchy JSON file |
| 154 | + |
| 155 | +``` |
| 156 | +python3 server/src/generate_hierarchy.py \ |
| 157 | +-o output/sample_metagenome.first.fullstring \ |
| 158 | +--report output/sample_metagenome.first.report \ |
| 159 | +-taxdump taxonomy/nodes.dmp |
| 160 | +``` |
| 161 | + |
| 162 | +#### Get the JSON for the mytax sunburst plot |
| 163 | +``` |
| 164 | +bash server/src/krakenreport2json.sh -i output/sample_metagenome.first.fullstring -o output/sample_metagenome.first.json |
| 165 | +``` |
| 166 | + |
| 167 | +The resulting file can then be imported into the sunburst plot at `server/src/sunburst/index.html`, served locally with a simple HTTP server such as `python3 -m http.server 8080`. |