Commit 57b8ee3

Update README.md
1 parent 74e05b8 commit 57b8ee3

File tree

1 file changed

+0
-167
lines changed

README.md

-167
@@ -107,173 +107,6 @@ mv minikraken2_v2_8GB_201904_UPDATE data/databases/
107107
docker container run -it --rm -p 8098:80 jhuaplbio/basestack_mytax2 bash -c "nginx; bash "
108108
```
109109

110-
## Original Mytax v1.0
111-
112-
113-
This is the repository for mytax, a tool for building custom taxonomies, which can aid nucleotide sequence classification.
114-
115-
# Installation
116-
117-
Clone this repo with:
118-
119-
`git clone https://github.com/jhuapl-bio/mytax`
120-
121-
Symbolically link all shell scripts into your path, for example with:
122-
123-
`find v1 -name "*.sh" | while read fn; do sudo ln -s $PWD/$fn /usr/local/bin; done`
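A variant of the one-liner above that needs no `sudo` is sketched here. It assumes a user-writable bin directory (for example `$HOME/.local/bin`, which is not named in the original instructions) and a hypothetical helper function `link_scripts`; it also assumes script names contain no newlines.

```shell
#!/bin/sh
# link_scripts <source-dir> <bin-dir>
# Symlink every *.sh file found under <source-dir> into <bin-dir>.
# (Hypothetical helper for illustration; not part of the mytax repo.)
link_scripts() {
  ls_src=$1
  ls_dest=$2
  mkdir -p "$ls_dest"
  find "$ls_src" -type f -name '*.sh' | while IFS= read -r ls_fn; do
    # Resolve to an absolute path so the link works from any directory.
    ls_abs="$(cd "$(dirname "$ls_fn")" && pwd)/$(basename "$ls_fn")"
    ln -sf "$ls_abs" "$ls_dest/"
  done
}

# Example, run from the repository root:
#   link_scripts v1 "$HOME/.local/bin"
#   export PATH="$HOME/.local/bin:$PATH"
```

Quoting the paths keeps the loop safe for directories with spaces, which the unquoted `$PWD/$fn` in the original one-liner is not.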
124-
125-
# Dependencies
126-
127-
- jellyfish (version 1) - https://www.cbcb.umd.edu/software/jellyfish/
128-
- kraken (version 1) - https://ccb.jhu.edu/software/kraken/
129-
- gawk - https://www.gnu.org/software/gawk/manual/html_node/Installation.html
130-
- perl
131-
- GNU CoreUtils
132-
133-
- 16 GB of RAM is needed to build the provided influenza kraken database
134-
135-
# Usage
136-
137-
## Building example
138-
139-
This pipeline is built from a central set of scripts located in the `v1` directory.
140-
141-
Build flu-kraken example with:
142-
143-
`build_flukraken.sh -k flukraken-$(date +"%F")`
144-
145-
The single script `build_flukraken.sh` functions as an outer wrapper for the influenza classification example using the Kraken classifier published in the mytax paper.
146-
147-
148-
`build_flukraken.sh` can also be used as a model to build modified pipelines as desired. It is built from four main sub-modules:
149-
```
150-
download_IVR.sh -> download references and taxonomy from IVR
151-
152-
build_IVR_metadata.sh -> build tab-delimited metadata table in format for mytax
153-
154-
build_taxonomy.sh -> build custom taxonomy from tab-delimited table
155-
156-
build_krakendb.sh -> add new taxonomic IDs to reference FASTA, build kraken database, post-process database for visualization pipeline
157-
```
158-
159-
`build_krakendb.sh` currently references three helper scripts, which also need to be in the PATH:
160-
```
161-
fix_references.sh -> adds new taxonomic IDs to reference FASTA
162-
163-
kraken-build -> builds kraken database
164-
165-
process_krakendb.sh -> post-processes database for visualization pipeline (not included in this repo yet)
166-
```
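Because the wrapper and its helpers must all be on the PATH, a quick pre-flight check can save a failed build. The following is a minimal sketch; `missing_commands` is a hypothetical helper, and the script names are simply those listed above.

```shell
#!/bin/sh
# missing_commands <name>...
# Print each command that cannot be found on the PATH, one per line.
# Returns non-zero if at least one is missing.
missing_commands() {
  mc_status=0
  for mc_cmd in "$@"; do
    if ! command -v "$mc_cmd" >/dev/null 2>&1; then
      printf '%s\n' "$mc_cmd"
      mc_status=1
    fi
  done
  return "$mc_status"
}

# process_krakendb.sh is expected to be reported as missing, since the
# text above notes it is not included in this repo yet.
if missing=$(missing_commands \
    download_IVR.sh build_IVR_metadata.sh build_taxonomy.sh build_krakendb.sh \
    fix_references.sh kraken-build process_krakendb.sh); then
  echo "all listed scripts were found on the PATH"
else
  echo "not found on the PATH:" >&2
  printf '  %s\n' $missing >&2
fi
```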
167-
168-
169-
170-
## Running process script on kraken/kraken2 report and outfiles
171-
172-
### If running from Docker
173-
174-
`docker build . -t jhuaplbio/mytax`
175-
176-
Unix
177-
178-
`docker container run -it --rm -v $PWD:/data jhuaplbio/mytax bash`
179-
180-
Windows Powershell
181-
182-
`docker container run -it --rm -v $pwd:/data jhuaplbio/mytax bash`
183-
184-
185-
186-
## Run the installation script
187-
188-
189-
# Activate the env; it contains the kraken2 and centrifuge scripts needed to build a database, along with the kraken2 and centrifuge dependencies
190-
191-
`conda activate mytax`
192-
193-
## Let's make a sample.fastq from test-data
194-
195-
### First, download ncbi taxdump
196-
197-
```
198-
python3 src/generate_hierarchy.py -o $PWD/taxdump --report test-data/sample.report -download
199-
rm taxdump.tar.gz
200-
```
201-
202-
203-
### DEPRECATED Kraken1
204-
205-
```
206-
mkdir -p databases/minikraken1
207-
wget https://ccb.jhu.edu/software/kraken/dl/minikraken_20171019_4GB.tgz -O databases/minikraken1.tgz
208-
tar -xvzf databases/minikraken1.tgz --directory databases/
209-
210-
export kraken1db=databases/minikraken_20171013_4GB && \
211-
kraken --db $kraken1db --output test-data/sample.out test-data/sample.fastq && \
212-
kraken-report --db $kraken1db test-data/sample.out | tee test-data/sample.report
213-
```
214-
215-
216-
### Kraken2
217-
218-
### IF you've made flukraken2 in tmp or....
219-
220-
`export KRAKEN2_DEFAULT_DB="tmp/flukraken2"`
221-
222-
### IF you have a pre-made minikraken/other kraken db ready
223-
224-
```
225-
kraken2 --report output/sample_metagenome.first.report --output output/sample_metagenome.first.out --memory-mapping --db ~/Desktop/mytax/minikraken2 example-data/sample_metagenome.first.fastq
226-
```
227-
228-
### Download minikraken2
229-
230-
```
231-
mkdir -p databases/
232-
wget ftp://ftp.ccb.jhu.edu/pub/data/kraken2_dbs/old/minikraken2_v2_8GB_201904.tgz -O databases/minikraken2.tgz
233-
tar -xvzf databases/minikraken2.tgz --directory databases/
234-
```
235-
236-
### Centrifuge
237-
238-
#### Install
239-
240-
`bash install.sh`
241-
242-
#### Set up centrifuge env
243-
244-
```
245-
mkdir -p databases/centrifuge
246-
wget https://genome-idx.s3.amazonaws.com/centrifuge/p_compressed%2Bh%2Bv.tar.gz -O databases/centrifuge.tgz
247-
tar -xvzf databases/centrifuge.tgz --directory databases/centrifuge/
248-
```
249-
250-
251-
252-
#### Run Centrifuge classify
253-
254-
```
255-
## If you need to make a new database, see here: $CONDA_PREFIX/lib/centrifuge/centrifuge-build --taxonomy-tree taxonomy/nodes.dmp --name-table taxonomy/names.dmp sample.fastq sample
256-
257-
$CONDA_PREFIX/lib/centrifuge/centrifuge -f -x databases/centrifuge/p_compressed+h+v -q test-data/sample.fastq --report test-data/sample.centrifuge.report > test-data/sample.out
258-
$CONDA_PREFIX/lib/centrifuge/centrifuge-kreport -x databases/centrifuge/p_compressed+h+v test-data/sample.centrifuge.report > test-data/sample.report
259-
```
260-
261-
#### Next, generate the hierarchy json file
262-
263-
```
264-
python3 server/src/generate_hierarchy.py \
265-
-o output/sample_metagenome.first.fullstring \
266-
--report output/sample_metagenome.first.report \
267-
-taxdump taxonomy/nodes.dmp
268-
```
269-
270-
#### Get the json for mytax sunburst plot
271-
```
272-
bash server/src/krakenreport2json.sh -i output/sample_metagenome.first.fullstring -o output/sample_metagenome.first.json
273-
```
274-
275-
The resulting file can then be imported into the sunburst plot at `server/src/sunburst/index.html`, which can be served locally with a simple HTTP server such as `python3 -m http.server 8080`.
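The Kraken2 classification, hierarchy, and JSON steps above can be chained per sample, roughly as sketched below. The function names `run` and `mytax_report_to_json`, the `DRY_RUN` switch, and the `example-data/` and `output/` layout are assumptions for illustration; the command-line flags are copied from the commands in this README, and the database directory name is the one the minikraken2 archive is moved from earlier in the file.

```shell
#!/bin/sh
# run <command> [args...]
# Print the command when DRY_RUN=1 (the default in this sketch),
# otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# mytax_report_to_json <sample-prefix> <kraken2-db>
# Classify one FASTQ, build the hierarchy string, then convert it to the
# JSON read by the sunburst plot. Stops at the first failing step.
mytax_report_to_json() {
  prefix=$1
  db=$2
  run kraken2 --report "output/$prefix.report" --output "output/$prefix.out" \
      --memory-mapping --db "$db" "example-data/$prefix.fastq" &&
  run python3 server/src/generate_hierarchy.py -o "output/$prefix.fullstring" \
      --report "output/$prefix.report" -taxdump taxonomy/nodes.dmp &&
  run bash server/src/krakenreport2json.sh -i "output/$prefix.fullstring" \
      -o "output/$prefix.json"
}

# Dry run: prints the three commands without executing them.
mytax_report_to_json sample_metagenome.first databases/minikraken2_v2_8GB_201904_UPDATE
```

Setting `DRY_RUN=0` in the environment runs the commands for real; the dry-run default makes it easy to check the paths before starting a long classification.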
276-
277110
# License and copyright
278111

279112
Copyright (c) 2019 Thomas Mehoke
