|
| 1 | +# MegIS: High-Performance, Energy-Efficient, and Low-Cost Metagenomic Analysis with In-Storage Processing |
| 2 | + |
| 3 | +## What is MegIS? |
| 4 | + |
| 5 | +MegIS is the first in-storage processing system designed to significantly reduce the data movement overhead of the end-to-end metagenomic analysis pipeline. MegIS is enabled by our lightweight design that effectively leverages and orchestrates processing inside and outside the storage system. We address in-storage processing challenges for metagenomics via specialized and efficient 1) task partitioning, 2) data/computation flow coordination, 3) storage technology-aware algorithmic optimizations, 4) data mapping, and 5) lightweight in-storage accelerators. MegIS's design is flexible, capable of supporting different types of metagenomic input datasets, and can be integrated into various metagenomic analysis pipelines. |
| 6 | + |
| 7 | + |
| 8 | +<p align="center"> |
| 9 | + <img src="megis-overview.png" alt="drawing" width="400"/> |
| 10 | +</p> |
| 11 | + |
| 12 | + |
| 13 | +## Citation |
| 14 | +If you find this repo useful, please cite the following paper: |
| 15 | + |
| 16 | +Nika Mansouri Ghiasi, Mohammad Sadrosadati, Harun Mustafa, Arvid Gollwitzer, Can Firtina, Julien Eudine, Haiyu Mao, Joël Lindegger, Meryem Banu Cavlak, Mohammed Alser, Jisung Park, Onur Mutlu, |
| 17 | +["MegIS: High-Performance, Energy-Efficient, and Low-Cost Metagenomic Analysis with In-Storage Processing"](https://arxiv.org/pdf/2406.19113) |
| 18 | +ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), 2024. |
| 19 | + |
| 20 | +```bibtex |
| 21 | +@inproceedings{ghiasi2024megis, |
| 22 | + title={MegIS: High-Performance, Energy-Efficient, and Low-Cost Metagenomic Analysis with In-Storage Processing}, |
| 23 | + author={Ghiasi, Nika Mansouri and Sadrosadati, Mohammad and Mustafa, Harun and Gollwitzer, Arvid and Firtina, Can and Eudine, Julien and Mao, Haiyu and Lindegger, Jo{\"e}l and Cavlak, Meryem Banu and Alser, Mohammed and Park, Jisung and Mutlu, Onur}, |
| 24 | + booktitle={2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA)}, |
| 25 | + pages={660--677}, |
| 26 | + year={2024}, |
| 27 | + organization={IEEE} |
| 28 | +} |
| 29 | +``` |
| 30 | + |
| 31 | +## Table of Contents |
| 32 | + |
| 33 | + * [What is MegIS?](#what-is-megis-) |
| 34 | + * [Citation](#citation) |
| 35 | + * [Prerequisites](#prerequisites) |
| 36 | + * [Input Data](#input-data) |
| 37 | + + [Query Read Sets](#query-read-sets) |
| 38 | + + [Database](#database) |
| 39 | + * [Preparing the Input Queries](#preparing-the-input-queries) |
| 40 | + * [Finding Candidate Species](#finding-candidate-species) |
| 41 | + * [End-to-end Throughput](#end-to-end-throughput) |
| 42 | + * [Contact](#contact) |
| 43 | + |
| 44 | + |
| 45 | +## Prerequisites |
| 46 | + |
| 47 | +The infrastructure has been tested with the following system configuration: |
| 48 | + * g++ v11.1.0 |
| 49 | + * Python v3.9.12 |
| 50 | + |
| 51 | +Prerequisites specific to each experiment are listed in their respective subsections. |
| 52 | + |
| 53 | + |
| 54 | +## Input Data |
| 55 | + |
| 56 | +### Query Read Sets |
| 57 | + |
| 58 | +The read sets used in the paper are from the commonly-used [CAMI benchmark](https://www.nature.com/articles/nmeth.4458). They can be obtained from [this link](http://gigadb.org/dataset/100344). |
| 59 | + |
| 60 | +### Database |
| 61 | + |
| 62 | +In our paper, we generate a database based on microbial genomes drawn from NCBI’s databases, including 155,442 genomes for 52,961 microbial species. You can download the genomes by running the script `input-data/download_genomes.sh`. This also creates a list of the files in `input-data/database_genomes.txt`. |
| 63 | + |
| 64 | +After compiling KMC, build the database from the input files by running the following commands |
| 65 | +```bash |
| 66 | +# create a local scratch directory |
| 67 | +mkdir -p kmc_tmp |
| 68 | + |
| 69 | +# create an initial database |
| 70 | +kmc -k60 -fa -ci0 -cs3 -t$NUM_THREADS @input-data/database_genomes.txt ${OUT_FILE}_pre kmc_tmp |
| 71 | + |
| 72 | +# sort k-mers in the database |
| 73 | +kmc_tools transform ${OUT_FILE}_pre sort $OUT_FILE |
| 74 | + |
| 75 | +# remove the initial database and the scratch directory |
| 76 | +rmdir kmc_tmp |
| 77 | +rm ${OUT_FILE}_pre |
| 78 | +``` |
| 79 | +replacing `$OUT_FILE` with the desired output name and `$NUM_THREADS` with the desired number of CPU threads. |
| 80 | + |
| 81 | +## Preparing the Input Queries |
| 82 | + |
| 83 | +To extract k-mers from input queries, MegIS includes a new input processing scheme by improving upon the input processing scheme in [KMC](https://github.com/refresh-bio/KMC). MegIS enables overlapping the k-mer sorting and transfer of a bucket to the SSD with the in-storage processing operations of the [next step](#finding-candidate-species) on the previously transferred buckets. The overlapping of different pipeline stages is modeled in the `pipeline/pipeline_throughput.py`. |
| 84 | + |
| 85 | +In `preparing-input-queries`, we include an optimized version of KMC as a software baseline. This version improves execution time by utilizing a fixed prefix length for query and database k-mers. For a fair comparison, we apply the same optimization (excluding the in-storage processing overlap used in MegIS) when evaluating the baseline software tool, A-Opt, in our experiments. |
| 86 | + |
| 87 | +To prepare a query, run the following commands |
| 88 | +```bash |
| 89 | +# create a local scratch directory |
| 90 | +mkdir -p kmc_tmp |
| 91 | + |
| 92 | +# create an initial k-mer counter |
| 93 | +kmc -k60 -fq -ci2 -cs3 -t$NUM_THREADS $IN_FILE ${OUT_FILE}_pre kmc_tmp |
| 94 | + |
| 95 | +# sort k-mers in the k-mer counter |
| 96 | +kmc_tools transform ${OUT_FILE}_pre sort $OUT_FILE |
| 97 | + |
| 98 | +# remove the initial k-mer counter and the scratch directory |
| 99 | +rmdir kmc_tmp |
| 100 | +rm ${OUT_FILE}_pre |
| 101 | +``` |
| 102 | +replacing `$IN_FILE` with the path to the input query file, `$OUT_FILE` with the desired output name, and `$NUM_THREADS` with the desired number of CPU threads. If the input file is in FASTA format, use the `-fa` flag when creating the initial k-mer counter. |
| 103 | + |
| 104 | +## Finding Candidate Species |
| 105 | + |
| 106 | + |
| 107 | +MegIS runs this step with in-storage process accelerators. The Verilog implementations of MegIS's lightweight hardware units are in `hdl/`. |
| 108 | + |
| 109 | +To ensure a fair comparison with software baselines, we incorporate optimizations that enhance the utilization of the SSD's I/O bandwidth during the process of identifying intersecting k-mers in software. We include the optimized implementation for intersection finding in `finding-candidate-species/` and use this optimization when evaluating the A-Opt baseline. Given a database `$DB_FILE` and query `$QUERY_FILE` (both excluding the `.kmc_suf` extension), their intersection can be computed using the `intersection` executable in `finding-candidate-species`. It may be compiled by running `make` in its directory. Afterwards, compute an intersection as follows |
| 110 | +```bash |
| 111 | +intersection $DB_FILE $QUERY_FILE $NUM_THREADS $INTERSECTION_OUT |
| 112 | +``` |
| 113 | +where `$NUM_THREADS` is the desired number of threads and `$INTERSECTION_OUT` is the name of the output. |
| 114 | + |
| 115 | +## End-to-End Throughput |
| 116 | + |
| 117 | +To find the end-to-end throughput of the `pipeline`, we incorporate the latency and throughput of all of MegIS's components, including host operations, accessing flash chips, internal DRAM, in-storage accelerator, and host-SSD interfaces. |
| 118 | + |
| 119 | +For the components in the hardware-based steps (e.g., finding candidate species): We implement MegIS’s logic components in Verilog. We use two state-of-the-art simulators, |
| 120 | +[Ramulator](https://github.com/CMU-SAFARI/ramulator) to model SSD’s internal DRAM, and [MQSim](https://github.com/CMU-SAFARI/MQSim) to model SSD’s internal operations. |
| 121 | + |
| 122 | +For the components in the software-based step (e.g., host operations for preparing the input queries), we measure performance on a real system, an AMD® EPYC® 7742 CPU with 128 physical cores and 1-TB DRAM. For the software baselines, we measure performance on this real system, with best-performing thread counts. |
| 123 | + |
| 124 | + |
| 125 | +## Contact |
| 126 | + |
| 127 | +Nika Mansouri Ghiasi - [email protected] |
0 commit comments