[upd] update README.md
happierpig committed Dec 29, 2023
1 parent 609e23c commit ffb7512
Showing 2 changed files with 22 additions and 7 deletions.
5 changes: 3 additions & 2 deletions README.md
@@ -43,14 +43,15 @@ pip install -r requirements.txt
```
4. Compile kernel benchmarks (Optional): Install gcc-11 and CMake (>= 3.24)
```
apt install software-properties-common lsb-release
apt-get update
curl -s https://apt.kitware.com/keys/kitware-archive-latest.asc 2>/dev/null | gpg --dearmor - | tee /etc/apt/trusted.gpg.d/kitware.gpg >/dev/null
apt-add-repository "deb https://apt.kitware.com/ubuntu/ $(lsb_release -cs) main"
apt update
apt install cmake
cd /PATH_TO_ATOM/kernels
apt install software-properties-common
apt-get update
add-apt-repository -y ppa:ubuntu-toolchain-r/test
apt-get update
apt install -y gcc-11 g++-11
24 changes: 19 additions & 5 deletions kernels/baselines/README.md
@@ -1,27 +1,41 @@
# Atom: Baseline Kernel Evaluations
## Environment Setup
We use [NVBench](https://github.com/NVIDIA/nvbench.git) to evaluate the kernel performance and we need [libTorch](https://pytorch.org/) with `_GLIBCXX_USE_CXX11_ABI = 1` to make baselines compatible with NVBench. Follow the instructions below to set up the environment.
We evaluate baselines in CUDA 12.1 to maximize their performance. Follow the instructions to set up the container.
```
docker pull nvidia/cuda:12.1.0-cudnn8-devel-ubuntu22.04
docker run -it --gpus all nvidia/cuda:12.1.0-cudnn8-devel-ubuntu22.04 /bin/bash
```
Make sure wget, git, conda and cmake (>= 3.24) are installed. We use [NVBench](https://github.com/NVIDIA/nvbench.git) to evaluate the kernel performance, and we need [libTorch](https://pytorch.org/) with `_GLIBCXX_USE_CXX11_ABI = 1` to make baselines compatible with NVBench. Follow the instructions below to set up the environment.
```
git clone --recurse-submodules https://github.com/efeslab/Atom
wget https://download.pytorch.org/libtorch/cu121/libtorch-cxx11-abi-shared-with-deps-2.1.2%2Bcu121.zip
unzip libtorch-cxx11-abi-shared-with-deps-2.1.2+cu121.zip
mv libtorch /PATH_TO_ATOM/kernels/3rdparty/
```
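The prerequisites named above (wget, git, conda, and cmake >= 3.24) can be checked before going further. The following is a minimal sketch, not part of the repository: the helper name `version_ge` is hypothetical, and it assumes GNU `sort` with `-V` support, as shipped in the Ubuntu 22.04 image.
```shell
#!/usr/bin/env bash
# Hypothetical helper script; not taken from the Atom repository.

# version_ge A B: succeeds when version A >= version B (relies on GNU `sort -V`).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Report which of the required tools are missing from PATH.
for tool in wget git conda cmake; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "found:   $tool"
    else
        echo "missing: $tool"
    fi
done

# Compare the installed CMake version against the 3.24 minimum.
if command -v cmake >/dev/null 2>&1; then
    cmake_version="$(cmake --version | head -n1 | awk '{print $3}')"
    if version_ge "$cmake_version" 3.24; then
        echo "cmake $cmake_version is new enough"
    else
        echo "cmake $cmake_version is older than 3.24"
    fi
fi
```
The script only reports its findings and never exits with an error, so it is safe to paste into an interactive shell inside the container.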
Install the Python development headers, which provide the `Python.h` needed by the torch extension.
```
apt-get install python3-dev
```
Use the following instructions, or the script `build.sh`, to build the baseline benchmark.
```
cd /PATH_TO_ATOM/kernels/baselines
mkdir build
cd build
# Fill in your libtorch path
cmake .. -DCMAKE_PREFIX_PATH=/PATH_TO_ATOM/kernels/3rdparty/libtorch
make -j
```
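If the `cmake` step above cannot locate Torch, the usual cause is a `CMAKE_PREFIX_PATH` that does not point at the unpacked libtorch directory. The following is a minimal sketch of a pre-build check. It is not the contents of `build.sh`; the function name `has_torch_config` and the variable `LIBTORCH_DIR` are assumptions made for illustration.
```shell
# Hypothetical pre-build check; not taken from the repository's build.sh.

# has_torch_config DIR: succeeds when DIR looks like an unpacked libtorch tree,
# i.e. it contains the CMake package file that find_package(Torch) looks for.
has_torch_config() {
    [ -f "$1/share/cmake/Torch/TorchConfig.cmake" ]
}

# LIBTORCH_DIR is an assumed variable name; the default mirrors the path used above.
LIBTORCH_DIR="${LIBTORCH_DIR:-/PATH_TO_ATOM/kernels/3rdparty/libtorch}"

if has_torch_config "$LIBTORCH_DIR"; then
    echo "libtorch found; configure with -DCMAKE_PREFIX_PATH=$LIBTORCH_DIR"
else
    echo "TorchConfig.cmake not found under $LIBTORCH_DIR"
fi
```
Set `LIBTORCH_DIR` to the real location before running it, since `/PATH_TO_ATOM` is a placeholder.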
## Result
8-bit Weight-activation Quantization (SmoothQuant) and 4-bit Weight-only Quantization (AWQ) are evaluated in CUDA 12.1 to maximize their performance. Note that `Elem/s` denotes the computation throughput.
8-bit Weight-activation Quantization (SmoothQuant) and 4-bit Weight-only Quantization (AWQ) are evaluated in CUDA 12.1 to maximize their performance. Note that `Elem/s` denotes the computation throughput (Flops/s).

W8A8 Evaluation:
W8A8 Evaluation `./bench_torch_int`:
![SmoothQuant](../../figures/bench_torch_int.png)

W4A16 Evaluation:
W4A16 Evaluation `./bench_awq`:
![AWQ](../../figures/bench_awq.png)

We also use a PyTorch extension to evaluate the performance of the kernels through the PyTorch API. Baselines are installed according to their official codebases. Please refer to this [notebook](./python-api.ipynb) to check the results. Below is a sample figure:
![PyTorch API](../../figures/python-api.png)
<div align=center>
<img src="../../figures/python-api.png" width="50%" height="50%">
</div>
