- Install `pytorch==2.2.0` with CUDA 12.1
- Install flash-attention
- Other required packages are listed in `mamba_wt103/requirements.txt`
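The steps above can be sketched as the following setup script; the cu121 wheel index URL and the `flash-attn` PyPI package name are assumptions not stated in this README, so adjust them for your system:

```shell
# Environment-setup sketch; exact wheel sources may differ for your system.
pip install torch==2.2.0 --index-url https://download.pytorch.org/whl/cu121  # PyTorch 2.2.0 built against CUDA 12.1
pip install flash-attn --no-build-isolation                                  # flash-attention (PyPI name assumed)
pip install -r mamba_wt103/requirements.txt                                  # remaining dependencies
```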
To run the WikiText-103 experiments:

```shell
cd mamba_wt103
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m train experiment=wt103/mamba_pos   # positive-eigenvalue scenario
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m train experiment=wt103/mamba_neg   # negative-eigenvalue scenario
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m train experiment=wt103/mamba_real  # mixed-eigenvalue scenario
```
- Install `pytorch==2.2.0` with CUDA 12.1
- Other required libraries are listed in `mambavision_imagenet/requirements.txt`
- Download ImageNet-1K from here and update the `data-dir` path in this script accordingly
```shell
cd mambavision_imagenet/mambavision
bash train.sh
```
If you find this code useful in your research, please cite:

```bibtex
@inproceedings{vo2025demystifyingtokendynamicsdeep,
  title     = {Demystifying the Token Dynamics of Deep Selective State Space Models},
  author    = {Thieu N Vo and Duy-Tung Pham and Xin T. Tong and Tan Minh Nguyen},
  booktitle = {International Conference on Learning Representations},
  year      = {2025},
  url       = {https://openreview.net/forum?id=qtTIP5Gjc5},
}
```
This repo is adapted from the safari and MambaVision repositories.