
Latent Compression Learning (LCL)


[NeurIPS 2024] Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning

We introduce Latent Compression Learning (LCL) to pre-train vision models from scratch on interleaved image-text data. Compared to existing methods (e.g., CLIP, auto-regressive text generation), LCL is the first to achieve both:

  • Learning vision models from scratch
  • Training on interleaved image-text data

(Figure: overview of the LCL pre-training framework)

📈 Results

Pre-training on MMC4 Dataset

(Figure: results of pre-training on the MMC4 dataset)

Our LCL pre-training significantly outperforms all other methods on caption tasks and is on par with the best paired pre-training methods on classification and retrieval tasks.

Comparison with OpenCLIP

(Figure: transfer evaluation results compared with OpenCLIP)

(Figure: multi-modal evaluation results compared with OpenCLIP)

When both methods are trained on LAION-400M data, our LCL pre-training achieves performance similar to OpenCLIP. When MMC4 data is added, LCL pre-training outperforms OpenCLIP, especially on caption and multi-modal dialogue tasks. For a fair comparison, the total number of images seen during pre-training is kept at 13B.

📦 Pre-trained Checkpoints

| model | data | # samples | download |
| --- | --- | --- | --- |
| ViT-B/16 | LAION-400M | 13B | config / ckpt |

🛠️ Usage

Install

This code is built upon OpenCLIP; please refer to their repository for setup instructions.

Load Pre-trained Checkpoints

Here is example code for loading a pre-trained checkpoint:

import open_clip

# Name of the LCL model config registered in this repository, plus the path
# to the downloaded checkpoint.
model_name = "LCL_ViT-B-16_laion"
pretrained = "path to the `.pt` file"

model = open_clip.create_model(model_name, pretrained=pretrained)
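
Once loaded, the model can be used like a standard OpenCLIP model. Below is a minimal inference sketch; it assumes that the OpenCLIP helpers create_model_and_transforms and encode_image work with the LCL config, and the input image path is only a placeholder:

import torch
from PIL import Image
import open_clip

model_name = "LCL_ViT-B-16_laion"
pretrained = "path to the `.pt` file"

# create_model_and_transforms additionally returns the train/val image
# preprocessing pipelines defined by the model config.
model, _, preprocess = open_clip.create_model_and_transforms(model_name, pretrained=pretrained)
model.eval()

# "example.jpg" is a placeholder input image.
image = preprocess(Image.open("example.jpg")).unsqueeze(0)
with torch.no_grad():
    image_features = model.encode_image(image)

print(image_features.shape)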

Train LCL

Example training scripts are provided in ./scripts. You can refer to OpenCLIP for more ways to launch training.

Training on LAION-400M. Here is an example training script: ./scripts/lcl_vit_b_32_laion.sh. The corresponding model config is here.

Training on MMC4. We provide a simple dataloader that supports the original MMC4 dataset. Organize the data folder as follows:

  /path/to/mmc4/
      ├── images/
      │   └── ...
      └── data/ 
          ├── docs_shard_0_v2.jsonl.zip
          ├── docs_shard_1_v2.jsonl.zip
          └── ...
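
For reference, the sketch below shows how a single shard can be inspected. It is not the dataloader shipped in this repository; it only assumes the public MMC4 schema, in which each docs_shard_*_v2.jsonl.zip archive contains one JSONL file whose records carry text_list and image_info fields:

import json
import zipfile
from pathlib import Path

shard_path = Path("/path/to/mmc4/data/docs_shard_0_v2.jsonl.zip")

# Each shard is a zip archive holding a single JSONL file; every line is one
# interleaved image-text document.
with zipfile.ZipFile(shard_path) as zf:
    inner_name = zf.namelist()[0]
    with zf.open(inner_name) as f:
        for line in f:
            doc = json.loads(line)
            texts = doc["text_list"]  # list of text segments
            images = [img["image_name"] for img in doc["image_info"]]  # image files under images/
            print(f"{len(texts)} text segments, {len(images)} images")
            break  # inspect only the first document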

Here is an example training script: ./scripts/lcl_vit_b_32_mmc4.sh. The corresponding model config is here.

More training scripts can be found under ./scripts.

NOTE: We conducted large-scale pre-training with an internal, more efficient codebase, which will not be released for intellectual property reasons. This released version has been verified to reproduce the results of ViT-B/16 on the LAION-400M dataset.

📅 Schedule

  • basic code of LCL
  • checkpoints of more models and datasets
  • transfer evaluation code

🖊️ Citation

If you find this work helpful in your research, please consider citing:

@article{yang2024vision,
  title={Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning},
  author={Yang, Chenyu and Zhu, Xizhou and Zhu, Jinguo and Su, Weijie and Wang, Junjie and Dong, Xuan and Wang, Wenhai and Li, Bin and Zhou, Jie and Qiao, Yu and Dai, Jifeng},
  journal={arXiv preprint arXiv:2406.07543},
  year={2024}
}

📃 License

This project is released under the MIT license. Parts of this project contain code and models from other sources, which are subject to their respective licenses.

🙏 Acknowledgements

Our code is built with reference to the following project: OpenCLIP.
