This repository was archived by the owner on Jan 10, 2025. It is now read-only.

Commit 5401b96

update code to include diffusion models, and many other smaller changes, including refactoring transforms
1 parent e224476 commit 5401b96

43 files changed: +7295 -381 lines changed

LICENSE

+36 -1
@@ -199,4 +199,39 @@
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
-limitations under the License.
+limitations under the License.
+
+--------------------------------------------------------------------------------
+The following license applies to metrics/coco_caption_eval.py:
+
+Copyright (c) 2015, Xinlei Chen, Hao Fang, Tsung-Yi Lin, and Ramakrishna Vedantam
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are met:
+
+1. Redistributions of source code must retain the above copyright notice, this
+list of conditions and the following disclaimer.
+
+2. Redistributions in binary form must reproduce the above copyright notice,
+this list of conditions and the following disclaimer in the documentation
+and/or other materials provided with the distribution.
+
+3. Neither the name of the copyright holder nor the names of its contributors
+may be used to endorse or promote products derived from this software without
+specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
+ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
+ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+The views and conclusions contained in the software and documentation are those
+of the authors and should not be interpreted as representing official policies,
+either expressed or implied, of the FreeBSD Project.

README.md

+52 -1
@@ -1,6 +1,7 @@
-# Pix2Seq - A general framework for turning RGB pixels into semantically meaningful sequences
+# Pix2Seq codebase: task-centric, multi-tasks with generative modeling
 
 This is the official implementation of Pix2Seq in Tensorflow 2 with efficient TPUs/GPUs support as well as interactive debugging similar to Pytorch.
+The original Pix2Seq code aims to be a general framework that turns RGB pixels into semantically meaningful sequences. We later extend it to be a generic codebase, with task-centric organization that supports different tasks as well as their combination, using generative modeling (both autoregressive and diffusion models, see below).
 
 <div align="center">
 <img width="95%" alt="Pix2Seq Illustration" src="pix2seq.gif">
@@ -9,6 +10,12 @@ This is the official implementation of Pix2Seq in Tensorflow 2 with efficient TP
 An illustration of Pix2Seq for object detection (from <a href="https://ai.googleblog.com/2022/04/pix2seq-new-language-interface-for.html">our Google AI blog post</a>).
 </div>
 
+## (<span style="color:red">NEW!</span>) Diffusion models
+
+We added (official) implementations of diffusion models (such as Bit Diffusion, RIN, see references below) built on top of the original Pix2Seq codebase and they can be found in tasks/ and models/.
+
+Please note that we have not yet added proper documentations on training these models.
+
 ## Models
 <a href="https://colab.research.google.com/github/google-research/pix2seq/blob/master/colabs/pix2seq_inference_object_detection.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
 
@@ -178,5 +185,49 @@ Note: You can run eval on a subset of images by setting `--config.eval.steps`.
 }
 ```
 
+[Pix2seq-D paper](https://arxiv.org/abs/2210.06366):
+
+```
+@article{chen2022unified,
+title={A generalist framework for panoptic segmentation of images and videos},
+author={Chen, Ting and Li, Lala and Saxena, Saurabh and Hinton, Geoffrey and Fleet, David J.},
+journal={arXiv preprint arXiv:2210.06366},
+year={2022}
+}
+```
+
+[Bit Diffusion paper](https://arxiv.org/abs/2208.04202):
+
+```
+@article{chen2022analog,
+title={Analog bits: Generating discrete data using diffusion models with self-conditioning},
+author={Chen, Ting and Zhang, Ruixiang and Hinton, Geoffrey},
+journal={arXiv preprint arXiv:2208.04202},
+year={2022}
+}
+```
+
+[RIN Diffusion paper](https://arxiv.org/abs/2212.11972):
+
+```
+@article{jabri2022scalable,
+title={Scalable Adaptive Computation for Iterative Generation},
+author={Jabri, Allan and Fleet, David J. and Chen, Ting},
+journal={arXiv preprint arXiv:2212.11972},
+year={2022}
+}
+```
+
+[Diffusion noise scheduling paper](https://arxiv.org/abs/2301.10972):
+
+```
+@article{chen2023on,
+title={On the Importance of Noise Scheduling for Diffusion Models},
+author={Chen, Ting},
+journal={arXiv preprint arXiv:2301.10972},
+year={2023}
+}
+```
+
 ## Disclaimer
 This is not an officially supported Google product.
