DALLE trained on FashionGen Dataset RESULTS 💯 #443
Comments
Hi, can you offer the Colab link and checkpoints?
You'll find the trained DALL-E weights here: https://drive.google.com/uc?id=1kEHTTZH2YbbHZjY6fTWuPb5_D-7nQ866
@alexriedel1
Yes, right, the text sequence length is 120. Is this a problem for you?
I also used the default tokenizer in this project, which uses the bpe_simple_vocab_16e6 byte pair encoder: https://github.com/lucidrains/DALLE-pytorch/blob/main/dalle_pytorch/tokenizer.py. It uses a text token size of 49408 by default. I increased the text sequence length to 120 because the FashionGen dataset uses quite long text descriptions for its images.
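A minimal sketch of that tokenization setup, assuming the `tokenizer` instance exported by `dalle_pytorch.tokenizer` and its `tokenize` helper; the sample caption and variable names are illustrative only:

```python
from dalle_pytorch.tokenizer import tokenizer  # default BPE tokenizer (bpe_simple_vocab_16e6)

# vocab size of the default tokenizer (49408 text tokens)
num_text_tokens = tokenizer.vocab_size

# FashionGen captions are long, so tokenize with a context length of 120,
# truncating anything longer instead of raising an error
caption = "Long-sleeve cotton shirt with button closure at front."  # illustrative caption
tokens = tokenizer.tokenize(caption, context_length = 120, truncate_text = True)
print(tokens.shape)  # (1, 120), zero-padded past the caption's tokens
```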
Thanks a lot!
Hi, do you still have access to the FashionGen dataset? I can't seem to find a working link for it.
DALLE on FashionGen
Text-to-image generation and re-ranking by CLIP
Best 16 of 48 generations ranked by CLIP (a re-ranking sketch follows below)
Generations from the training set (including their ground truths)
Generations based on custom prompts (without their ground truths)
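A sketch of the CLIP re-ranking step described above (generate 48 candidates for a prompt, keep the best 16), assuming OpenAI's clip package; the prompt is illustrative, the generations are stubbed with a placeholder tensor, and CLIP's full image preprocessing (normalization stats) is omitted for brevity:

```python
import torch
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

device = 'cuda'
clip_model, _ = clip.load('ViT-B/32', device = device)

# placeholder for 48 DALLE generations in [0, 1] for a single prompt
images = torch.rand(48, 3, 256, 256)
prompt = 'red wool sweater with ribbed cuffs'  # illustrative prompt

# CLIP's ViT-B/32 expects 224x224 inputs; real use should also apply CLIP's
# normalization statistics before encoding
images_224 = torch.nn.functional.interpolate(images, size = 224, mode = 'bilinear')
text_tokens = clip.tokenize([prompt]).to(device)

with torch.no_grad():
    img_feats = clip_model.encode_image(images_224.to(device))
    txt_feats = clip_model.encode_text(text_tokens)

img_feats = img_feats / img_feats.norm(dim = -1, keepdim = True)
txt_feats = txt_feats / txt_feats.norm(dim = -1, keepdim = True)
scores = (img_feats @ txt_feats.T).squeeze(-1)  # cosine similarity per generation

best16 = images[scores.topk(16).indices.cpu()]  # keep the 16 best of 48 generations
```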
Model specifications
VAE
Trained VQGAN for 1 epoch on Fashion-Gen dataset
Embeddings: 1024
Batch size: 5
DALLE
Trained DALLE for 1 epoch on Fashion-Gen dataset
dim = 312
text_seq_len = 80
depth = 36
heads = 12
dim_head = 64
reversible = 0
attn_types = ('full', 'axial_row', 'axial_col', 'conv_like')
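A sketch of how these hyperparameters map onto the dalle_pytorch API; the checkpoint paths are placeholders, num_text_tokens = 49408 comes from the default tokenizer discussed above, and text_seq_len is shown as listed here (the comment thread above mentions 120 for the released weights):

```python
from dalle_pytorch import DALLE, VQGanVAE

# load the VQGAN trained for 1 epoch on Fashion-Gen (paths are placeholders)
vae = VQGanVAE('path/to/vqgan.ckpt', 'path/to/vqgan_config.yaml')

dalle = DALLE(
    vae = vae,
    dim = 312,
    num_text_tokens = 49408,   # default BPE tokenizer vocab size
    text_seq_len = 80,
    depth = 36,
    heads = 12,
    dim_head = 64,
    reversible = False,
    attn_types = ('full', 'axial_row', 'axial_col', 'conv_like')
)
```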
Optimization
Optimizer: Adam
Learning rate: 4.5e-4
Gradient Clipping: 0.5
Batch size: 7
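And a minimal training step under these optimization settings, assuming the `dalle` model from the sketch above and a hypothetical dataloader yielding batches of 7 tokenized captions and image tensors; gradient clipping uses the 0.5 max-norm listed here:

```python
import torch

opt = torch.optim.Adam(dalle.parameters(), lr = 4.5e-4)

for text, images in dataloader:  # hypothetical loader of (token, image) batches of size 7
    loss = dalle(text, images, return_loss = True)
    opt.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(dalle.parameters(), 0.5)  # clip gradients at 0.5
    opt.step()
```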