
YouTube video walk-through of this codebase #88

Open

gordicaleksa opened this issue Jul 31, 2022 · 2 comments

@gordicaleksa
Hi @kuprel!

First of all, awesome work; you made my job that much easier. :)

I created a YouTube video where I do a deep dive/walk-through of this repo.

Maybe someone will find it useful:
https://youtu.be/x_8uHX5KngE

Hopefully it's okay to share it here in the form of an issue; do let me know!

@kuprel
Owner

kuprel commented Jul 31, 2022

Wow, this is great! I just added your video to the readme. You're right, the clamping is unnecessary: it originally served to avoid a cryptic CUDA runtime error. Later I implemented a more precise solution that limits the BART decoder to 2**14 tokens to match the VQGAN's vocabulary. I'm not sure why there's a mismatch in vocabulary counts, and I hadn't realized those might be shared weights. There's probably a simpler solution here. Great video!
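The fix described above (restricting the decoder to the 2**14 token ids the VQGAN detokenizer can actually embed) can be sketched roughly as a logit mask. This is a minimal NumPy sketch, not the repo's actual implementation; the function name `restrict_logits` and the toy vocabulary size are hypothetical.

```python
import numpy as np

VQGAN_VOCAB = 2 ** 14  # 16384 image tokens the VQGAN detokenizer accepts

def restrict_logits(logits: np.ndarray, limit: int = VQGAN_VOCAB) -> np.ndarray:
    """Mask out logits for token ids >= limit so sampling can never
    produce an id the VQGAN has no embedding for."""
    masked = logits.copy()
    masked[..., limit:] = -np.inf
    return masked

# Toy demo: a decoder whose vocabulary is slightly larger than the VQGAN's.
rng = np.random.default_rng(0)
logits = rng.normal(size=16416)   # 32 extra, out-of-range token ids
logits[16415] = 100.0             # would win an unrestricted argmax
token = int(np.argmax(restrict_logits(logits)))
assert token < VQGAN_VOCAB
```

Masking before sampling is more precise than clamping after the fact: an out-of-range id can simply never be drawn, rather than being silently remapped to a different token.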

@kuprel
Owner

kuprel commented Jul 31, 2022

I checked whether the embedding weights in the BART decoder are the same weights as the embedding weights in the VQGAN detokenizer. It turns out they are actually different: the BART decoder in Dalle Mega embeds to 2048 dimensions, while the VQGAN embeds to 256 dimensions.
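The shape mismatch described above already rules out weight sharing. A small sketch of that check, with placeholder zero matrices standing in for the real weight tensors (the shapes come from the comment; the variable names are hypothetical):

```python
import numpy as np

# Placeholder tensors with the shapes reported above: both embed the same
# 2**14 image-token vocabulary, but into different widths.
bart_decoder_embed = np.zeros((2 ** 14, 2048))  # Dalle Mega BART decoder
vqgan_embed = np.zeros((2 ** 14, 256))          # VQGAN detokenizer

# Same vocabulary axis, different embedding dimension, so the two
# matrices cannot be one shared weight tensor.
same_vocab = bart_decoder_embed.shape[0] == vqgan_embed.shape[0]
shared_possible = bart_decoder_embed.shape == vqgan_embed.shape
print(same_vocab, shared_possible)  # True False
```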

[Screenshot, 2022-07-31: shape comparison of the BART decoder and VQGAN detokenizer embedding weights]
