Noisy output & "text_use_bert_cls" error #13

GoutamKelam · 2022-07-13T06:48:30Z

The "name text_use_bert_cls is not defined" error occurs when trying to use explicit texts, as in the 3rd example. It happens because the function "p_losses" refers to the variable directly instead of through the class instance.
After fixing that and running the code, the generated output samples are random noise. I ran inference for 1K and 50K steps, respectively. Could you tell me whether I'm missing a step?
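To illustrate the kind of bug described above, here is a minimal sketch using hypothetical stand-in classes (not the library's actual GaussianDiffusion code). A method that reads a bare name such as text_use_bert_cls makes Python look it up in local and global scope, not on the instance, so it raises NameError; reading it as self.text_use_bert_cls fixes it.

```python
class BrokenDiffusion:
    """Hypothetical stand-in for a diffusion wrapper; not the real library class."""

    def __init__(self, text_use_bert_cls=False):
        # The flag is stored on the instance...
        self.text_use_bert_cls = text_use_bert_cls

    def p_losses(self, cond):
        # ...but read here as a bare name, which Python resolves in
        # local/global scope only, so this line raises NameError.
        if text_use_bert_cls:  # noqa: F821
            return "cls"
        return "mean"


class FixedDiffusion(BrokenDiffusion):
    def p_losses(self, cond):
        # Fix: read the flag through the instance.
        if self.text_use_bert_cls:
            return "cls"
        return "mean"


try:
    BrokenDiffusion().p_losses(["a whale breaching from afar"])
except NameError as err:
    print("broken:", err)

print("fixed:", FixedDiffusion(text_use_bert_cls=True).p_losses(["a whale breaching from afar"]))
```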

Attaching the output generated.


oxjohanndiep · 2022-07-19T14:53:44Z

What dataset are you training it on?

DaddyWesker · 2022-07-27T08:41:20Z

I'm getting noisy output too when running the provided example (see below). Or is there some pre-training that needs to be done first?

import torch
from video_diffusion_pytorch import Unet3D, GaussianDiffusion

model = Unet3D(
    dim = 64,
    use_bert_text_cond = True,  # this must be set to True to auto-use the bert model dimensions
    dim_mults = (1, 2, 4, 8),
)

diffusion = GaussianDiffusion(
    model,
    image_size = 32,    # height and width of frames
    num_frames = 5,     # number of video frames
    timesteps = 1000,   # number of steps
    loss_type = 'l1'    # L1 or L2
)

videos = torch.randn(3, 3, 5, 32, 32) # random placeholder video (batch, channels, frames, height, width); use real video data for actual training

text = [
    'a whale breaching from afar',
    'young girl blowing out candles on her birthday cake',
    'fireworks with blue and green sparkles'
]

loss = diffusion(videos, cond = text)
loss.backward()
# after a lot of training

sampled_videos = diffusion.sample(cond = text, cond_scale = 2)
sampled_videos.shape # (3, 3, 5, 32, 32)

oxjohanndiep · 2022-07-27T08:44:07Z

@DaddyWesker That's expected: videos = torch.randn(3, 3, 5, 32, 32) is pure random noise, and that is literally what you are feeding into training as video data.

DaddyWesker · 2022-07-27T08:46:46Z

@oxjohanndiep

Hm, I'm just running the provided code. What kind of video should I provide, then? I can't find any information about that in the README.

oxjohanndiep · 2022-07-27T08:48:59Z

You can try Moving MNIST. I also tried the MSR-VTT dataset to test training with text annotations.
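A sketch for turning Moving MNIST into the (batch, channels, frames, height, width) layout used in the example above. It assumes the commonly distributed mnist_test_seq.npy file, which as far as I know stores a uint8 array of shape (frames, sequences, height, width) = (20, 10000, 64, 64); the helper name moving_mnist_to_videos is hypothetical. Pixels are scaled to [0, 1], and the grayscale channel is repeated to match the 3-channel videos in the example.

```python
import numpy as np


def moving_mnist_to_videos(seq, channels=3):
    """Hypothetical helper.

    seq: uint8 array of shape (frames, num_videos, height, width).
    Returns float32 array of shape (num_videos, channels, frames, height, width)
    with values in [0, 1].
    """
    if seq.ndim != 4:
        raise ValueError(f"expected a 4-D array, got shape {seq.shape}")
    videos = seq.transpose(1, 0, 2, 3)            # -> (num_videos, frames, H, W)
    videos = videos[:, None].astype(np.float32)   # add channel axis -> (N, 1, F, H, W)
    videos /= 255.0                               # uint8 [0, 255] -> float [0, 1]
    return np.repeat(videos, channels, axis=1)    # grayscale -> `channels` identical channels


# Usage (file path is an assumption; download the dataset separately):
# seq = np.load("mnist_test_seq.npy")                  # (20, 10000, 64, 64)
# videos = moving_mnist_to_videos(seq[:, :3])          # (3, 3, 20, 64, 64)
# videos = torch.from_numpy(videos)                    # then: diffusion(videos)
```

With 20 frames at 64x64, the GaussianDiffusion settings would need num_frames = 20 and image_size = 64 instead of the 5 and 32 used in the example.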

DaddyWesker · 2022-07-27T08:52:51Z

Should the video correlate with the text? For example, if Moving MNIST is used, should the text look like "moving digit five" or something like that?

oxjohanndiep · 2022-07-27T08:56:20Z

Yes, but I haven't found any text annotations for Moving MNIST, so I only trained it without them.

oxjohanndiep · 2022-07-27T09:04:19Z

If you have found anything in this area, let me know.

DaddyWesker · 2022-07-27T09:05:09Z

Okay. I will.

DaddyWesker · 2022-07-29T06:15:02Z

@oxjohanndiep
How long did you train this diffusion model on Moving MNIST, and did you get any reasonable results?

oxjohanndiep · 2022-07-29T06:17:16Z

I trained it for maybe 100 epochs, which took a good 10 hours with CUDA enabled. No, I did not get any good results, but maybe we can have a video chat to discuss this if you want.

oxjohanndiep · 2022-07-29T06:20:43Z

@DaddyWesker

DaddyWesker · 2022-07-29T06:35:03Z

Hm. Some of those parameters don't appear in the README's training code or in the Trainer class. I guess you wrote your own trainer?

oxjohanndiep · 2022-07-29T06:36:40Z

Yes, I did. Do you get different results with the Trainer class?

DaddyWesker · 2022-07-29T06:37:54Z

I'm currently training this model with the Trainer. When I get some results, I'll let you know.

oxjohanndiep · 2022-07-29T06:38:29Z

Awesome

DaddyWesker · 2022-08-04T06:53:15Z

The model is still training. Here are some results: the first at epoch 36000, the second at epoch 70000. I'm not sure whether these results are good.

oxjohanndiep · 2022-08-04T08:30:46Z

How long did you train it for in terms of time?

oxjohanndiep · 2022-08-04T08:30:55Z

That looks amazing!

DaddyWesker · 2022-08-04T08:35:26Z

Several days on a GTX 1080 Ti GPU, from Monday until today.

oxjohanndiep · 2022-08-04T08:41:29Z

That's very interesting. I have never trained it for that long, only around 6 hours at most! Will give it a go!

oxjohanndiep · 2022-08-04T08:49:42Z

By the way, it looks like each of your videos has more than 5 frames. Did you also increase the number of frames the model accepts?

DaddyWesker · 2022-08-04T08:58:17Z

20 frames, as I remember, as in the Moving MNIST samples. Though I can only use batch_size = 1 =)

Here are the parameters I changed:

diffusion = GaussianDiffusion(
    model,
    image_size = 64,
    num_frames = 20,
    timesteps = 1000,   # number of steps
    loss_type = 'l1'    # L1 or L2
).cuda()

And the batch size in the Trainer, of course.
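One rough way to see why only batch_size = 1 fits after this change: count the elements in a single input video. This is only a proxy (activation memory in a 3D U-Net also depends on dim, dim_mults and the attention layers), but going from 5 frames at 32x32 to 20 frames at 64x64 multiplies the input size by 16.

```python
def video_elements(channels, frames, height, width):
    """Number of values in one video tensor of shape (channels, frames, height, width)."""
    return channels * frames * height * width


readme_video = video_elements(3, 5, 32, 32)    # README example: 5 frames, 32x32
mnist_video = video_elements(3, 20, 64, 64)    # settings above: 20 frames, 64x64

print(readme_video, mnist_video, mnist_video / readme_video)
```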

oxjohanndiep · 2022-08-04T09:00:16Z

Alright, let me increase the number of frames as well and give it a go. I'll report the results in a couple of days!

oxjohanndiep · 2022-08-04T17:34:28Z

@DaddyWesker How did you generate those little GIFs of the results, actually?

oxjohanndiep · 2022-08-04T17:36:38Z

@DaddyWesker And have you tried a more sophisticated dataset, e.g. Kinetics-600 with its text annotations? It would be very interesting to see how the results are conditioned on text.

DaddyWesker · 2022-08-05T06:10:03Z

No, I haven't tested it on a different dataset. I'll see whether I have enough time for this.

About the GIFs: I'm using this helper, based on the one in video_diffusion_pytorch/video_diffusion_pytorch.py:


import torchvision.transforms as T

def video_tensor_to_gif(tensor, path, duration = 120, loop = 0, optimize = True):
    # tensor: (channels, frames, height, width) with values in [0, 1]
    images = list(map(T.ToPILImage(), tensor.unbind(dim = 1)))  # one PIL image per frame
    first_img, *rest_imgs = images
    first_img.save(path, save_all = True, append_images = rest_imgs, duration = duration, loop = loop, optimize = optimize)
    return images

It saves the GIF and returns the frames as a list of PIL images (in case you need them).

oxjohanndiep · 2022-08-09T03:11:37Z

@DaddyWesker I have to admit, your results look far better than mine:

This took me 3 days to train, and I only got to 1000 epochs. How were you able to run 70k epochs? And what learning rate did you choose?

DaddyWesker · 2022-08-09T04:19:31Z

train_lr = 1e-4

Well, I don't know what to say about "how I was able to train 70k epochs". I just ran the training code from the README on Moving MNIST. Nothing special.
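One possible explanation for the gap between 70k and 1000 "epochs" is that the two numbers count different things. The Trainer is driven by train_num_steps and samples every save_and_sample_every steps, so the 36000 and 70000 figures may be Trainer steps. Assuming that each step draws gradient_accumulate_every batches, and that the training folder holds all 10,000 Moving MNIST sequences (both assumptions), a rough conversion looks like this:

```python
def steps_to_epochs(steps, batch_size, grad_accum, dataset_size):
    """Approximate passes over the dataset, assuming each optimizer step
    consumes `grad_accum` batches of `batch_size` videos."""
    videos_seen = steps * batch_size * grad_accum
    return videos_seen / dataset_size


# Settings from the training script in this thread: batch size 1,
# gradient_accumulate_every = 2; dataset size of 10,000 is an assumption.
for steps in (36_000, 70_000):
    print(steps, "steps ~", steps_to_epochs(steps, batch_size=1, grad_accum=2, dataset_size=10_000), "epochs")
```

Under these assumptions, 70k steps would be only about 14 passes over the data, which is far fewer than 70k epochs.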

DaddyWesker · 2022-08-09T04:20:51Z

import torch
import torchvision.transforms as T
from video_diffusion_pytorch import Unet3D, GaussianDiffusion, Trainer

def video_tensor_to_gif(tensor, path, duration = 120, loop = 0, optimize = True):
    # tensor: (channels, frames, height, width) with values in [0, 1]
    images = list(map(T.ToPILImage(), tensor.unbind(dim = 1)))  # one PIL image per frame
    first_img, *rest_imgs = images
    first_img.save(path, save_all = True, append_images = rest_imgs, duration = duration, loop = loop, optimize = optimize)
    return images

model = Unet3D(
    dim = 64,
    dim_mults = (1, 2, 4, 8),
)

diffusion = GaussianDiffusion(
    model,
    image_size = 64,
    num_frames = 20,
    timesteps = 1000,   # number of steps
    loss_type = 'l1'    # L1 or L2
).cuda()

trainer = Trainer(
    diffusion,
    './data',                         # this folder path needs to contain all your training data, as .gif files, of correct image size and number of frames
    train_batch_size = 1,
    train_lr = 1e-4,
    save_and_sample_every = 1000,
    train_num_steps = 700000,         # total training steps
    gradient_accumulate_every = 2,    # gradient accumulation steps
    ema_decay = 0.995,                # exponential moving average decay
    amp = True                        # turn on mixed precision
)

trainer.train()

sampled_videos = diffusion.sample(batch_size = 4)  # (batch, channels, frames, height, width)
# split along the batch dimension (dim 0) so each GIF holds one complete video
for i, video in enumerate(sampled_videos.unbind(dim = 0)):
    video_tensor_to_gif(video.cpu(), f"result_{i}.gif")
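The Trainer above reads .gif files from ./data. Here is a sketch for exporting Moving MNIST sequences to such a folder with Pillow. It assumes the (frames, sequences, 64, 64) uint8 layout of mnist_test_seq.npy, and the helper name export_gifs is hypothetical. Frames are converted to RGB to match the 3-channel model used here.

```python
import os

import numpy as np
from PIL import Image


def export_gifs(seq, out_dir, duration=120):
    """Hypothetical helper: write each sequence of a uint8 array shaped
    (frames, num_videos, height, width) as out_dir/<index>.gif."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for n in range(seq.shape[1]):
        # One grayscale frame per time step, converted to RGB.
        frames = [Image.fromarray(seq[f, n]).convert("RGB") for f in range(seq.shape[0])]
        path = os.path.join(out_dir, f"{n:05d}.gif")
        frames[0].save(path, save_all=True, append_images=frames[1:],
                       duration=duration, loop=0)
        paths.append(path)
    return paths


# Usage (file path is an assumption):
# seq = np.load("mnist_test_seq.npy")   # (20, 10000, 64, 64)
# export_gifs(seq, "./data")
```

Each GIF then holds 20 frames at 64x64, matching num_frames = 20 and image_size = 64 in the GaussianDiffusion settings above.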

clearlyzero · 2023-05-22T12:11:16Z

May I ask whether you normalize your dataset?
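For context on the normalization question: as far as I know, GaussianDiffusion in this repo expects input videos in [0, 1]. It rescales them to [-1, 1] internally and maps samples back to [0, 1], but verify this against your installed version. The sketch below mirrors that convention with numpy, using hypothetical helper names: uint8 pixels go to [0, 1] before training, and [-1, 1] is only the model's internal range.

```python
import numpy as np


def to_unit_range(frames_uint8):
    # uint8 pixels [0, 255] -> float32 [0, 1]: the range assumed for model inputs
    return frames_uint8.astype(np.float32) / 255.0


def normalize_img(x):
    # [0, 1] -> [-1, 1], the range DDPM-style models typically train in
    return x * 2 - 1


def unnormalize_img(x):
    # [-1, 1] -> [0, 1], applied to samples before saving them as images or GIFs
    return (x + 1) * 0.5


pixels = np.array([0, 128, 255], dtype=np.uint8)
unit = to_unit_range(pixels)
print(unit, normalize_img(unit), unnormalize_img(normalize_img(unit)))
```

If the wrapper already does this rescaling, normalizing the data to [-1, 1] yourself would apply it twice, so the data-loading side should stop at [0, 1].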
