Noisy output & "text_use_bert_cls" error #13
What dataset are you training it on?
I'm getting noisy output too when running the provided example. Or is there some pre-training that needs to be done?
@DaddyWesker Obviously, you literally just provide with
Hm, I'm just running the provided code. What kind of video should I provide, then? I can't see any information about that in the README.
You can try Moving MNIST. I also tried the MSR-VTT dataset to test training with annotations.
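If you go the Moving MNIST route, here is a minimal sketch of reshaping the commonly distributed `mnist_test_seq.npy` file into per-video arrays. It assumes that file is laid out as `(frames, videos, 64, 64)`, i.e. `(20, 10000, 64, 64)` uint8, and that the model wants `(batch, channels, frames, height, width)` as in the README example; the helper name is made up. Moving MNIST is grayscale, so the channel axis has size 1; if your model is built for 3 channels, you would either configure it for 1 channel (if your version exposes that) or repeat the channel.

```python
import numpy as np

# Hypothetical helper (not part of video-diffusion-pytorch).
# Assumes the Moving MNIST layout (frames, videos, height, width), dtype uint8.
def moving_mnist_to_videos(seq: np.ndarray) -> np.ndarray:
    """Return float32 videos shaped (videos, 1, frames, height, width) in [0, 1]."""
    if seq.ndim != 4:
        raise ValueError(f"expected a 4-D array, got shape {seq.shape}")
    # (frames, videos, H, W) -> (videos, frames, H, W)
    videos = np.transpose(seq, (1, 0, 2, 3))
    # insert a single grayscale channel axis -> (videos, 1, frames, H, W)
    videos = videos[:, np.newaxis]
    # scale 0..255 to 0..1
    return videos.astype(np.float32) / 255.0

# Usage (file name and shape are assumptions):
# seq = np.load("mnist_test_seq.npy")   # (20, 10000, 64, 64)
# videos = moving_mnist_to_videos(seq)  # (10000, 1, 20, 64, 64)
```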
Should the video have some correlation with the text? For example, if Moving MNIST is used, should the text look like "moving digit five" or something like that?
Yes, but I haven't found any text annotations for Moving MNIST, so I only trained it without them.
If you find anything in this area, let me know.
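Since the public Moving MNIST file has no labels, one workaround (an assumption, not something from this repo) is to generate your own sequences, so you know which digit moves and can write the caption yourself. The sketch only produces the bouncing trajectory and the caption; pasting a 28×28 MNIST digit at each position onto a 64×64 canvas is left out. All names are hypothetical and the velocity choices are arbitrary.

```python
import random

NUMBER_WORDS = ["zero", "one", "two", "three", "four",
                "five", "six", "seven", "eight", "nine"]

# Hypothetical helper: top-left (x, y) positions of a sprite bouncing in a square canvas.
def bouncing_positions(num_frames=20, canvas=64, sprite=28, seed=None):
    rng = random.Random(seed)
    limit = canvas - sprite                     # largest valid top-left coordinate
    x, y = rng.randint(0, limit), rng.randint(0, limit)
    vx, vy = rng.choice([-3, -2, 2, 3]), rng.choice([-3, -2, 2, 3])
    positions = []
    for _ in range(num_frames):
        positions.append((x, y))
        x, y = x + vx, y + vy
        # reflect off a wall and clamp back inside the canvas
        if not 0 <= x <= limit:
            vx = -vx
            x = min(max(x, 0), limit)
        if not 0 <= y <= limit:
            vy = -vy
            y = min(max(y, 0), limit)
    return positions

# Hypothetical helper: caption such as "moving digit five" from the digit labels.
def caption_for(digits):
    words = [NUMBER_WORDS[d] for d in digits]
    if len(words) == 1:
        return f"moving digit {words[0]}"
    return "moving digits " + " and ".join(words)

print(caption_for([5]))     # moving digit five
print(caption_for([5, 3]))  # moving digits five and three
```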
Okay. I will.
@oxjohanndiep
I trained it for maybe 100 epochs, which took me a good 10 hours with CUDA enabled. No, I did not get any good results, but maybe we can have a video chat to discuss this if you want.
Hm. I haven't seen some of those parameters in the training code in the README or in the Trainer class. I guess you wrote your own trainer?
Yes, I did. Do you get different results with the Trainer class?
I'm currently trying to train this model using the Trainer. When I get some results, I'll let you know.
Awesome
How long did you train it for in terms of time?
That looks amazing!
Several days on a 1080 Ti GPU, from Monday until today.
That's very interesting. I have never trained it for that long, only around 6 hours at most! Will give it a go!
By the way, it looks like you have more than 5 frames per video. Did you also increase the number of frames accepted by the model?
20 frames, as I remember, as in the Moving MNIST samples. Though I can only use batch_size = 1 =) Here are the parameters I've changed
And batch_size in the Trainer, of course.
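Going from 5 to 20 frames multiplies the size of each input video by 4, which is consistent with having to drop to batch_size = 1. A plain-arithmetic sketch (hypothetical helper name) counts the values in one input batch shaped `(batch, channels, frames, height, width)`, the layout the README example uses, assuming 64×64 grayscale Moving MNIST frames. Input size is only a lower bound on memory: intermediate activations also scale with the frame count, and attention across frames, if the model uses it, grows faster than linearly. The frame count the model is built for (the `num_frames` argument in the README example) presumably has to match the data.

```python
# Hypothetical helper: counts the values in one input batch laid out as
# (batch, channels, frames, height, width).
def video_batch_elements(batch: int, channels: int, frames: int, height: int, width: int) -> int:
    return batch * channels * frames * height * width

# 64x64 grayscale frames: 4 clips of 5 frames vs 1 clip of 20 frames
short_clips = video_batch_elements(4, 1, 5, 64, 64)
long_clip = video_batch_elements(1, 1, 20, 64, 64)
print(short_clips, long_clip)  # 81920 81920
```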
Alright, let me increase the number of frames as well and give it a go. I'll report the results in a couple of days!
@DaddyWesker How did you actually plot those little GIFs of the results?
@DaddyWesker And have you tried testing it on a more sophisticated dataset, e.g. Kinetics-600 with its text annotations? It would be very interesting to see how the results are conditioned on text.
No, I haven't tested it on a different dataset. I'll see if I have enough time for this. About the GIFs: in this repo, in
I'm using this one. It saves a GIF and returns it as an ndarray (if you need it).
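The name of the repo helper is cut off above, so here is an independent sketch with NumPy and Pillow that does the same kind of job: it saves a `(channels, frames, height, width)` array with values in [0, 1] as an animated GIF and returns the uint8 frames as an ndarray. The function name, the [0, 1] input range, and the 120 ms frame duration are assumptions; if your samples are in [-1, 1], map them to [0, 1] first.

```python
import numpy as np
from PIL import Image

# Hypothetical helper, not the repo's own GIF function.
def video_to_gif(video: np.ndarray, path: str, duration_ms: int = 120) -> np.ndarray:
    """Save a (channels, frames, H, W) array in [0, 1] as a looping GIF.

    Returns the uint8 frames shaped (frames, H, W, channels).
    """
    if video.ndim != 4 or video.shape[0] not in (1, 3):
        raise ValueError(f"expected (1 or 3, frames, H, W), got {video.shape}")
    # scale to 0..255, then reorder to (frames, H, W, channels)
    frames = (np.clip(video, 0.0, 1.0) * 255).round().astype(np.uint8)
    frames = np.ascontiguousarray(np.transpose(frames, (1, 2, 3, 0)))
    # grayscale frames become 2-D images, RGB frames stay (H, W, 3)
    images = [Image.fromarray(f[..., 0] if f.shape[-1] == 1 else f) for f in frames]
    # the first frame carries the rest via append_images; loop=0 repeats forever
    images[0].save(path, save_all=True, append_images=images[1:],
                   duration=duration_ms, loop=0)
    return frames
```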
@DaddyWesker I have to admit, your results look far better than mine. This took me 3 days to train, and I only got to 1000 epochs. How were you able to run 70k epochs? And what learning rate did you choose?
Well, I don't know what to say about "how was I able to train 70k epochs". I just ran the training code from the README on MNIST. Nothing special.
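Part of the 1000 vs 70k gap may just be units. If the counter being compared is optimizer steps rather than passes over the data (as far as I recall, the README's Trainer is configured in steps, e.g. `train_num_steps`, with `gradient_accumulate_every` micro-batches per step), the two numbers are only comparable after converting. A sketch of that conversion, with a hypothetical function name and made-up example numbers:

```python
# Hypothetical helper: assumes each optimizer step consumes
# batch_size * grad_accum samples from the dataset.
def steps_to_epochs(steps: int, batch_size: int, grad_accum: int, dataset_size: int) -> float:
    """Convert optimizer steps to epochs (full passes over the dataset)."""
    samples_seen = steps * batch_size * grad_accum
    return samples_seen / dataset_size

# e.g. 70k steps at batch size 1 with 2 accumulation steps on 10,000 videos
print(steps_to_epochs(70_000, batch_size=1, grad_accum=2, dataset_size=10_000))  # 14.0
```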
May I ask whether you normalize your dataset?
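On normalization: DDPM-style diffusion models are usually trained on data scaled to [-1, 1], matching the scale of the Gaussian noise. As far as I recall, the README example describes input videos as normalized from -1 to +1, while some versions of the library may rescale [0, 1] inputs internally, so it is worth checking which convention yours uses; scaling twice (or not at all) changes what the model learns. The conversions themselves are simple; a sketch with hypothetical names:

```python
import numpy as np

# Hypothetical helpers: uint8 pixels -> [0, 1] -> [-1, 1] and back.
def to_unit(frames_uint8: np.ndarray) -> np.ndarray:
    return frames_uint8.astype(np.float32) / 255.0

def unit_to_symmetric(x: np.ndarray) -> np.ndarray:
    return x * 2.0 - 1.0        # [0, 1] -> [-1, 1]

def symmetric_to_unit(x: np.ndarray) -> np.ndarray:
    return (x + 1.0) * 0.5      # [-1, 1] -> [0, 1], e.g. before saving a GIF
```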
The "NameError: name 'text_use_bert_cls' is not defined" error occurs when conditioning on explicit text, as in the 3rd example. It happens because the function "p_losses" refers to the variable as a bare name instead of reading it from the class instance (`self.text_use_bert_cls`).
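The failure can be reproduced in isolation: a constructor argument is stored on the instance, but another method reads it as a bare name, which Python resolves as a local or global and raises `NameError`. The class and method names are hypothetical stand-ins, not the library's code; in the real `p_losses` the fix is presumably the same one-word change to `self.text_use_bert_cls`.

```python
class DiffusionSketch:
    """Hypothetical stand-in for a model that stores a text-conditioning flag."""

    def __init__(self, text_use_bert_cls: bool = False):
        self.text_use_bert_cls = text_use_bert_cls

    def p_losses_buggy(self, cond: str) -> dict:
        # Bare name: Python looks for a local or global `text_use_bert_cls`,
        # finds neither, and raises NameError when this line runs.
        return {"cond": cond, "use_cls": text_use_bert_cls}  # noqa: F821

    def p_losses_fixed(self, cond: str) -> dict:
        # Read the flag from the instance, where __init__ stored it.
        return {"cond": cond, "use_cls": self.text_use_bert_cls}


model = DiffusionSketch(text_use_bert_cls=True)
print(model.p_losses_fixed("moving digit five")["use_cls"])  # True
try:
    model.p_losses_buggy("moving digit five")
except NameError as err:
    print(err)  # name 'text_use_bert_cls' is not defined
```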
After fixing that and running the code, the generated output samples are random noise. I ran inference for 1K and 50K steps, respectively. Could you please let me know if I am missing a step?
Attaching the generated output.
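One hedged reading of pure-noise outputs: sampling starts from x_T drawn from N(0, I) and relies entirely on the trained denoiser to remove the noise step by step. Under a cosine noise schedule (the one from Improved DDPM, which I believe is this library's default, but verify in your version), essentially none of the data signal survives at t = T, so an under-trained model will plausibly still return something noise-like. The sketch recomputes that schedule in NumPy; the function is written here for illustration, not imported from the library.

```python
import numpy as np

# Cosine schedule written for illustration: returns beta_1..beta_T.
def cosine_beta_schedule(timesteps: int, s: float = 0.008) -> np.ndarray:
    steps = np.arange(timesteps + 1, dtype=np.float64)
    f = np.cos(((steps / timesteps) + s) / (1 + s) * np.pi * 0.5) ** 2
    alphas_cumprod = f / f[0]
    betas = 1.0 - alphas_cumprod[1:] / alphas_cumprod[:-1]
    return np.clip(betas, 0.0, 0.9999)   # cap the final, near-1 betas

betas = cosine_beta_schedule(1000)
alphas_cumprod = np.cumprod(1.0 - betas)
# sqrt(alpha_bar_T) is the fraction of the clean signal left in x_T
print(f"sqrt(alpha_bar_T) = {np.sqrt(alphas_cumprod[-1]):.2e}")
```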