manas1245agrawal/video_question_answering

INTRODUCTION:

This repo contains a VQA model for GIFs, trained on TumblrGIFs. The model embeds the GIF frames and the text question, then combines the two sets of embeddings in the latent space using cross-attention and concatenation. This lets it gather context from the frames and answer with an open vocabulary. The approach is derived from "Language Grounded QFormer for Efficient Vision Language Understanding".
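To make the fusion step concrete, here is a minimal sketch of cross-attention followed by concatenation, written in plain Python so it runs without any dependencies. It is not this repo's implementation. The function names (`cross_attention`, `fuse`), the toy 2-dimensional embeddings, and the choice to use the question tokens as queries over the frame embeddings are all assumptions made for illustration. The sketch also omits the learned query/key/value projections that a trained model would have.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: every query attends over all keys.

    Hypothetical helper; no learned projections are applied.
    """
    scale = math.sqrt(len(keys[0]))
    attended = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys])
        # Weighted sum of the value vectors, one output dimension at a time.
        attended.append([
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return attended


def fuse(question_tokens, frame_embeddings):
    """Concatenate each question token with its frame-attended context.

    Assumption: question tokens act as queries, frames as keys and values.
    """
    context = cross_attention(question_tokens, frame_embeddings, frame_embeddings)
    # `q + c` is list concatenation, so the fused width is doubled.
    return [q + c for q, c in zip(question_tokens, context)]


# Toy inputs: three frame embeddings and two question-token embeddings.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
question = [[1.0, 0.0], [0.0, 1.0]]
fused = fuse(question, frames)
print(len(fused), len(fused[0]))
```

Each fused vector keeps the original question token in its first half and the frame context in its second half. A downstream answer decoder, not shown here, would consume these vectors.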

Please refer to the following doc, which explains the specifics of our approach: Doc

Checkpoints:

Checkpoint for our simplified approach: .pth file

Cleaned subset of the data used for training our models:

You can find the files here: Drive
