This is a Python project for video captioning, using the hLSTMat model on the MSVD or MSR-VTT dataset.
- Download the MSVD dataset
- Download the MSR-VTT dataset
- Extract video features using https://github.com/Cppowboy/video_feature_extractor
- python 2.7
- tensorflow
- tensorboard
- numpy
- pandas
- pickle
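
As a quick sanity check of the requirements listed above, the following minimal sketch reports which modules cannot be imported in the current environment. The helper name `missing_modules` is hypothetical and not part of this project; it only uses the standard-library `importlib`.

```python
import importlib


def missing_modules(names):
    """Return the module names from `names` that fail to import."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


# Example call (not executed here, since importing tensorflow can be slow):
# missing_modules(["tensorflow", "tensorboard", "numpy", "pandas", "pickle"])
```

An empty list means every listed module could be imported; any names returned still need to be installed.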
- First, you need to change the data paths in data_engine.py to your own paths.
- Use
python train.py
to run the training script.
- Use
tensorboard --logdir your_log_dir
to visualize the training procedure and show the scores.
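
Before starting a training run, it may help to confirm that the paths you set in data_engine.py actually exist. The sketch below is only an illustration: the helper `missing_paths` and the example path names are hypothetical, and the real variable names in data_engine.py may differ.

```python
import os


def missing_paths(paths):
    """Return the entries of `paths` that do not exist on disk."""
    return [p for p in paths if not os.path.exists(p)]


# Hypothetical example; replace these with the paths you set in data_engine.py:
# missing_paths(["/your/path/to/msvd_features", "/your/path/to/captions"])
```

If the returned list is non-empty, fix those paths before running the training script.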
- https://github.com/zhaoluffy/hLSTMat
- https://github.com/yunjey/show-attend-and-tell
- Song, Jingkuan, et al. "Hierarchical LSTM with Adjusted Temporal Attention for Video Captioning." arXiv preprint arXiv:1706.01231 (2017).