Automatic Image captioning with Auto-Encoders

Building the model

The image captioning architecture consists of three models:

- A CNN: used to extract the image features.
- A TransformerEncoder: the extracted image features are passed to a Transformer-based encoder that generates a new representation of the inputs.
- A TransformerDecoder: this model takes the encoder output and the text data (sequences) as inputs and tries to learn to generate the caption.
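To make the data flow between the three models concrete, here is a minimal shape-level sketch in NumPy. It is not the code from this repository: the sizes, the random weights, the single attention layer per stage, and the helper names (`softmax`, `attention`) are all assumptions chosen for illustration, and the CNN is replaced by random features.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(query, key, value, mask=None):
    # Scaled dot-product attention for unbatched (length, dim) arrays.
    scores = query @ key.T / np.sqrt(query.shape[-1])
    if mask is not None:
        scores = np.where(mask, scores, -1e9)
    return softmax(scores) @ value

# Hypothetical sizes, chosen only for illustration.
NUM_REGIONS, CNN_DIM, EMBED_DIM, VOCAB_SIZE, SEQ_LEN = 49, 128, 32, 100, 6

# 1) CNN stand-in: a real CNN would emit one feature vector per image region.
image_features = rng.normal(size=(NUM_REGIONS, CNN_DIM))

# 2) Encoder: project the CNN features, then apply self-attention
#    with a residual connection to get a new representation.
w_proj = rng.normal(size=(CNN_DIM, EMBED_DIM)) * 0.1
projected = image_features @ w_proj
encoded = projected + attention(projected, projected, projected)

# 3) Decoder: embed the caption tokens, apply causal self-attention,
#    then cross-attention over the encoder output.
token_ids = np.array([1, 7, 23, 5, 42, 2])  # made-up integer sequence
embedding = rng.normal(size=(VOCAB_SIZE, EMBED_DIM)) * 0.1
tokens = embedding[token_ids]
causal_mask = np.tril(np.ones((SEQ_LEN, SEQ_LEN), dtype=bool))
hidden = tokens + attention(tokens, tokens, tokens, mask=causal_mask)
hidden = hidden + attention(hidden, encoded, encoded)

# Project to the vocabulary: one distribution over next tokens per position.
w_out = rng.normal(size=(EMBED_DIM, VOCAB_SIZE)) * 0.1
next_token_probs = softmax(hidden @ w_out)

print(encoded.shape, next_token_probs.shape)
```

The causal mask keeps each caption position from attending to later words, while the cross-attention step is where the decoder reads the image representation produced by the encoder.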

Short summary of model

CNN extracts image features >> Transformer encoder builds a new representation of the CNN output >> Transformer decoder takes the encoder output plus the text data (captions in integer sequence format) and learns to generate the caption corresponding to each image.
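The "integer sequence format" for the text data can be sketched as follows. This is an illustrative assumption rather than the repository's preprocessing: the toy vocabulary, the special tokens, and the helper names `encode_caption` and `teacher_forcing_pair` are hypothetical, and shifting the sequence by one position (teacher forcing) is a common way to train such a decoder, not something the text above confirms.

```python
# Hypothetical toy vocabulary; a real pipeline would build it from the caption corpus.
vocab = {"<pad>": 0, "<start>": 1, "<end>": 2, "<unk>": 3, "a": 4, "dog": 5, "runs": 6}

def encode_caption(caption, vocab, max_len):
    # Map words to integer ids, wrap with start/end tokens, pad to max_len.
    words = ["<start>"] + caption.lower().split() + ["<end>"]
    ids = [vocab.get(word, vocab["<unk>"]) for word in words][:max_len]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))

def teacher_forcing_pair(ids):
    # The decoder sees tokens 0..n-2 and is trained to predict tokens 1..n-1.
    return ids[:-1], ids[1:]

ids = encode_caption("A dog runs", vocab, max_len=7)
decoder_input, target = teacher_forcing_pair(ids)
print(ids)
print(decoder_input)
print(target)
```

At each position the decoder receives the words so far and the target is the next word, so the model learns to generate the caption one token at a time.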
