- In recent years, image captioning, the task of generating a concise natural language description of an image, has gathered widespread attention.
- The task draws on Computer Vision (CV), Natural Language Generation (NLG), and Machine Learning (ML) / Deep Learning (DL) methods. Deep learning is a rapidly advancing and heavily researched field that has made its way into many aspects of our daily lives. It is a subfield of Machine Learning concerned with algorithms inspired by the structure and functioning of the brain.
- In this project, we developed a model that generates a concise natural language description of an image.
- We used the Microsoft Common Objects in Context (MS COCO) dataset for this project, which provides class labels, labels for different segments of an image, and a set of captions for each image.
MS COCO Dataset - https://cocodataset.org/home
Script to download dataset - Download Script
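As a quick sanity check that the dataset is in place, the caption annotations can be read with the `pycocotools` COCO API. This is only a minimal sketch; the annotation file path below is an assumption and should point at wherever you unpacked the dataset.

```python
from pycocotools.coco import COCO

# Assumed path -- adjust to your local copy of the MS COCO caption annotations.
ANNOTATION_FILE = "annotations/captions_train2014.json"

coco = COCO(ANNOTATION_FILE)                     # load the caption annotations
image_ids = list(coco.imgs.keys())               # ids of all images in this split
ann_ids = coco.getAnnIds(imgIds=image_ids[0])    # caption annotation ids for one image
for ann in coco.loadAnns(ann_ids):
    print(ann["caption"])                        # each image has several reference captions
```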
The following setup is needed to train the model.
- A GPU with at least 16 GB of memory
- At least 8 GB of RAM
- Install the latest versions of the external packages: `nltk`, `torch`, `numpy`, `tqdm`, `Pillow` (for `PIL.Image`), and `pycocotools` (for the `COCO` API). The standard-library modules (`os`, `sys`, `math`, `time`, `json`) and the local modules (`vocabulary`, `data_set_loader`, `pipeline_models`) come with Python and this repository respectively, so they do not need to be installed.
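For example, the external dependencies can be installed with pip (a sketch; versions are not pinned here):

```bash
pip install nltk torch numpy tqdm Pillow pycocotools
```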
The CNN (encoder) architecture consists of (see the sketch after this list):
- Conv2D layers
- Max Pooling
- ReLU Activation
- Fully Connected Layer
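A minimal sketch of an encoder built from those components is shown below. It is illustrative only; the actual model lives in `pipeline_models.py`, and the layer sizes and `embed_size` here are assumptions.

```python
import torch.nn as nn

class EncoderCNN(nn.Module):
    """Illustrative encoder: Conv2D -> ReLU -> Max Pooling blocks, then a fully connected layer."""
    def __init__(self, embed_size=256):  # embed_size is an assumed hyperparameter
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),          # collapse spatial dims to 1x1
        )
        self.fc = nn.Linear(64, embed_size)   # fully connected layer -> image embedding

    def forward(self, images):
        x = self.features(images)             # (batch, 64, 1, 1)
        x = x.flatten(1)                       # (batch, 64)
        return self.fc(x)                      # (batch, embed_size)
```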
The RNN (decoder) architecture consists of (see the sketch after this list):
- LSTM Layers
- Memory Cell
- Forget Gate
- Input Gate
- Input Modulation Gate
- Output Gate
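The memory cell and the forget, input, input modulation, and output gates are all implemented internally by PyTorch's `nn.LSTM`. A minimal illustrative decoder along those lines might look like the following; the names, sizes, and structure are assumptions, and the real model is in `pipeline_models.py`.

```python
import torch
import torch.nn as nn

class DecoderRNN(nn.Module):
    """Illustrative LSTM decoder: nn.LSTM provides the memory cell and the
    forget / input / input-modulation / output gates internally."""
    def __init__(self, embed_size=256, hidden_size=512, vocab_size=10000, num_layers=1):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)   # word ids -> embeddings
        self.lstm = nn.LSTM(embed_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)         # hidden state -> word scores

    def forward(self, features, captions):
        # Use the image feature as the first "token" of the input sequence.
        embeddings = self.embed(captions[:, :-1])             # (batch, seq-1, embed)
        inputs = torch.cat((features.unsqueeze(1), embeddings), dim=1)
        hiddens, _ = self.lstm(inputs)                         # (batch, seq, hidden)
        return self.fc(hiddens)                                # (batch, seq, vocab)
```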
- In `data_set_loader.py` and `vocabulary.py`, set the path to the dataset correctly.
- Run `model_training.py` to train the model.
- If training for the first time, set `vocab_from_file=True` in the `get_data_set_loader` method of `data_set_loader.py`. On subsequent runs, set it to `False` to reuse the vocabulary file created previously.
- Set the parameters `epoch`, `batch_size`, `mode`, and `path_to_files` in `data_set_loader.py`.
- `pipeline_models.py` contains the architecture of the models we are using.
- After each epoch, the encoder (CNN) and decoder (LSTM) models are saved as checkpoints in the `/models` folder (see the sketch below).
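The per-epoch checkpointing could look roughly like this. It is only a sketch; the actual loop and file names in `model_training.py` may differ, and `encoder`, `decoder`, and `epoch` are assumed to come from that script.

```python
import os
import torch

def save_checkpoint(encoder, decoder, epoch, out_dir="models"):
    """Save encoder (CNN) and decoder (LSTM) weights after an epoch.
    Sketch only: file names here are assumptions."""
    os.makedirs(out_dir, exist_ok=True)
    torch.save(encoder.state_dict(), os.path.join(out_dir, f"encoderCNN-{epoch}.pkl"))
    torch.save(decoder.state_dict(), os.path.join(out_dir, f"decoderRNN-{epoch}.pkl"))
```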
- Using the `model_inference.ipynb` notebook, there are two ways to see the captions generated by the model we have built.
- We have uploaded our trained models `encoderCNN.pkl` and `decoderRNN.pkl` to this repository; these models are used for inference.
- Use the `get_caption()` method. A random image is selected each time this method is run.
- Use the `get_image_caption(image_path)` method. Its parameter is the path of the image whose caption you want.
- For example, you can use the sample image `surf_image.jpg` to see the output, or download any image from the internet and pass its path.
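Inside the notebook, the calls would look roughly like this (a usage sketch; both methods are defined in `model_inference.ipynb`):

```python
# Caption a randomly selected image from the dataset.
get_caption()

# Caption a specific image by path, e.g. the sample image in the repository.
get_image_caption("surf_image.jpg")
```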
Out-of-memory issue:
- Try reducing `batch_size`
- Use a GPU with more memory