Published September 30, 2023 | Version v1
Journal article | Open Access

Dense Caption Imagining

Description

A great deal of recent research has focused on computer vision and natural language processing. Our work lies at the intersection of the two: generating images from captions. We focus on the lower-data regime, using the COCO and CUB data sets, which contain roughly 200k and 11k image and caption pairs, respectively. We use a hierarchical GAN architecture as our baseline [7][24][26]. To improve on this baseline, we try several methods targeting the up-sampling blocks and adding residual or attention-based layers. We compare the inception score of each method to analyze our results, and we also examine qualitative results to ensure there is minimal mode collapse and memorization. We find that, of all our improvements, replacing the up-sampling technique with a Laplacian pyramid method using transposed convolutional layers obtains the best results, with a minimal increase in computation time and memory requirements.
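As a rough illustration of the kind of up-sampling stage described above, the following PyTorch sketch refines a coarse image with a transposed-convolution block in a Laplacian-pyramid style. The class name UpsampleStage, the channel sizes, and the overall wiring are illustrative assumptions, not the paper's actual implementation.

# Minimal sketch (not the paper's exact code) of a generator up-sampling stage
# that refines a coarser image with a transposed-convolution block, in the
# spirit of a Laplacian-pyramid / hierarchical GAN. Channel sizes and the
# class name UpsampleStage are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class UpsampleStage(nn.Module):
    """Doubles spatial resolution and predicts a residual to add to the
    bilinearly up-sampled coarse image (Laplacian-pyramid style)."""

    def __init__(self, in_channels: int = 64, hidden_channels: int = 64):
        super().__init__()
        self.up = nn.Sequential(
            # Transposed convolution: 2x spatial up-sampling of the feature map.
            nn.ConvTranspose2d(in_channels, hidden_channels,
                               kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(hidden_channels),
            nn.ReLU(inplace=True),
        )
        # Predict an RGB residual (the "Laplacian" band) at the new resolution.
        self.to_residual = nn.Conv2d(hidden_channels, 3, kernel_size=3, padding=1)

    def forward(self, features: torch.Tensor, coarse_image: torch.Tensor):
        feats = self.up(features)
        residual = self.to_residual(feats)
        # Up-sample the coarse image and add the learned residual detail.
        upsampled = F.interpolate(coarse_image, scale_factor=2,
                                  mode="bilinear", align_corners=False)
        refined = torch.tanh(upsampled + residual)
        return feats, refined


# Toy usage: refine a 64x64 image to 128x128.
if __name__ == "__main__":
    stage = UpsampleStage()
    feats = torch.randn(1, 64, 64, 64)         # feature map from previous stage
    coarse = torch.rand(1, 3, 64, 64) * 2 - 1  # coarse image in [-1, 1]
    new_feats, refined = stage(feats, coarse)
    print(refined.shape)  # torch.Size([1, 3, 128, 128])

In this sketch the transposed convolution doubles the feature-map resolution, and the learned residual plays the role of the Laplacian band added onto a bilinear up-sampling of the coarser image; the same idea would be stacked once per pyramid level in a hierarchical generator.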

Files

IJISRT22MAY1170.pdf (795.7 kB)
md5:2bc7a15f3ed77a5f602eb968ea8d3a5c