Published April 30, 2024 | Version CC BY-NC-ND 4.0
Journal article Open

An Overview of Text to Visual Generation Using GAN

  • 1. M. Tech Scholar, Department of CSE, TKM College of Engineering, Kollam, Kerala, India.


Abstract- Text-to-visual generation was a cumbersome task before the advent of deep learning networks. With deep learning, both images and videos can now be generated from textual descriptions. Deep learning networks have revolutionized fields such as computer vision and natural language processing, and Generative Adversarial Networks (GANs) have played a significant role in advancing these domains. A GAN typically comprises multiple deep networks combined with other machine learning techniques. In text-to-visual generation, GANs have enabled the synthesis of images and videos from textual input. This work explores different variations of GANs for image and video synthesis and proposes a general architecture for text-to-visual generation using GANs. It also examines the challenges associated with this task and discusses ongoing research and future prospects. By leveraging deep learning networks and GANs, generating visual content from text has become more accessible and efficient. This work aims to contribute to the understanding and advancement of text-to-visual generation, with potential applications across various industries.




Additional details



Manuscript received on 30 March 2024 | Revised Manuscript received on 12 April 2024 | Manuscript Accepted on 15 April 2024 | Manuscript published on 30 April 2024

