Published April 30, 2024 | License: CC BY-NC-ND 4.0
Journal article | Open access

An Overview of Text to Visual Generation Using GAN

  • 1. M. Tech Scholar, Department of CSE, TKM College of Engineering, Kollam, Kerala, India.

Description

Abstract: Text-to-visual generation was a cumbersome task until the advent of deep learning. With deep learning networks, both images and videos can now be generated from textual descriptions. The emergence of Generative Adversarial Networks (GANs) in particular has driven major advances across computer vision and natural language processing. A GAN typically comprises multiple deep networks combined with other machine learning techniques; in the context of text-to-visual generation, GANs enable the synthesis of images and videos conditioned on textual input. This work explores different variations of GANs for image and video synthesis and proposes a general architecture for text-to-visual generation using GANs. It also examines the challenges associated with this task and discusses ongoing research and future prospects. By leveraging the power of deep learning networks and GANs, generating visual content from text has become more accessible and efficient. This work aims to contribute to the understanding and advancement of text-to-visual generation, paving the way for applications across various industries.
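The adversarial setup the abstract refers to — a generator network pitted against a discriminator network — can be illustrated with a deliberately minimal sketch. This is not taken from the paper: it is a toy 1-D example in NumPy where both "networks" are reduced to a single linear layer, the real data is a hypothetical Gaussian, and the generator uses the standard non-saturating loss; real text-to-visual GANs add text encoders and deep convolutional generators on top of this same training loop.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

# Illustrative "real" data: a 1-D Gaussian the generator must imitate.
DATA_MEAN, DATA_STD = 4.0, 1.25

# Generator G(z) = a*z + b and discriminator D(x) = sigmoid(w*x + c):
# the two adversarial networks, each reduced to its simplest linear form.
a, b = 1.0, 0.0          # generator parameters
w, c = 0.1, 0.0          # discriminator parameters
lr, batch = 0.05, 64

def train(steps=2000):
    global a, b, w, c
    for _ in range(steps):
        # --- discriminator step: score real samples high, fakes low ---
        x_real = rng.normal(DATA_MEAN, DATA_STD, batch)
        z = rng.standard_normal(batch)
        x_fake = a * z + b
        d_real = sigmoid(w * x_real + c)
        d_fake = sigmoid(w * x_fake + c)
        grad_w = np.mean((d_real - 1) * x_real) + np.mean(d_fake * x_fake)
        grad_c = np.mean(d_real - 1) + np.mean(d_fake)
        w -= lr * grad_w
        c -= lr * grad_c
        # --- generator step: non-saturating loss, try to fool D ---
        z = rng.standard_normal(batch)
        x_fake = a * z + b
        d_fake = sigmoid(w * x_fake + c)
        grad_a = np.mean((d_fake - 1) * w * z)
        grad_b = np.mean((d_fake - 1) * w)
        a -= lr * grad_a
        b -= lr * grad_b

train()
samples = a * rng.standard_normal(10000) + b
print(f"generated mean ~ {samples.mean():.2f} (target {DATA_MEAN})")
```

After training, the generator's output distribution has drifted toward the real data's mean, driven only by the discriminator's feedback — the same adversarial signal that, at scale, lets conditional GANs render images or video frames matching a text description.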

Files

A804113010524.pdf (475.8 kB)
md5:6dfcdb3d17d53099fb1f3f8eda1b6ad2

Additional details

Dates

Accepted: 2024-04-15
Manuscript received on 30 March 2024 | Revised Manuscript received on 12 April 2024 | Manuscript Accepted on 15 April 2024 | Manuscript published on 30 April 2024
