AI-Assisted Manga Creation: A Workflow for Non-Artists
Description
Making manga has traditionally required years of artistic training, a barrier that has kept many storytellers out of the medium. This paper asks whether generative AI can lower that barrier. I developed and tested a five-stage production pipeline that pairs large language models for narrative writing with diffusion-based image synthesis for visuals, covering every step from initial story concept to finished page layout. To validate the approach, I produced a complete five-page manga chapter from scratch, without any formal drawing training, using ChatGPT (OpenAI, 2023), Stable Diffusion (Stability AI, 2022), Midjourney (Midjourney, 2023), and Clip Studio Paint. The results are encouraging: production time fell dramatically compared with conventional methods, and three independent readers found the chapter coherent and visually engaging. Two limitations stood out. Keeping characters visually consistent across panels remained difficult, and current tools cannot fully replicate the emotional depth that a skilled human artist brings to the page. Beyond the technical findings, the paper also addresses the harder questions: what AI-assisted creation means for professional artists, who owns the resulting work, and what it means to call something creative.
Files
| Name | Size | Checksum |
|---|---|---|
| Mahadi_Islam_Alif_AI_Manga_Workflow_2026.pdf | 36.9 kB | md5:8dc29536001514b06739d3096f98f267 |
Additional details
Dates
- Created: 2026-03-07
References
- Zhang, L., Rao, A., & Agrawala, M. (2023). Adding conditional control to text-to-image diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV 2023) (pp. 3836-3847). IEEE.
- Midjourney, Inc. (2023). Midjourney v5: Image synthesis platform documentation. Retrieved March 7, 2026, from https://docs.midjourney.com/
- McCloud, S. (1993). Understanding comics: The invisible art. HarperCollins Publishers.
- Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30, 5998-6008.
- McCormack, J., Gifford, T., & Hutchings, P. (2019). Autonomy, authenticity, authorship and intention in computer generated art. In Proceedings of the 8th International Conference on Computational Intelligence in Music, Sound, Art and Design (EvoMUSART 2019) (Vol. 11453, pp. 35-50). Springer.
- Stability AI Ltd. (2022). Stable Diffusion: A latent text-to-image diffusion model [Computer software]. Retrieved March 7, 2026, from https://stability.ai/
- Ho, J., Jain, A., & Abbeel, P. (2020). Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33, 6840-6851.
- Ho, J., Chan, W., Saharia, C., Whang, J., Gao, R., Gritsenko, A., Kingma, D. P., Poole, B., Norouzi, M., Fleet, D. J., & Salimans, T. (2022). Imagen video: High definition video generation with diffusion models. arXiv Preprint. arXiv:2210.02303
- Adobe Inc. (2023). Adobe Firefly: Generative AI for creative workflows [Computer software].
- Andersen v. Stability AI Ltd., No. 3:23-cv-00201 (N.D. Cal. 2023).
- Animation Guild, IATSE Local 839. (2023). AI and the entertainment industry: A survey of members' experiences. IATSE.
- Anthropic, PBC. (2023). Claude: A large-scale language model [Computer software].
- Banet-Weiser, S. (2012). Authentic: The politics of ambivalence in a brand culture. NYU Press.
- Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N., & Ganguli, S. (2015). Deep unsupervised learning using nonequilibrium thermodynamics. In Proceedings of the 32nd International Conference on Machine Learning (ICML 2015) (Vol. 37, pp. 2256-2265). PMLR.
- Samuelson, P. (2023). Generative AI meets copyright. Science, 381(6654), 158-161. https://doi.org/10.1126/science.adi0656
- European Union. (2024). Regulation (EU) 2024/1689 of the European Parliament and of the Council of 13 June 2024 on Artificial Intelligence. Official Journal of the European Union.
- Eloundou, T., Manning, S., Mishkin, P., & Rock, D. (2023). GPTs are GPTs: An early look at the labor market impact potential of large language models. arXiv Preprint. arXiv:2303.10130
- Epstein, Z., Levine, S., Rand, D. G., & Rahwan, I. (2020). Who gets credit for AI-generated art? iScience, 23(9), Article 101515. https://doi.org/10.1016/j.isci.2020.101515
- Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the dangers of stochastic parrots: Can language models be too big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (FAccT '21) (pp. 610-623). Association for Computing Machinery.
- Runway AI Inc. (2023). Gen-2: Multimodal AI research and video synthesis [Computer software]. Retrieved March 7, 2026, from https://runwayml.com/
- Celsys Inc. (2023). AI functions in Clip Studio Paint [Computer software]. Retrieved March 7, 2026, from https://www.clip-studio.com/
- Berndt, J. (2008). Considering manga discourse: Position, literature, authenticity. In M. W. MacWilliams (Ed.), Japanese visual culture: Explorations in the world of manga and anime (pp. 295-310). M.E. Sharpe.
- Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D., Wu, J., Winter, C., ... Amodei, D. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877-1901.
- Brynjolfsson, E., & McAfee, A. (2014). The second machine age: Work, progress, and prosperity in a time of brilliant technologies. W. W. Norton & Company.
- Elgammal, A., Liu, B., Elbadawy, M., & Mazzone, M. (2017). CAN: Creative adversarial networks: Generating art by learning about styles and deviating from style norms. In Proceedings of the 8th International Conference on Computational Creativity (pp. 96-103). Association for Computational Creativity.
- Getty Images (US), Inc. v. Stability AI Ltd., No. 1:23-cv-00135-UNA (D. Del. filed Feb. 3, 2023).
- Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2014). Generative adversarial nets. Advances in Neural Information Processing Systems, 27, 2672-2680.
- Grand View Research. (2022). Manga market size, share & trends analysis report, 2022-2030 (Report No. GVR-4-68039-225-7). Grand View Research, Inc.
- HAKUREI. (2022). Waifu Diffusion v1.3: Anime-style latent diffusion model [Computer software]. Hugging Face. Retrieved March 7, 2026, from https://huggingface.co/hakurei/waifu-diffusion
- NovelAI. (2022). NovelAI Diffusion: Anime image generation [Computer software]. Anlatan. Retrieved March 7, 2026, from https://novelai.net/
- OpenAI. (2023). GPT-4 technical report. arXiv Preprint. arXiv:2303.08774
- Rombach, R., Blattmann, A., Lorenz, D., Esser, P., & Ommer, B. (2022). High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022) (pp. 10684-10695). IEEE.
- Ruiz, N., Li, Y., Jampani, V., Pritch, Y., Rubinstein, M., & Aberman, K. (2023). DreamBooth: Fine tuning text-to-image diffusion models for subject-driven generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2023) (pp. 22500-22510). IEEE.