Published February 3, 2026 | Version v1
Journal article Open

Synthetic-Data Generation for Enhancing Malware and Phishing Determining Performance

  • 1. STG First Grade College

Description

Machine-learning (ML) applications such as malware and phishing detection require security datasets of sufficient quantity, quality, and diversity. Real-world datasets, however, often lack future (zero-day) or evasive variants, are imbalanced, and raise privacy concerns. Synthetic-Data Generation (SDG), including Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), and transformer or large language model (LLM) generation, can be used to expand training corpora, simulate obscure variants, and enable privacy-preserving collaboration. The proposed research model covers the literature background, recent developments (2021-2025), an experimental design, guidelines, ethics and threat assessment, and the expected outcomes. Recent studies, including MalDataGen, malware benchmarks, LLM-based phishing synthesis, and GAN-based improvements, are used to support these claims.

Files

synthetic-data-generation-for-enhancing-malware-and-phishing-determining-performance-IJERTV15IS010504.pdf

Additional details