Published March 31, 2026 | Version 1.0
Preprint Open

Automatic Quiz Generation and Evaluation System using Large Language Models with Distractor Optimization

Description

This paper presents an intelligent system for the automatic generation and evaluation of educational quizzes using Large Language Models (LLMs) with a novel distractor optimization module. The system employs a multi-layered architecture covering resource processing, topic extraction, question generation, and quality evaluation.

A key contribution is the multi-stage distractor optimization pipeline, which uses semantic similarity techniques (TF-IDF, Sentence-BERT, cosine similarity) to ensure distractors are plausible, diverse, and non-redundant. Questions are generated across cognitive levels using Bloom's Taxonomy classification.
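To make the multi-stage filtering concrete, the following is a minimal, self-contained sketch of the idea: score each candidate distractor against the correct answer for plausibility, and against already-selected distractors for redundancy. It uses a hand-rolled TF-IDF with cosine similarity as a stand-in for the paper's TF-IDF/Sentence-BERT embeddings; the function names, thresholds, and weighting scheme here are illustrative assumptions, not the system's actual implementation.

```python
import math
from collections import Counter

def tfidf_vectors(texts):
    """Build smoothed TF-IDF vectors (dicts) for a small list of texts.
    Stand-in for the embedding stage; real systems would use a fitted
    vectorizer or Sentence-BERT encodings."""
    docs = [t.lower().split() for t in texts]
    df = Counter(w for d in docs for w in set(d))
    n = len(docs)
    return [
        {w: (c / len(d)) * (math.log((1 + n) / (1 + df[w])) + 1.0)
         for w, c in Counter(d).items()}
        for d in docs
    ]

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def select_distractors(answer, candidates, k=3,
                       plaus_min=0.05, redund_max=0.9):
    """Two-stage filter: keep candidates similar enough to the answer
    to be plausible, but not near-duplicates of each other.
    Thresholds are illustrative, not taken from the paper."""
    vecs = tfidf_vectors([answer] + candidates)
    ans_vec, chosen = vecs[0], []
    for text, vec in zip(candidates, vecs[1:]):
        if cosine(ans_vec, vec) < plaus_min:
            continue  # too unrelated to the answer to be plausible
        if any(cosine(vec, v) > redund_max for _, v in chosen):
            continue  # redundant with an already-selected distractor
        chosen.append((text, vec))
        if len(chosen) == k:
            break
    return [t for t, _ in chosen]
```

For example, given the answer "binary search runs in logarithmic time", a duplicate candidate and an off-topic candidate are both rejected, while two plausible, mutually distinct distractors survive.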

Experimental results demonstrate a diversity score of 0.97 with a zero duplicate rate. Correlation analysis reveals that distractor diversity strongly predicts overall question quality (r = 0.96), and a plausibility-relevance trade-off (r = −0.73) is identified as a key direction for future work.
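The reported r values come from standard Pearson correlation analysis; a minimal stdlib sketch of that computation (the metric arrays below are placeholders, not the paper's data):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples:
    covariance divided by the product of the standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-question metric vectors, e.g. diversity vs. quality:
# r = pearson_r(diversity_scores, quality_scores)
```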

The system is built using Flask, Celery, Redis, SQLAlchemy, and integrates LLM APIs with transformer-based semantic models for end-to-end quiz generation.

Files

Automatic_Quiz_Generation_and_Evaluation_System_using_Large_Language_Models_with_Distractor_Optimization.pdf

Additional details

Dates

Accepted
2026-03-31

Software