Optimizing OCR Performance for Programming Videos: The Role of Image Super-Resolution and Large Language Models

Alahmadi, Mohammad

doi:10.5281/zenodo.10823097

Published March 15, 2024 | Version v1

Resource type: Software documentation (open access)


Alahmadi, Mohammad (Contact person), University of Jeddah

Replication package for the paper submitted to the MDPI journal Mathematics.
The rapid growth of video programming tutorials as a key educational resource has highlighted the need for effective code-extraction methods. These tutorials vary widely in video quality, which makes it difficult to accurately transcribe the embedded source code that is crucial for learning and software development. This study investigates how video quality affects the performance of Optical Character Recognition (OCR) engines and whether Large Language Models (LLMs) can improve code-extraction accuracy. Our empirical analysis uses a rich dataset of programming screencasts: we manually transcribe the source code and then apply both traditional OCR engines, such as Tesseract and Google Vision, and LLMs, including GPT-4V and Gemini.
We also investigate whether Image Super-Resolution (SR) techniques, namely Enhanced Deep Super-Resolution (EDSR) and Multi-scale Deep Super-Resolution (MDSR), improve the quality of low-resolution video frames. The findings show significant improvements in OCR accuracy when SR is applied, particularly at low resolutions such as 360p. LLMs outperform the traditional OCR engines across all video qualities, indicating that they are robust across diverse scenarios. This research contributes to software engineering a benchmark for code extraction from video tutorials and demonstrates the substantial impact of SR techniques and LLMs on the readability and reusability of code from these educational resources.
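The record does not state which accuracy metric the study uses. As a hedged illustration only, one common way to score an OCR engine's or LLM's transcription against a manual ground-truth transcription is character-level accuracy, 1 - d / n, where d is the Levenshtein edit distance and n is the length of the ground truth. The sketch below assumes that metric; the names `levenshtein` and `char_accuracy` are hypothetical and are not taken from the replication scripts.

```python
# Assumed metric (not confirmed by the record): character-level accuracy
# derived from the Levenshtein edit distance to a manual transcription.


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def char_accuracy(ocr_text: str, ground_truth: str) -> float:
    """Return 1 - edit_distance / len(ground_truth), clipped to [0, 1]."""
    if not ground_truth:
        # Empty reference: perfect only if the OCR output is empty too.
        return 1.0 if not ocr_text else 0.0
    dist = levenshtein(ocr_text, ground_truth)
    return max(0.0, 1.0 - dist / len(ground_truth))
```

For example, `char_accuracy("for i in rangc(1O):", "for i in range(10):")` counts the two character confusions typical of OCR on code (`c` for `e`, `O` for `0`) against the 19-character reference. A character-level score is used here because a single misread character can break code even when the rest of the line is correct.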


Files (1.2 GB)

Name	MD5 checksum	Size
all_default_and_SR.zip	1e76ef10a7c8cf046777f0fc2b560398	280.8 MB
All_Scripts.zip	75ea2d55c9a8c5c50b30609d1e77f47e	926.5 MB
Results_Figures.zip	89a9c9b1d5f426fadf8c1aa4d85f431e	3.3 MB
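The file listing gives an MD5 checksum for each archive. A minimal standard-library Python sketch for checking downloaded copies against those values follows; the helper names `md5_of` and `verify` are illustrative and are not part of the replication package.

```python
import hashlib
from pathlib import Path
from typing import Dict, Optional

# Checksums copied from the record's file listing.
EXPECTED_MD5 = {
    "all_default_and_SR.zip": "1e76ef10a7c8cf046777f0fc2b560398",
    "All_Scripts.zip": "75ea2d55c9a8c5c50b30609d1e77f47e",
    "Results_Figures.zip": "89a9c9b1d5f426fadf8c1aa4d85f431e",
}


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large archives need not fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(download_dir: Path) -> Dict[str, Optional[bool]]:
    """Map each expected archive to True (match), False (mismatch),
    or None (file not present in download_dir)."""
    results: Dict[str, Optional[bool]] = {}
    for name, expected in EXPECTED_MD5.items():
        path = download_dir / name
        results[name] = (md5_of(path) == expected) if path.is_file() else None
    return results
```

For example, `verify(Path("~/Downloads").expanduser())` reports, per archive, whether the local copy matches the published checksum. Streaming in chunks keeps memory use flat even for the 926.5 MB scripts archive.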

Additional details

Accepted: 2024-03-16

