Published January 1, 2024 | Version v1
Conference paper Open

SmartCaption AI - Enhancing Web Accessibility with Context-Aware Image Descriptions Using Large Language Models

Description

The Internet provides vast amounts of information, services, and products. However, blind individuals and those with severe vision impairments face significant challenges in navigating web content, especially in understanding images. This paper introduces SmartCaption AI, a tool that uses Large Language Models (LLMs) to generate descriptive text for images on web pages. SmartCaption AI first summarizes the content of a web page and passes that summary to the LLM as context, so that the model can produce accurate and meaningful image descriptions. These descriptions are integrated into the web page's structure, allowing text-to-speech software to read them aloud to visually impaired users. SmartCaption AI makes several key contributions to web accessibility: it ensures that the generated descriptions are contextually relevant, it enhances the browsing experience by inserting descriptions in real time, and it is delivered as a Chrome extension so that it is widely accessible. This approach addresses the critical issue of missing or inadequate alternative text for images, thereby helping to bridge the digital divide between sighted and visually impaired individuals. In our experiment, SmartCaption AI achieved an average score of 8.3/10, significantly outperforming the state-of-the-art solutions ImageToText (1.7/10) and AI-MCS (3.6/10). The source code of the tool is available on GitHub.
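
The pipeline described above (summarize the page, give the summary to the model as context, write the resulting description back into the page) can be illustrated with a minimal Python sketch. This is not the authors' implementation, which is a Chrome extension; the names `build_prompt`, `AltTextInjector`, and `add_alt_text` are hypothetical, the prompt wording is assumed, and `describe` is a placeholder for whatever LLM call is actually used. For simplicity the sketch passes only the image URL in the prompt, whereas a real system would presumably send the image itself to a multimodal model.

```python
from html import escape
from html.parser import HTMLParser
from typing import Callable, List, Optional, Tuple


def build_prompt(page_summary: str, image_src: str) -> str:
    """Combine the page summary with an image reference (assumed wording)."""
    return (
        f"Page context: {page_summary}\n"
        f"Describe the image at {image_src} for a screen-reader user."
    )


class AltTextInjector(HTMLParser):
    """Re-emit HTML, filling in alt text on <img> tags that lack it."""

    def __init__(self, page_summary: str, describe: Callable[[str], str]):
        # Keep character references as written so text is re-emitted unchanged.
        super().__init__(convert_charrefs=False)
        self.page_summary = page_summary
        self.describe = describe  # placeholder for the LLM call (assumption)
        self.out: List[str] = []

    def _emit_tag(self, tag: str, attrs: List[Tuple[str, Optional[str]]], closing: str) -> None:
        attr_map = dict(attrs)
        # Only images with a missing or empty alt attribute are described.
        if tag == "img" and not (attr_map.get("alt") or "").strip():
            src = attr_map.get("src") or ""
            attr_map["alt"] = self.describe(build_prompt(self.page_summary, src))
        parts = [tag]
        for name, value in attr_map.items():
            parts.append(name if value is None else f'{name}="{escape(value, quote=True)}"')
        self.out.append("<" + " ".join(parts) + closing + ">")

    def handle_starttag(self, tag, attrs):
        self._emit_tag(tag, attrs, closing="")

    def handle_startendtag(self, tag, attrs):
        self._emit_tag(tag, attrs, closing=" /")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def add_alt_text(html: str, page_summary: str, describe: Callable[[str], str]) -> str:
    """Return the HTML with generated alt text added where it was missing."""
    injector = AltTextInjector(page_summary, describe)
    injector.feed(html)
    injector.close()
    return "".join(injector.out)
```

Calling `add_alt_text(page_html, summary, describe)` with any function that maps a prompt string to a description returns the rewritten markup. Images that already carry non-empty alt text are left untouched, which matches the stated goal of addressing missing or inadequate alternative text rather than overwriting existing descriptions.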

Files

huynh-2024-smartcaption-2.pdf (2.3 MB)
md5:a874c06f5984062f4c5ecc2a7fdb95e7
