robomustib/TextSimilarityGrader: TextSimilarityGrader: A Python Tool for Automated Fuzzy Evaluation of Speech-to-Text Transcripts in Research Contexts

Mustafa

doi:10.5281/zenodo.18422535

Published January 29, 2026 | Version TextSimilarityGrader

Software Open

Abstract

In large-scale psychological and linguistic studies, manual coding of speech-to-text transcripts is time-consuming and prone to human error. Furthermore, automatic speech recognition (ASR) services often introduce phonetic errors, typos, or misinterpretations (e.g., "Appple" instead of "Apple"), which makes exact string matching unreliable for automated grading.

TextSimilarityGrader is an open-source Python utility designed to solve this problem. It automates the evaluation of transcript files (JSON or TXT) against a set of expected keywords/answers. By utilizing fuzzy string matching (based on Gestalt Pattern Matching), the tool identifies correct answers even when the transcript contains spelling errors, dialect variations, or ASR artifacts. This allows for rapid, standardized scoring (0/1) of thousands of audio transcripts with high reliability.

Motivation and Problem Statement

Researchers utilizing ASR (Automatic Speech Recognition) tools like Gladia, OpenAI Whisper, or Google STT often face a "post-processing bottleneck." While the audio is transcribed quickly, verifying if a participant said a specific target word requires reading through thousands of files. Simple "Ctrl+F" search scripts fail when the ASR makes minor mistakes (e.g., transcribing "Buß" instead of "Bus").

Methodology

The software implements a multi-stage evaluation pipeline:

1. Data Ingestion: The tool parses various transcript formats, including nested JSON structures (common in API outputs) and plain text.
2. Normalization: Input text is cleaned (lowercased, punctuation removed, special character normalization) to ensure comparability.
3. Fuzzy Logic Matching & Mathematical Foundation: The core engine utilizes the difflib.SequenceMatcher class, which implements the Ratcliff/Obershelp pattern recognition algorithm. The similarity ratio S is calculated as:

S = (2 * M) / T

where:

M is the number of matching characters.
T is the total number of characters in both sequences (T = len(a) + len(b)).

This yields a normalized score S between 0.0 and 1.0, where 1.0 indicates an identical match.

4. Threshold-Based Grading: A similarity threshold (default ≥ 0.75) determines validity. The score assignment follows a binary classification logic:

Score = 1 (Correct) if S ≥ 0.75
Score = 0 (Incorrect) if S < 0.75

Note: A dynamic constraint is applied to short words (≤ 3 characters) to minimize false positives.

5. Reporting: Results are exported to an Excel file, listing the detected word, the full context sentence, the calculated similarity score, and the final point allocation.
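The normalization, similarity, and threshold stages can be illustrated with a minimal sketch built on the standard library. It is not the tool's actual code: the function names normalize, similarity, and grade are hypothetical, the use of str.casefold() (which also maps "ß" to "ss") is an assumption about what "special character normalization" might look like, and the word-by-word comparison against a single-word target is likewise assumed. The short-word constraint is not specified in this record and is left out.

```python
import re
from difflib import SequenceMatcher

THRESHOLD = 0.75  # default threshold stated in the description


def normalize(text: str) -> str:
    """Casefold, replace punctuation with spaces, collapse whitespace."""
    text = text.casefold()  # assumption: lowercases and maps "ß" to "ss"
    text = re.sub(r"[^\w\s]", " ", text)  # drop punctuation
    return " ".join(text.split())


def similarity(a: str, b: str) -> float:
    """S = 2 * M / T, as returned by difflib.SequenceMatcher.ratio()."""
    return SequenceMatcher(None, a, b).ratio()


def grade(transcript: str, target: str, threshold: float = THRESHOLD):
    """Return (best_word, S, score) for the transcript word closest to target.

    Hypothetical helper: compares each normalized transcript word with a
    single-word target and applies the binary threshold rule.
    """
    target_norm = normalize(target)
    best_word, best_s = "", 0.0
    for word in normalize(transcript).split():
        s = similarity(word, target_norm)
        if s > best_s:
            best_word, best_s = word, s
    return best_word, best_s, int(best_s >= threshold)


word, s, score = grade("I ate an Appple today.", "Apple")
print(word, round(s, 3), score)  # appple 0.909 1
```

For "appple" against "apple", M = 5 and T = 11, so S = 10/11 ≈ 0.909 and the answer scores 1. Without any special character handling, "buß" against "bus" gives M = 2 and T = 6, so S ≈ 0.667, which falls below 0.75; this is why the normalization step matters for such cases.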

Key Features

ASR-Agnostic: Works with Gladia JSON, generic JSON, and .txt files.
Error Tolerance: Robust against ASR hallucinations, stuttering, and phonetic misspellings.
Batch Processing: Capable of processing thousands of files in a single run.
Visual Validation: The output Excel sheet allows researchers to manually verify "close calls" by reviewing the similarity percentage and extracted context.
Reproducibility: Includes a test suite (tests/) to generate mock data with intentional typos, validating the grading logic before real data processing.
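As an illustration of how ASR-agnostic ingestion of nested JSON and plain text might work, the following sketch walks a parsed JSON document and gathers the strings stored under a configurable set of keys. The key names in TEXT_KEYS and the helpers collect_text and load_transcript are hypothetical; the record does not state which fields the tool actually reads from Gladia or generic JSON output.

```python
import json
from pathlib import Path

TEXT_KEYS = {"text", "transcript", "transcription"}  # hypothetical key names


def collect_text(node, keys=TEXT_KEYS):
    """Recursively gather string values stored under any of `keys`."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key in keys and isinstance(value, str):
                found.append(value)
            else:
                found.extend(collect_text(value, keys))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_text(item, keys))
    return found


def load_transcript(path):
    """Return the transcript of a .json or .txt file as a single string."""
    path = Path(path)
    raw = path.read_text(encoding="utf-8")
    if path.suffix.lower() == ".json":
        return " ".join(collect_text(json.loads(raw)))
    return raw.strip()
```

Keeping the key set as a parameter means a differently structured API response only requires a different set of names, not a different parser.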

Workflow

The tool operates in three steps:

Template Generation: Scans the data folder and creates an Excel template (Solutions.xlsx).
Definition: The researcher enters the expected target words into the Excel template.
Evaluation: The script evaluate.py processes the files and generates Grading_Results.xlsx.
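The template generation step can be sketched as follows. This is only an approximation under stated assumptions: the function build_template and the column names are hypothetical, and the sketch writes a CSV file with the standard library as a stand-in, whereas the tool itself produces an Excel file (Solutions.xlsx) via pandas and openpyxl.

```python
import csv
from pathlib import Path


def build_template(data_dir, template_path):
    """List transcript files and write one row per file with an empty target column."""
    files = sorted(
        p.name
        for p in Path(data_dir).iterdir()
        if p.is_file() and p.suffix.lower() in {".json", ".txt"}
    )
    with open(template_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["filename", "expected_word"])  # hypothetical column names
        for name in files:
            writer.writerow([name, ""])  # the researcher fills in the target word
    return files
```

The researcher would then fill in the second column before running the evaluation step.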

Technical Implementation

Language: Python 3.x
Dependencies: pandas (DataFrame manipulation), openpyxl (Excel I/O).
License: MIT License

Related Works

This tool serves as the evaluation module for the Gladia Batch Transcriber workflow but can be used independently with any text-based data source.

Files

robomustib/TextSimilarityGrader-TextSimilarityGrader.zip (47.8 kB), md5:8125957f59109da9eecda901c582e04d

Additional details

Is supplement to: Software: https://github.com/robomustib/TextSimilarityGrader/tree/TextSimilarityGrader (URL)

Repository URL: https://github.com/robomustib/TextSimilarityGrader

