Published November 6, 2024 | Version v1
Software documentation (open access)

How to create high-quality offline video transcriptions and subtitles using Whisper and Python - 6 November 2024

Authors/Creators

  • National Library of the Netherlands

Description

The article outlines a method for creating offline, high-quality video transcriptions and subtitles using OpenAI's Whisper model with Python, emphasizing privacy, accuracy, and accessibility without needing cloud-based speech-to-text services.

https://github.com/KBNLresearch/videotools

The author explores the Whisper model for automatic speech recognition (ASR) to address limitations in existing cloud-based services, such as low transcription quality, privacy concerns, file size restrictions, and costs. 

Key advantages of using Whisper include:

  1. Offline Capabilities and Privacy: Whisper's large model (around 3GB) can run locally on a laptop, enabling privacy-compliant transcription without internet dependency.
  2. Language and Accuracy: The model performs exceptionally well with multiple languages, especially Dutch and English, and effectively transcribes complex terms and named entities.
  3. Real-time Processing: The large model provides near real-time transcription speed (a 15-minute video processes in about 15-20 minutes). Smaller, faster models are also available with reduced accuracy.
  4. Subtitle Generation: Whisper can automatically generate accurate subtitles, enhancing accessibility for viewers with hearing impairments.
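The subtitle generation step can be sketched as follows. This is a minimal illustration, not the repository's own code: it assumes segments shaped like the `result["segments"]` list that Whisper's `transcribe()` returns, each a dict with `start` and `end` times in seconds and a `text` field, and renders them in SubRip (.srt) format.

```python
# Sketch: render Whisper-style segments as SubRip (.srt) subtitle text.
# Assumes each segment is a dict with "start", "end" (seconds) and "text",
# matching the shape of result["segments"] from Whisper's transcribe().

def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"

def segments_to_srt(segments) -> str:
    """Render a list of segments as the contents of an .srt file."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)
```

The resulting string can be written to a `.srt` file next to the video; most players pick it up automatically.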

The article includes Python code examples and repository links to help users implement the Whisper-based transcription workflow. Tools like FFmpeg are needed to handle video and audio formats, and optional modules allow transcript refinement using ChatGPT, although using them sacrifices the offline privacy benefit.
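The overall workflow described above can be sketched in a few lines. This is an illustrative outline, not the repository's scripts: file names, the model size, and the language setting are assumptions, and the FFmpeg flags shown here simply strip the video track and resample the audio to 16 kHz mono.

```python
# Sketch: extract audio from a video with FFmpeg, then transcribe it with
# Whisper. File names and parameters are illustrative assumptions.
import subprocess

def build_ffmpeg_cmd(video_path: str, audio_path: str) -> list:
    """FFmpeg command that drops the video stream and resamples the
    audio to 16 kHz mono WAV."""
    return [
        "ffmpeg", "-y",      # overwrite output without prompting
        "-i", video_path,    # input video file
        "-vn",               # no video stream in the output
        "-ar", "16000",      # 16 kHz sample rate
        "-ac", "1",          # mono
        audio_path,
    ]

if __name__ == "__main__":
    subprocess.run(build_ffmpeg_cmd("talk.mp4", "talk.wav"), check=True)

    import whisper  # pip install openai-whisper
    model = whisper.load_model("large")                # ~3 GB, runs locally
    result = model.transcribe("talk.wav", language="nl")
    print(result["text"])                              # full transcript
```

Running entirely on local files like this is what preserves the privacy advantage the article emphasizes.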

This summary was produced with the assistance of ChatGPT-4o on 6 November 2024.

Files

How to create high-quality offline video transcriptions and subtitles using Whisper and Python.pdf

Additional details

Dates

Issued
2024-11-06