Published February 21, 2023 | Version v4.0.0
Software Open

CouncilDataProject/cookiecutter-cdp-deployment: Whisper and GCP Compute Runners

  • 1. University of Washington Information School, University of Washington, Seattle
  • 2. University of Washington, Seattle
  • 3. Washington University, St. Louis

Description

CouncilDataProject cdp-backend v4.0.0

:warning: :warning: This is a major breaking release. Instance maintainers should update their instance by running the command: just update-from-cookiecutter :warning: :warning:

Council Data Project consists of a backend, a frontend, and a cookiecutter deployment that together create a complete database, storage system, and website for archiving, exploring, and tracking municipal council action.

This library, cookiecutter-cdp-deployment, ties together multiple projects to make a single deployable infrastructure.

v4.0.0

There are two main changes for this release.

  1. We are swapping out Google Speech-to-Text for OpenAI's Whisper.

Specifically, we are using a forked version called faster-whisper. This new speech-to-text model performs much better, with word error rates ranging from ~3.6% to ~9% on long audio files.
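For readers unfamiliar with the metric, word error rate is the number of word-level substitutions, deletions, and insertions needed to turn the model's transcript into the reference transcript, divided by the number of reference words. The following is a minimal sketch of that calculation; the function name and the lowercase, whitespace-based tokenization are assumptions made for illustration and are not taken from the CDP codebase.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumption: lowercase and split on whitespace; real evaluations often
    # also strip punctuation and normalize numbers before comparing.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # previous[j] is the edit distance between the reference words seen so far
    # (initially none) and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current

    return previous[-1] / len(ref)


# One wrong word out of five reference words.
wer = word_error_rate(
    "the council approved the budget",
    "the council approve the budget",
)
print(f"{wer:.1%}")
```

Under this definition, a ~3.6% word error rate corresponds to roughly 36 word-level errors per 1,000 reference words.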

To use this new model efficiently, we need access to a GPU. Since GitHub Actions does not have GPUs available, we use a system that spins up a Google Cloud Compute Engine instance, connects to it, runs our job, and then tears it down, all within a single GitHub Actions workflow. Based on multiple tests, this should reduce both cost and processing time; however, we will do more testing with this release to get a better estimate.
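The spin-up, run, tear-down lifecycle described above can be sketched as a context manager that guarantees teardown even when the job fails. This is an illustration of the pattern only, not the project's implementation: ephemeral_runner, fake_create, fake_delete, and run_job are hypothetical names, and the stand-in functions record events instead of calling any cloud API.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def ephemeral_runner(
    create: Callable[[], str],
    delete: Callable[[str], None],
) -> Iterator[str]:
    """Create a runner, yield its name, and always delete it afterwards."""
    name = create()
    try:
        yield name
    finally:
        # Teardown runs even if the job raises, so the instance is not left running.
        delete(name)


# Hypothetical stand-ins for real cloud calls; they only record what happened.
events: list[str] = []


def fake_create() -> str:
    events.append("create")
    return "gpu-runner-1"


def fake_delete(name: str) -> None:
    events.append(f"delete {name}")


def run_job(name: str) -> None:
    events.append(f"run on {name}")


with ephemeral_runner(fake_create, fake_delete) as runner:
    run_job(runner)

print(events)
```

Placing the delete call in a finally block is the key design choice here: a failed transcription job would otherwise leave a GPU instance running.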

  2. We have switched from the MIT License to the MPLv2 License.

Unless you are trying to fork our code and take it private, this won't affect you.

Files

CouncilDataProject/cookiecutter-cdp-deployment-v4.0.0.zip

Files (2.1 MB)
