CouncilDataProject/cookiecutter-cdp-deployment: Whisper and GCP Compute Runners
Authors/Creators
- 1. University of Washington Information School, University of Washington, Seattle
- 2. University of Washington, Seattle
- 3. Washington University, St. Louis
Description
:warning: :warning: This is a major breaking release. Instance maintainers should update their instance by running `just update-from-cookiecutter`. :warning: :warning:
Council Data Project is a backend, frontend, and cookiecutter deployment for creating a complete database, storage system, and website for archiving, exploring, and tracking municipal council action.
This library, cookiecutter-cdp-deployment, ties together multiple projects to make a single deployable infrastructure.
There are two main changes for this release.
- We are swapping out Google Speech-to-Text for OpenAI's Whisper.
Specifically, we are using a forked version called faster-whisper. This new speech-to-text model performs much better, with word error rates ranging from ~3.6% to ~9% on long audio files.
To use this new model efficiently, we need access to a GPU. Because GitHub Actions runners do not have GPUs available, we use a system that spins up a Google Cloud Compute Engine instance, connects to it, runs our job, and then tears it down, all within a single GitHub Actions workflow. Multiple tests suggest this should reduce both cost and processing time; with this release, we will do more testing to get a better estimate.
- We have switched from the MIT License to the Mozilla Public License 2.0 (MPLv2).
Unless you are trying to fork our code and take it private, this won't affect you.
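The word error rates quoted for faster-whisper (~3.6% to ~9%) are word-level edit distances: substitutions, deletions, and insertions needed to turn the transcript into the reference, divided by the number of reference words. The sketch below shows that standard definition; it is an illustrative implementation, not the evaluation script CDP used, and the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: (substitutions + deletions + insertions) / reference words.

    Illustrative sketch only; not CDP's own evaluation code.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words (one row of the Levenshtein table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


# One substituted word out of five reference words gives a WER of 0.2 (20%).
print(word_error_rate("the council approved the budget",
                      "the council approve the budget"))
```

Read this way, a WER of ~3.6% means roughly 3.6 word errors per 100 words of reference transcript.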
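The spin-up, connect, run, and tear-down pattern for the GPU instance hinges on one property: the instance must be deleted even if the job fails, or a billed GPU machine is left running. A minimal sketch of that lifecycle follows, assuming the three steps are supplied as callables; `run_on_ephemeral_instance`, `create`, `run_job`, and `delete` are hypothetical names standing in for real Compute Engine and SSH calls, not part of CDP or any Google API.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def run_on_ephemeral_instance(
    create: Callable[[], str],
    run_job: Callable[[str], T],
    delete: Callable[[str], None],
) -> T:
    """Create a compute instance, run a job on it, and always tear it down.

    All three callables are hypothetical placeholders: `create` returns an
    instance name, `run_job` does the work on that instance, and `delete`
    removes it.
    """
    instance_name = create()
    try:
        return run_job(instance_name)
    finally:
        # Runs whether run_job returned or raised, so a failed transcription
        # job still releases the GPU instance.
        delete(instance_name)
```

Putting the teardown in `finally` rather than after the job call is the design choice that makes the per-workflow instance safe to use for short-lived jobs.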
Files
| Name | Size | Checksum |
|---|---|---|
| CouncilDataProject/cookiecutter-cdp-deployment-v4.0.0.zip | 2.1 MB | md5:247303ce37a02f1e34989b77ef0e5f90 |
Additional details
Related works
- Is supplement to
- https://github.com/CouncilDataProject/cookiecutter-cdp-deployment/tree/v4.0.0 (URL)