Unlocking image, audio, and video data in the Industry Documents Library
Description
The Industry Documents Library (IDL) is a digital archive of documents created by industries that influence public health, hosted by the University of California, San Francisco Library. The archive contains millions of video, audio, and image files from the tobacco, opioid, fossil fuel, drug, and food industries, including advertisements, legal depositions, internal marketing documents, public health campaigns, and other historical records. This session will start with a presentation on the contents of the IDL and its search interface. Next, we will introduce an open-source, Python-based stack that researchers can use to analyze, transcribe, and categorize the data in IDL video, audio, and image files. Participants will have an opportunity to try out these technologies during the workshop, but the primary focus will be an overview of the available tools and data, and participation in the programming sections is optional.
This workshop was part of the UC Love Data Week 2025 program (https://uc-love-data-week.github.io)
Files
Unlocking_image_audio_video_data_Industry_Documents_Library.mp4
Total size: 137.0 MB

| Size | MD5 checksum |
|---|---|
| 10.6 MB | 229b59ec32e2d243559ae7da35a1d851 |
| 126.4 MB | 3e6ad172759e6c9b2fd62bae59ee5878 |
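The record publishes an MD5 checksum for each file, which lets you confirm that a large download (such as the video) arrived intact. The following is a minimal sketch using only Python's standard library `hashlib`. The helper names `md5_of_file` and `matches_published` are hypothetical illustrations and are not part of the workshop repository. Because the page does not show which checksum belongs to which file name, the sketch simply checks whether a file matches either published value.

```python
import hashlib

# MD5 checksums as listed in the files table of this record.
# The page does not pair them with file names, so they are kept as a set.
PUBLISHED_MD5 = frozenset({
    "229b59ec32e2d243559ae7da35a1d851",  # 10.6 MB file
    "3e6ad172759e6c9b2fd62bae59ee5878",  # 126.4 MB file
})


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"" (end of file).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path):
    """Return True if the file's MD5 digest equals one of the published checksums."""
    return md5_of_file(path) in PUBLISHED_MD5
```

For example, `matches_published("Unlocking_image_audio_video_data_Industry_Documents_Library.mp4")` should return `True` for an intact download. MD5 is adequate for detecting accidental corruption, but it is not a defense against deliberate tampering.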
Additional details
Software
- Repository URL: https://github.com/geoffswc/Love-Data-Week-Transcription-Sentiment-Classification-2025
- Programming language: Jupyter Notebook, Python