Public Utility Data Liberation Project (PUDL) Data Release
Authors/Creators
- Catalyst Cooperative
Description
v2026.5.0 (2026-05-17)
This is a quarterly PUDL data release, updating datasets that are released on a monthly or quarterly basis, including the EIA-860M, year-to-date EIA-923, EIA-930, and EIA-191. It also includes an annual update for the EIA Annual Energy Outlook (AEO).
Normally this release would also update the EPA CEMS hourly emissions dataset. Unfortunately, the bulk CEMS data product that we archive and process was not published as usual. We are exploring other ways of integrating the updated data.
Enhancements
- Started distributing the raw XBRL-derived data for FERC Forms 1, 2, 6, 60, and 714 as collections of parquet files, alongside existing SQLite and DuckDB outputs. See PR #5232. This change is primarily in support of making these data available through the PUDL Data Viewer.
FERC 1
- Added new `out_ferc1__yearly_depreciation_factors_sched336` table. See issue #5103 and PR #5112.
- Added FERC Form 1 respondents’ identification and certification information as `core_ferc1__yearly_identification_certification`. See #5150 and #5008.
- Added new `out_ferc1__yearly_other_regulatory_assets_sched232` table. See issue #5104 and PR #5170.
Expanded Data Coverage
EIA AEO
EIA-860M
EIA-923
EIA-930
- Updated EIA-930 data through April 2026. See #5209 and #5216. In the process, we made accommodations for balancing authority (BA) changes resulting from the Southwest Power Pool RTO Expansion.
EIA-191
- Added `core_eia191__monthly_gas_storage`, a new table containing monthly underground natural gas storage activity reported by operators to EIA on Form 191. Data covers 2014-present, is updated through April 2026, and includes working gas, base gas, and total capacity by storage field. See issue #5209 and PRs #5058 and #5216. Thanks to @irubey for this contribution!
Documentation
- Added new component to table descriptions showing the most recent data available. See issue #4586 and PR #4632.
- Added new `forensics` tables which can be used to see all input values before PUDL chooses canonical values/golden records in the entity resolution process. See issue #4265 and PR #5157.
Bug Fixes & Data Cleaning
- Removed the already deprecated `pudl.extract.ferc1.extract_dbf`, `pudl.extract.ferc1.extract_xbrl`, `pudl.extract.ferc1.extract_xbrl_generic`, and `pudl.extract.ferc1.extract_dbf_generic` functions. The extraction logic is now covered by the `pudl.dagster.io_managers.ferc1_xbrl_sqlite_io_manager` and `pudl.dagster.io_managers.ferc1_dbf_sqlite_io_manager` IO Managers.
- Fixed a `TypeError` in MCOE asset checks where `sum(exc.null_rows)` iterated over a DataFrame’s column names as strings instead of counting rows. Replaced with `len(exc.null_rows)`. See PR #5124.
- Fixed a data integrity bug in the FERC SQLite IO manager where SQLite silently replaced `NULL` values in single-column `INTEGER PRIMARY KEY` columns (ROWID aliases) with auto-incremented integers rather than raising an `IntegrityError`. An explicit null check now catches this case before writing. The bug affected 11 production entity and association tables (e.g. `core_eia__entity_plants`, `core_pudl__entity_utilities_pudl`); composite PKs and non-INTEGER single PKs are enforced normally by SQLite and were unaffected. See PR #5124.
- Updated FERC XBRL extraction to handle a new upstream behavior in which empty instant or duration tables are omitted from published filings. See PR #5239.
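As a rough illustration of the `INTEGER PRIMARY KEY` behavior behind the data integrity fix above, the following sketch uses only Python's standard `sqlite3` module. The `plants` table and the `insert_rows_checked` helper are hypothetical stand-ins invented for this example; they are not PUDL's actual IO manager code, and the real null check may look quite different.

```python
import sqlite3


def insert_rows_checked(conn, table, pk, rows):
    """Insert dict rows, refusing any whose primary key is None.

    Hypothetical helper: it only illustrates the idea of an explicit
    null check performed before anything is written to SQLite.
    """
    if not rows:
        return
    null_count = sum(1 for row in rows if row[pk] is None)
    if null_count:
        raise ValueError(
            f"{null_count} row(s) have a NULL value in primary key {pk!r}"
        )
    columns = list(rows[0])
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})",
        [tuple(row[c] for c in columns) for row in rows],
    )


conn = sqlite3.connect(":memory:")
# A single-column INTEGER PRIMARY KEY is an alias for SQLite's rowid.
conn.execute("CREATE TABLE plants (plant_id INTEGER PRIMARY KEY, name TEXT)")

# SQLite accepts the NULL key and quietly assigns the next rowid instead.
conn.execute("INSERT INTO plants (plant_id, name) VALUES (NULL, 'mystery plant')")
silent_rows = conn.execute("SELECT plant_id, name FROM plants").fetchall()
print(silent_rows)  # prints [(1, 'mystery plant')]

# The guarded insert rejects the same kind of row before it reaches SQLite.
try:
    insert_rows_checked(
        conn, "plants", "plant_id", [{"plant_id": None, "name": "another plant"}]
    )
    guard_raised = False
except ValueError as err:
    guard_raised = True
    print(err)
```

The first insert succeeds without any error, which is exactly why a missing ID can slip through unnoticed; the check has to happen in application code because the database will not object.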
Quality of Life Improvements
- Reorganized the test suite from `test/` to `tests/` with a three-tier layout that matches the existing Pixi tasks: `unit/` (fast, no data), `integration/` (software correctness against ETL outputs), and `validate/` (data quality on prebuilt outputs). The old `integration/etl_test.py` was dissolved into per-extractor files and a `dagster/pipeline_test.py`. New unit tests were added for MCOE asset checks, `no_null_rows`, `weighted_quantile`, and IO manager null-PK behavior. See PR #5124.
- Separated dbt row count checks into a distinct `pytest-validate-row-counts-nightly` Pixi stage. `*check_row_counts_per_partition` is the most frequently failing dbt test; running it in its own stage produces a clearly labelled line in nightly Slack reports instead of failing the broader data validation stage, making failures easier to triage. The stage is automatically skipped outside of full ETL builds. See PR #5124.
- Renamed the `docker/` directory to `builds/` to better reflect that it contains all production build scripts and infrastructure, not just Docker-related files. See PR #5124.
- Updated `dbt_helper update-tables --schema` to ingest “human schema input files” (at `dbt/schema_inputs/**/schema.human.yml`) and generate the actual dbt-visible schema files automatically. This gives us clear separation between human and machine-generated schemas and allows us to add more machine-generated checks. See issue #5208 and PRs #5207 and #5228.
Major Dagster Project Refactor
We did a major overhaul of our Dagster configuration to bring it closer to the framework’s current best-practice recommendations, and also to experiment with the new dg CLI and Dagster agent skills.
See issue #5066 for an overview of the issues involved, including issues #5120, #5123 and PRs #5071, #5124, #5153. This refactor includes the following changes:
- Replaced the custom `pudl_etl` and `ferc_to_sqlite` CLI entry points with Dagster’s official `dg launch` tool. The old entry points assembled hand-crafted Dagster `run_config` dicts at runtime; `dg launch` reads YAML config files that are version-controlled alongside the code. Four packaged config files are provided: `dg_fast.yml`, `dg_full.yml`, `dg_pytest.yml`, and `dg_nightly.yml`. Pixi convenience tasks (`pudl-with-ferc-to-sqlite`, `pudl-with-ferc-to-sqlite-nightly`, `ferc-to-sqlite`) wrap the most common invocations. The integration test suite now runs the ETL via `dg launch` as a subprocess, so tests exercise exactly the same code path as production.
- Consolidated the PUDL job graph. The previous `etl_fast` and `etl_full` jobs were thin wrappers assembled at import time. These are replaced by three top-level jobs defined directly in `pudl.etl`: `ferc_to_sqlite` (raw FERC prerequisite databases only), `pudl` (the main PUDL ETL assuming those raw FERC databases already exist), and `pudl_with_ferc_to_sqlite` (end-to-end build in a single job). The FERC EQR pipeline is now the `ferceqr` job. Job selection and asset scoping are handled by `dg launch` config files rather than by code.
- Switched to Dagster config YAML files for all run configuration (what years to process, which datasets to include, resource settings). The settings flow is now: `dg launch --config some_dg.yml` → `pudl.resources.PudlEtlSettingsResource` loads a `pudl.settings.EtlSettings` object from a path declared in that YAML → individual assets and IO managers read from the injected `EtlSettings`. This replaces the old pattern of serializing Pydantic models to raw `run_config` dicts, which required keeping Dagster config schemas manually in sync with the Pydantic models.
- Updated Dagster resources and IO managers to use Pydantic-native `dagster.ConfigurableResource` and `dagster.ConfigurableIOManager` base classes. `pudl.workspace.datastore.DatastoreResource` and `pudl.workspace.datastore.ZenodoDoiSettingsResource` replace the legacy `@resource`-decorated functions; `pudl.io_managers.PudlMixedFormatIOManager`, `pudl.io_managers.FercDbfSqliteConfigurableIOManager`, and `pudl.io_managers.FercXbrlSqliteConfigurableIOManager` replace the legacy `@io_manager` wrappers. Resources now receive settings via Pydantic field injection rather than via `dagster.build_init_resource_context()` config dicts.
- Added FERC SQLite provenance tracking via the new `pudl.ferc_sqlite_provenance` module. Each time a FERC SQLite asset materializes, it records a fingerprint as `dagster.MaterializeResult` metadata: the Zenodo DOI of the source archive, the years included, and a hash of the ETL settings. When a downstream PUDL asset subsequently loads from that SQLite file, the IO manager checks the stored fingerprint against the current run’s settings and raises a descriptive error if the DOIs, years, or settings are incompatible. This eliminates a class of silent correctness failures that occurred when stale FERC SQLite databases from a previous run were reused.
- Replaced the `disabled: true` flag in FERC-to-SQLite settings with `years: []` (empty list). An empty `years` list is unambiguous (“process zero years”) and eliminates the need for a separate boolean field that had to be checked in addition to the years list. The `disabled` flag has been removed from all settings classes and YAML config files; FERC 2, 6, and 60 DBF/XBRL configs that previously used `disabled: true` now use `years: []`.
- Reorganized the integration test infrastructure in `tests/conftest.py`. The old approach ran the PUDL ETL in-process using `execute_in_process`, which bypassed the standard `dg launch` entry point and required each test fixture to hand-assemble Dagster `run_config` dicts. All three FERC extraction fixtures and the `pudl_io_manager` fixture are replaced by a single `prebuilt_outputs` fixture that runs the full `pudl_with_ferc_to_sqlite` job via `dg launch` as a subprocess, with coverage collection appended to the existing test coverage report. A persistent `dagster.DagsterInstance` fixture allows test code to read asset materialization metadata written by that subprocess. Pytest CLI flags are renamed for clarity: `--live-dbs` → `--live-pudl-output`, `--tmp-data` → `--temp-pudl-input`, `--etl-settings` → `--dg-config`.
- Made `pudl.dagster` the canonical Dagster orchestration package while keeping `pudl.definitions` as the stable `dg` code location entrypoint. As part of this boundary cleanup, Dagster-specific resources (including the FERC EQR deployment sensor and the FERC EQR partition definition) were consolidated under `pudl.dagster`, older top-level Dagster compatibility exposure was removed, and internal imports and documentation were updated to use `pudl.dagster`. See issue #5123 and PR #5124.
- Cleaned up several legacy package boundaries that had accumulated over time. The `pudl.etl` package was removed after the Dagster refactor had already moved its substantive content elsewhere; what remained was foreign key validation and a continuity check helper, which now live with the validation and asset-check code that actually uses them. The `pudl.convert` subpackage was an arbitrary grouping of two unrelated utilities; each was moved to the package that reflects what it actually does (extraction vs. documentation generation). The `pudl.validate` module grew into a subpackage to keep dbt orchestration, database integrity checks, and data quality utilities from being lumped together in a single file. See #5123 and PR #5124.
- Consolidated all CLI entry points under `src/pudl/scripts/`. Previously, `pudl_datastore` lived inside the datastore module and `pudl_service_territories` lived inside the analysis module: logical homes for the underlying logic, but inconvenient for anyone trying to find all the command-line tools in one place. All scripts are now thin wrappers in `src/pudl/scripts/`, with heavy imports deferred so `--help` is fast (or… will be, once we thin out the monstrous top-level PUDL imports). `pudl_datastore` also gained a new `--all` flag to download every known dataset without having to enumerate them explicitly. A unit test enforces many of these CLI conventions going forward. See #5123 and PR #5124.
- Renamed the `eia_bulk_elec` module to `eiaapi_electricity` to match the naming of the underlying source. See #5123 and PR #5124.
- Standardized acronym capitalization in compound class names. Classes that combined two acronyms (e.g. `FERC` + `SQLite`) were inconsistently named. They now follow the Python convention of treating each acronym as a single title-cased word, so `SQLite` becomes `Sqlite` when it appears mid-name (e.g. `FercDbfSqliteConfigurableIOManager`). See #5123 and PR #5124.
- Renamed Pydantic settings classes from `*Settings` to `*DataConfig` and tightened container field names. The old names were too vague: these classes define which data gets processed, not general application settings. The new names make that explicit and align with Dagster’s own `Config` naming convention. The top-level `EtlSettings` is now `GlobalDataConfig`; `DatasetsSettings` (the PUDL job) is now `PudlDataConfig`; and field names on the container classes drop redundant suffixes (e.g. `ferc_to_sqlite_settings` → `ferc_to_sqlite`, `datasets` → `pudl`). The data config and Dagster config YAML files are updated to match. See PR #5153.
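To make the FERC SQLite provenance idea from the list above more concrete, here is a minimal sketch of recording and comparing a fingerprint. It is not the contents of `pudl.ferc_sqlite_provenance`: the `Fingerprint` class, the helper functions, the placeholder DOI, and the `example_option` setting are all hypothetical, and the sketch assumes that "incompatible" simply means "not identical", which may be stricter than the real check.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Fingerprint:
    """Hypothetical record of what went into a FERC SQLite database."""

    zenodo_doi: str
    years: tuple[int, ...]
    settings_hash: str


def hash_settings(settings: dict) -> str:
    """Hash a settings dict deterministically; key order does not matter."""
    payload = json.dumps(settings, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def make_fingerprint(zenodo_doi: str, years: list[int], settings: dict) -> Fingerprint:
    """Build a fingerprint, normalizing the order of the years."""
    return Fingerprint(zenodo_doi, tuple(sorted(years)), hash_settings(settings))


def check_compatible(stored: Fingerprint, current: Fingerprint) -> None:
    """Raise an error naming every fingerprint field that differs."""
    mismatches = [
        f"{name}: stored={getattr(stored, name)!r}, current={getattr(current, name)!r}"
        for name in ("zenodo_doi", "years", "settings_hash")
        if getattr(stored, name) != getattr(current, name)
    ]
    if mismatches:
        raise RuntimeError("Stale FERC SQLite database: " + "; ".join(mismatches))


# Placeholder DOI and settings, invented for illustration only.
doi = "10.5281/zenodo.0000000"
stored = make_fingerprint(doi, [2021, 2022], {"example_option": True})

# Same inputs in a different order produce the same fingerprint.
same = make_fingerprint(doi, [2022, 2021], {"example_option": True})
check_compatible(stored, same)

# Adding a year changes the fingerprint, so the stale database is rejected.
changed = make_fingerprint(doi, [2021, 2022, 2023], {"example_option": True})
try:
    check_compatible(stored, changed)
    stale_detected = False
except RuntimeError as err:
    stale_detected = True
    print(err)
```

Listing every mismatched field in the error message, rather than stopping at the first one, is a design choice made here so that a single failed run tells the user everything they need to rebuild.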
Other PUDL v2026.5.0 Resources
- PUDL v2026.5.0 Data Dictionary
- PUDL v2026.5.0 Documentation
- PUDL in the AWS Open Data Registry
- PUDL v2026.5.0 in a free, public AWS S3 bucket: s3://pudl.catalyst.coop/v2026.5.0/
- PUDL v2026.5.0 in a requester-pays GCS bucket: gs://pudl.catalyst.coop/v2026.5.0/
- Zenodo archive of the PUDL GitHub repo for this release
- PUDL v2026.5.0 release on GitHub
Contact Us
If you're using PUDL, we would love to hear from you! Even if it's just a note to let us know that you exist, and how you're using the software or data. Here's a bunch of different ways to get in touch:
- Follow us on GitHub
- Use the PUDL GitHub issue tracker to let us know about any bugs or data issues you encounter
- GitHub Discussions is where we provide user support.
- Watch our GitHub Project to see what we're working on.
- Email us at hello@catalyst.coop for private communications.
- On Mastodon: @CatalystCoop@mastodon.energy
- On BlueSky: @catalyst.coop
- Connect with us on LinkedIn
- Play with our data and notebooks on Kaggle
- Combine our data with ML models on HuggingFace
- Learn more about us on our website: https://catalyst.coop
- Subscribe to our announcements list for email updates.
Files (17.9 GB)
| MD5 checksum | Size |
|---|---|
| a2c740bf9b5817fe3d4c7ebf26a08b95 | 11.0 MB |
| 25c34f7299eb80e8347a5adc6a3b0069 | 506.7 MB |
| 1dbc07ba947fe8a10211658555892d61 | 271.3 MB |
| b87a6b63c4139c4864b9d575a6ed331a | 1.0 GB |
| ce7290979d98e329b7932e7124105382 | 190.9 MB |
| ef87ca15214d60e01af82c43d73702b9 | 167.9 MB |
| 15d13436604c32226fbf6793f7251735 | 2.0 MB |
| 02c69518e60b733979671e73d2e12f9f | 7.3 MB |
| e801ae4dc353a7111ed6832348ffe7af | 73.9 MB |
| a68ec6aa5c061a00e38c90b6d8e939c2 | 163.3 MB |
| 871596a059eb41f02b2a13839ee616d0 | 34.9 MB |
| 848b134d78062b6313024fb381bd2d2c | 23.1 MB |
| 9dbd3733cb82801b18353e71d4a432af | 2.3 MB |
| bd09d11e1a79a1623b3d9789bf336c9e | 7.2 MB |
| 4476505ecd7ac3cffa0df34cef4f030b | 2.9 MB |
| 7bbc5135e412c0c8c71c47bddc9edf61 | 57.7 MB |
| 6b7eac4c0b2bfb04533463eab620af61 | 5.3 MB |
| 42a3f9e1d57ba866eed0f9691cf8bd26 | 6.6 MB |
| 4e308f9bfcf08df05ce198b8bbd1a0eb | 974.3 kB |
| ba679325b7ae96c87b6fa5584cfae491 | 2.0 MB |
| 53aa59a546bef76d53210faf6d4b7453 | 43.5 MB |
| 68a43456bf799472a1e5b2ca3066a171 | 81.8 MB |
| e19a59bf73d3c1cf5bde45c13f17c25c | 28.4 MB |
| 4abad6a380b6b55ca00e447173bd000b | 15.1 MB |
| 14359041f74a53cda9ababcb9970bc57 | 1.3 MB |
| 240ab648e9b6fb13f2540b217638676e | 3.0 MB |
| f1047c513992ae6161cb980f03c3a789 | 66.6 MB |
| b758a1f5ecea9668135201b2bae1a6bb | 193.6 MB |
| 2f1bc75a04756ecb3d0ccccead054e30 | 35.5 MB |
| cca9a4b4b016e1c1fa981e5008218b96 | 63.0 kB |
| 694686904ca1871e1b96f2002d2a136d | 192.9 kB |
| fed16c8f6aac11dc144cf888333e3203 | 3.2 GB |
| 3f6b524abc6ff0dbf22df54fa411f1f4 | 11.6 GB |
| cb2ea1335034c9df94360e170cd8a49d | 1.5 MB |