Published May 18, 2026 | Version v2026.5.0
Dataset Open

Public Utility Data Liberation Project (PUDL) Data Release

Description

v2026.5.0 (2026-05-17)

This is a quarterly PUDL data release, updating datasets that are released on a monthly or quarterly basis, including the EIA-860M, year-to-date EIA-923, EIA-930, and EIA-191. It also includes an annual update for the EIA Annual Energy Outlook (AEO).

Normally this release would also update the EPA CEMS hourly emissions dataset. Unfortunately, the bulk CEMS data product that we archive and process was not published as usual. We are exploring other ways of integrating the updated data.

Enhancements

  • Started distributing the raw XBRL-derived data for FERC Forms 1, 2, 6, 60, and 714 as collections of parquet files, alongside existing SQLite and DuckDB outputs. See PR #5232. This change is primarily in support of making these data available through the PUDL Data Viewer.

FERC 1

Expanded Data Coverage

EIA AEO

  • Added 2026 Projections from EIA AEO. See issue #5182 and PR #5198.

EIA-860M

  • Added EIA-860M data through March 2026. See issue #5225 and PR #5230.

EIA-923

  • Added year-to-date updates for EIA-923 data through December 2025. See issue #5226 and PR #5230.

EIA-930

EIA-191

  • Added core_eia191__monthly_gas_storage, a new table containing monthly underground natural gas storage activity reported by operators to EIA on Form 191. Data covers 2014-present, is updated through April 2026, and includes working gas, base gas, and total capacity by storage field. See issue #5209 and PRs #5058 and #5216. Thanks to @irubey for this contribution!

Documentation

  • Added a new component to table descriptions showing the most recent data available. See issue #4586 and PR #4632.

  • Added new forensics tables which can be used to see all input values before PUDL chooses canonical values/golden records in the entity resolution process. See issue #4265 and PR #5157.
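To illustrate what the forensics tables expose, here is a minimal sketch of choosing a canonical value when an entity has been reported inconsistently. The most-frequent-value rule, the pick_canonical helper, and the sample records are all assumptions made up for illustration. This is not PUDL's actual entity resolution logic.

```python
from collections import Counter, defaultdict

# Made-up reports: (plant_id, reported plant name). Not real PUDL data.
reports = [
    (1, "Alpha Station"),
    (1, "Alpha Station"),
    (1, "Alpha Stn"),
    (2, "Beta Plant"),
]


def pick_canonical(records):
    """Return {entity_id: most frequently reported value}.

    Ties go to the value reported first, because Counter.most_common
    keeps first-encountered order among equal counts.
    """
    values_by_entity = defaultdict(list)
    for entity_id, value in records:
        values_by_entity[entity_id].append(value)
    return {
        entity_id: Counter(values).most_common(1)[0][0]
        for entity_id, values in values_by_entity.items()
    }


canonical = pick_canonical(reports)

# A forensics-style view keeps every distinct reported value, including
# the ones that lost out to the canonical choice.
forensics = sorted(set(reports))
```

In this toy case the canonical table would hold one name per plant, while the forensics view still shows that plant 1 was also reported as "Alpha Stn".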

Bug Fixes & Data Cleaning

  • Removed the already deprecated pudl.extract.ferc1.extract_dbf, pudl.extract.ferc1.extract_xbrl, pudl.extract.ferc1.extract_xbrl_generic, and pudl.extract.ferc1.extract_dbf_generic functions. The extraction logic is now covered by the pudl.dagster.io_managers.ferc1_xbrl_sqlite_io_manager and pudl.dagster.io_managers.ferc1_dbf_sqlite_io_manager IO Managers.

  • Fixed a TypeError in MCOE asset checks where sum(exc.null_rows) iterated over a DataFrame’s column names as strings instead of counting rows. Replaced with len(exc.null_rows). See PR #5124.

  • Fixed a data integrity bug in the FERC SQLite IO manager where SQLite silently auto-incremented NULL values in single-column INTEGER PRIMARY KEY columns (ROWID aliases) rather than raising an IntegrityError. An explicit null check now catches this case before writing. The bug affected 11 production entity and association tables (e.g. core_eia__entity_plants, core_pudl__entity_utilities_pudl); composite PKs and non-INTEGER single PKs are enforced normally by SQLite and were unaffected. See PR #5124.

  • Updated FERC XBRL extraction to handle a new upstream behavior in which empty instant or duration tables are omitted from published filings. See PR #5239.
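The SQLite behavior behind the NULL primary key fix above can be reproduced with the standard library alone. The sketch below uses a hypothetical plants table and a hypothetical insert_checked helper. It only illustrates the idea of an explicit null check and is not the code in the PUDL IO manager.

```python
import sqlite3


def insert_checked(conn, rows):
    """Insert (plant_id, name) rows, refusing NULL primary keys up front."""
    if any(plant_id is None for plant_id, _ in rows):
        raise ValueError("NULL primary key in rows destined for 'plants'")
    conn.executemany("INSERT INTO plants (plant_id, name) VALUES (?, ?)", rows)


conn = sqlite3.connect(":memory:")
# A single-column INTEGER PRIMARY KEY is an alias for SQLite's ROWID.
conn.execute("CREATE TABLE plants (plant_id INTEGER PRIMARY KEY, name TEXT)")

# Unchecked insert: SQLite accepts the NULL and assigns a ROWID instead of
# raising sqlite3.IntegrityError.
conn.execute("INSERT INTO plants (plant_id, name) VALUES (?, ?)", (None, "unknown"))
assigned_ids = [row[0] for row in conn.execute("SELECT plant_id FROM plants")]

# Checked insert: the same kind of bad row is rejected before reaching SQLite.
try:
    insert_checked(conn, [(None, "also unknown")])
    rejected = False
except ValueError:
    rejected = True
```

After the unchecked insert the row carries an automatically assigned ID rather than a NULL, which is why the problem went unnoticed until an explicit check was added.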

Quality of Life Improvements

  • Reorganized the test suite from test/ to tests/ with a three-tier layout that matches the existing Pixi tasks: unit/ (fast, no data), integration/ (software correctness against ETL outputs), and validate/ (data quality on prebuilt outputs). The old integration/etl_test.py was dissolved into per-extractor files and a dagster/pipeline_test.py. New unit tests were added for MCOE asset checks, no_null_rows, weighted_quantile, and IO manager null-PK behavior. See PR #5124.

  • Separated dbt row count checks into a distinct pytest-validate-row-counts-nightly Pixi stage. check_row_counts_per_partition is the most frequently failing dbt test; running it in its own stage produces a clearly labelled line in nightly Slack reports instead of failing the broader data validation stage, making failures easier to triage. The stage is automatically skipped outside of full ETL builds. See PR #5124.

  • Renamed the docker/ directory to builds/ to better reflect that it contains all production build scripts and infrastructure, not just Docker-related files. See PR #5124.

  • Updated dbt_helper update-tables --schema to ingest “human schema input files” (at dbt/schema_inputs/**/schema.human.yml) and generate the actual dbt-visible schema files automatically. This gives us clear separation between human and machine-generated schemas and allows us to add more machine-generated checks. See issue #5208 and PRs #5207 and #5228.

Major Dagster Project Refactor

We did a major overhaul of our Dagster configuration to bring it closer to the framework’s current best-practice recommendations, and also to experiment with the new dg CLI and Dagster agent skills.

See issue #5066 for an overview of the work involved, including issues #5120 and #5123 and PRs #5071, #5124, and #5153. This refactor includes the following changes:

  • Replaced the custom pudl_etl and ferc_to_sqlite CLI entry points with Dagster’s official dg launch tool. The old entry points assembled hand-crafted Dagster run_config dicts at runtime; dg launch reads YAML config files that are version-controlled alongside the code. Four packaged config files are provided: dg_fast.yml, dg_full.yml, dg_pytest.yml, and dg_nightly.yml. Pixi convenience tasks (pudl-with-ferc-to-sqlite, pudl-with-ferc-to-sqlite-nightly, ferc-to-sqlite) wrap the most common invocations. The integration test suite now runs the ETL via dg launch as a subprocess, so tests exercise exactly the same code path as production.

  • Consolidated the PUDL job graph. The previous etl_fast and etl_full jobs were thin wrappers assembled at import time. These are replaced by three top-level jobs defined directly in pudl.etl: ferc_to_sqlite (raw FERC prerequisite databases only), pudl (the main PUDL ETL assuming those raw FERC databases already exist), and pudl_with_ferc_to_sqlite (end-to-end build in a single job). The FERC EQR pipeline is now the ferceqr job. Job selection and asset scoping are handled by dg launch config files rather than by code.

  • Switched to Dagster config YAML files for all run configuration (what years to process, which datasets to include, resource settings). The settings flow is now: dg launch --config some_dg.yml → pudl.resources.PudlEtlSettingsResource loads a pudl.settings.EtlSettings object from a path declared in that YAML → individual assets and IO managers read from the injected EtlSettings. This replaces the old pattern of serializing Pydantic models to raw run_config dicts, which required keeping Dagster config schemas manually in sync with the Pydantic models.

  • Updated Dagster resources and IO managers to use Pydantic-native dagster.ConfigurableResource and dagster.ConfigurableIOManager base classes. pudl.workspace.datastore.DatastoreResource and pudl.workspace.datastore.ZenodoDoiSettingsResource replace the legacy @resource-decorated functions; pudl.io_managers.PudlMixedFormatIOManager, pudl.io_managers.FercDbfSqliteConfigurableIOManager, and pudl.io_managers.FercXbrlSqliteConfigurableIOManager replace the legacy @io_manager wrappers. Resources now receive settings via Pydantic field injection rather than via dagster.build_init_resource_context() config dicts.

  • Added FERC SQLite provenance tracking via the new pudl.ferc_sqlite_provenance module. Each time a FERC SQLite asset materializes, it records a fingerprint as dagster.MaterializeResult metadata: the Zenodo DOI of the source archive, the years included, and a hash of the ETL settings. When a downstream PUDL asset subsequently loads from that SQLite file, the IO manager checks the stored fingerprint against the current run’s settings and raises a descriptive error if the DOIs, years, or settings are incompatible. This eliminates a class of silent correctness failures that occurred when stale FERC SQLite databases from a previous run were silently reused.

  • Replaced the disabled: true flag in FERC-to-SQLite settings with years: [] (empty list). An empty years list is unambiguous (“process zero years”) and eliminates the need for a separate boolean field that had to be checked in addition to the years list. The disabled flag has been removed from all settings classes and YAML config files; FERC 2, 6, and 60 DBF/XBRL configs that previously used disabled: true now use years: [].

  • Reorganized the integration test infrastructure in tests/conftest.py. The old approach ran the PUDL ETL in-process using execute_in_process, which bypassed the standard dg launch entry point and required each test fixture to hand-assemble Dagster run_config dicts. All three FERC extraction fixtures and the pudl_io_manager fixture are replaced by a single prebuilt_outputs fixture that runs the full pudl_with_ferc_to_sqlite job via dg launch as a subprocess, with coverage collection appended to the existing test coverage report. A persistent dagster.DagsterInstance fixture allows test code to read asset materialisation metadata written by that subprocess. Pytest CLI flags are renamed for clarity: --live-dbs → --live-pudl-output, --tmp-data → --temp-pudl-input, --etl-settings → --dg-config.

  • Made pudl.dagster the canonical Dagster orchestration package while keeping pudl.definitions as the stable dg code location entrypoint. As part of this boundary cleanup, Dagster-specific resources (including the FERC EQR deployment sensor and the FERC EQR partition definition) were consolidated under pudl.dagster, older top-level Dagster compatibility exposure was removed, and internal imports and documentation were updated to use pudl.dagster. See issue #5123 and PR #5124.

  • Cleaned up several legacy package boundaries that had accumulated over time. The pudl.etl package was removed after the Dagster refactor had already moved its substantive content elsewhere — what remained was foreign key validation and a continuity check helper that now live with the validation and asset-check code that actually uses them. The pudl.convert subpackage was an arbitrary grouping of two unrelated utilities; each was moved to the package that reflects what it actually does (extraction vs. documentation generation). The pudl.validate module grew into a subpackage to keep dbt orchestration, database integrity checks, and data quality utilities from being lumped together in a single file. See #5123 and PR #5124.

  • Consolidated all CLI entry points under src/pudl/scripts/. Previously, pudl_datastore lived inside the datastore module and pudl_service_territories lived inside the analysis module — logical homes for the underlying logic, but inconvenient for anyone trying to find all the command-line tools in one place. All scripts are now thin wrappers in src/pudl/scripts/, with heavy imports deferred so --help is fast (or… will be, once we thin out the monstrous top-level PUDL imports). pudl_datastore also gained a new --all flag to download every known dataset without having to enumerate them explicitly. A unit test enforces many of these CLI conventions going forward. See #5123 and PR #5124.

  • Renamed the eia_bulk_elec module to eiaapi_electricity to match the naming of the underlying source. See #5123 and PR #5124.

  • Standardized acronym capitalization in compound class names. Classes that combined two acronyms (e.g. FERC + SQLite) were inconsistently named. They now follow the Python convention of treating each acronym as a single title-cased word, so SQLite becomes Sqlite when it appears mid-name (e.g. FercDbfSqliteConfigurableIOManager). See #5123 and PR #5124.

  • Renamed Pydantic settings classes from *Settings to *DataConfig and tightened container field names. The old names were too vague: these classes define which data gets processed, not general application settings. The new names make that explicit and align with Dagster’s own Config naming convention. The top-level EtlSettings is now GlobalDataConfig; DatasetsSettings (the PUDL job) is now PudlDataConfig; and field names on the container classes drop redundant suffixes (e.g. ferc_to_sqlite_settings → ferc_to_sqlite, datasets → pudl). The data config and Dagster config YAML files are updated to match. See PR #5153.
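The FERC SQLite provenance check described in the list above can be pictured with a small standard-library sketch. Everything here is hypothetical: the Fingerprint class, the make_fingerprint and check_compatible helpers, the placeholder DOI, and the choice of strict equality as the compatibility rule are illustrative assumptions, not the contents of pudl.ferc_sqlite_provenance.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Fingerprint:
    """What a FERC SQLite build records about itself (illustrative only)."""

    zenodo_doi: str
    years: tuple
    settings_hash: str


def make_fingerprint(zenodo_doi, years, settings):
    """Build a fingerprint from the DOI, the years, and a hash of the settings."""
    # Sorting keys makes the hash independent of dict ordering.
    payload = json.dumps(settings, sort_keys=True).encode("utf-8")
    return Fingerprint(
        zenodo_doi=zenodo_doi,
        years=tuple(sorted(years)),
        settings_hash=hashlib.sha256(payload).hexdigest(),
    )


def check_compatible(stored, current):
    """Raise a descriptive error if any fingerprint field differs.

    Strict equality is an assumption; the real check may be more lenient.
    """
    mismatched = [
        name
        for name in ("zenodo_doi", "years", "settings_hash")
        if getattr(stored, name) != getattr(current, name)
    ]
    if mismatched:
        raise RuntimeError(
            "Stale FERC SQLite database; mismatched fields: " + ", ".join(mismatched)
        )


# Placeholder DOI and settings, not taken from any real archive.
stored = make_fingerprint("10.5281/zenodo.0000000", [2021, 2022], {"form": "ferc1"})
same = make_fingerprint("10.5281/zenodo.0000000", [2022, 2021], {"form": "ferc1"})
check_compatible(stored, same)  # passes: only the order of years differs
```

Comparing a stored fingerprint against the current run in this way turns a silently reused stale database into an immediate, explainable failure.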

Other PUDL v2026.5.0 Resources

Contact Us

If you're using PUDL, we would love to hear from you, even if it's just a note to let us know that you exist and how you're using the software or data. Here's a bunch of different ways to get in touch:

Files (17.9 GB)
