Public Utility Data Liberation Project (PUDL) Data Release
Creators
- Catalyst Cooperative
Description
PUDL v2024.5.0 Data Release
We've just completed our quarterly integration of EIA data sources for 2024Q2 (in support of RMI's Utility Transition Hub) and have also added a bunch of new tables over the last few months in an effort to better support energy system modelers (with support from GridLab).
New Data Coverage
EIA-860 & EIA-923
- Added cleaned EIA860 Schedule 8E FGD Equipment and EIA923 Schedule 8C FGD Operation and Maintenance data to the PUDL database as _core_eia923__fgd_operation_maintenance and _core_eia860__fgd_equipment. Once harvested, these tables will eventually be removed from the database, but they are being published until then. See issues #3394 and #3392, and PR #3403.
- Added new core_eia860__scd_generators_wind table from EIA860 Schedule 3.2 which contains wind generator attributes. See PRs #3522 and #3494.
- Added new core_eia860__scd_generators_solar table from EIA860 Schedule 3.3 which contains solar generator attributes. See PRs #3524 and #3482.
- Added new core_eia860__scd_generators_energy_storage table from EIA860 Schedule 3.4 which contains energy storage generator attributes. See PRs #3488 and #3526.
- Added new core_eia923__monthly_energy_storage table from EIA923 which contains monthly energy and fuel consumption metrics. See PRs #3516 and #3546.
- Added 2024 Q1 EIA923 and EIA860m data. See issues #3617 and #3618, and PR #3625.
GridPath RA Toolkit
- Added a new gridpathratoolkit data source containing hourly wind and solar generation profiles from the GridPath Resource Adequacy Toolkit. See our documentation and the new Zenodo archive, PR #3489 and this PUDL archiver issue.
- Integrated the most processed version of the GridPath RA Toolkit wind and solar generation profiles, as well as the tables describing how individual generators were aggregated together to create the profiles. See issues #3509, #3510, #3511, and #3515 as well as PR #3514. The new tables include: out_gridpathratoolkit__hourly_available_capacity_factor and core_gridpathratoolkit__assn_generator_aggregation_group.
EIA AEO
- Extracted tables 13, 15, 20, and 54 from the EIA Annual Energy Outlook 2023, which include future projections related to electric power and renewable energy through the year 2050, across a variety of scenarios. See issue #3368 and PR #3538.
- Added new tables from EIA AEO table 54:
- core_eiaaeo__yearly_projected_generation_in_electric_sector_by_technology contains generation capacity & generation projections for the electric sector, broken out by technology type. See issue #3581 and PR #3582.
- core_eiaaeo__yearly_projected_generation_in_end_use_sectors_by_fuel_type contains generation capacity & generation projections for the end-use sectors, broken out by fuel type. See issue #3581 and PR #3598.
- core_eiaaeo__yearly_projected_electric_sales contains electric sales projections until 2050, broken out by customer type. See issue #3581 and PR #3617.
NREL ATB
- Added new NREL ATB tables with annual technology cost and performance projections. See issue #3465 and PRs #3498, #3570.
EIA-930
- Added hourly generation, demand, and interchange tables from the EIA-930. See issues #3486 and #3505, PR #3584, and this issue in the PUDL archiver repo. See the data source documentation for more information.
EPA CEMS
EIA Bulk Electricity Data
- Updated the EIA Bulk Electricity data archive to include data that was available as of 2024-05-01, which covers up through 2024-02-01 (3 months more than the previously used archive). See PR #3615.
FERC Form 1
- Added new out_ferc1__yearly_rate_base table which includes granular financial data regarding what utilities include in their rate bases. See epic #2016.
Data Cleaning
- When generator_operating_date values are too inconsistent to be harvested successfully, we now take the max date within a year and attempt to harvest again, to rescue records lost because of inconsistent month reporting in EIA 860 and 860M. See issue #3340 and PR #3419. This change also fixed a bug that was preventing other columns harvested with a special process from being saved.
- When ingesting FERC 1 XBRL filings, we now take the most recent non-null value instead of the value from the latest filing that applies for a specific row. This means that we no longer lose data if a utility posts a FERC filing with only a small number of updated values. See issue #3309 and PR #3545.
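The two cleaning rules above can be illustrated with a minimal sketch. It is not the PUDL implementation: the function names are hypothetical, and it assumes for simplicity that a harvest succeeds only when every reported value agrees, which may differ from the consistency rule PUDL actually applies.

```python
from __future__ import annotations

from collections import defaultdict
from datetime import date


def reconcile_operating_date(reported: list[date]) -> date | None:
    """Return one operating date if the reports can be reconciled, else None.

    ASSUMPTION: "harvestable" here means all remaining values are identical.
    """
    if not reported:
        return None
    # First attempt: every report already agrees.
    if len(set(reported)) == 1:
        return reported[0]
    # Rescue step: keep only the max date within each calendar year, which
    # removes disagreements that are only about the reported month.
    latest_by_year: dict[int, date] = defaultdict(lambda: date.min)
    for d in reported:
        latest_by_year[d.year] = max(latest_by_year[d.year], d)
    candidates = set(latest_by_year.values())
    # Second attempt: succeed only if a single date remains.
    return candidates.pop() if len(candidates) == 1 else None


def latest_non_null(filings: list[tuple[date, float | None]]) -> float | None:
    """Return the value from the most recent filing that reported one."""
    for _, value in sorted(filings, key=lambda f: f[0], reverse=True):
        if value is not None:
            return value
    return None
```

In this sketch, reports of March and June 2010 for the same generator reconcile to the June date, while reports that fall in different years stay unreconciled. A later filing that leaves a value blank does not overwrite the value from an earlier filing.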
EIA - FERC1 Record Linkage Model Update
We merged in a refactor of the EIA plant parts to FERC1 plants record linkage model, which was generously supported by a CCAI Innovation Grant. This replaced the linear regression model with a model built with the Python package Splink. Splink provides helpful visualizations for understanding model performance and parameter tuning, which can be generated with devtools/splink-ferc1-eia-match.ipynb. We measured model performance with precision (a measure of accuracy when the model makes a prediction), recall (a measure of coverage, i.e. the share of FERC records the model predicted a match for), and accuracy (a measure of the overall correctness of the predictions). Model performance improved, and the model now has a precision of 0.94, a recall of 0.9, and an overall accuracy of 0.85.
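One way to read the three metrics described above is sketched below. This is an interpretation of the prose, not the evaluation code used in PUDL: the function name and arguments are hypothetical, and recall is treated here as coverage rather than in its more common machine-learning sense. Under this reading accuracy equals precision times recall, and 0.94 × 0.9 ≈ 0.85 is consistent with the reported figures.

```python
from __future__ import annotations


def linkage_metrics(n_records: int, n_predicted: int, n_correct: int) -> dict[str, float]:
    """Compute precision, recall (coverage) and accuracy for a record linkage run.

    n_records:   FERC records with a known correct EIA match
    n_predicted: records for which the model predicted any match
    n_correct:   records for which the predicted match was the correct one
    """
    if n_records <= 0 or n_predicted <= 0:
        raise ValueError("n_records and n_predicted must be positive")
    if not (n_correct <= n_predicted <= n_records):
        raise ValueError("expected n_correct <= n_predicted <= n_records")
    return {
        # How often the model is right when it does make a prediction.
        "precision": n_correct / n_predicted,
        # ASSUMPTION: recall as coverage, the share of records with a prediction.
        "recall": n_predicted / n_records,
        # Share of all records that end up correctly matched.
        "accuracy": n_correct / n_records,
    }


metrics = linkage_metrics(n_records=100, n_predicted=90, n_correct=85)
```

With these illustrative counts (not taken from the actual model evaluation), the function gives a precision of about 0.94, a recall of 0.9 and an accuracy of 0.85.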
Schema Changes
- Added balancing_authority_code_eia and sector_id_eia into the core_eia860m__changelog_generators table. The BA codes reported in the raw data contained a lot of non-standard values, which have now been standardized. See issue #3437 and PR #3442.
- Renamed the utc_datetime column found in the FERC-714 tables to datetime_utc in order to be consistent with operating_datetime_utc in the EPA CEMS data, and the new hourly renewable generation profiles in the GridPath RA Toolkit. See PR #3514.
- Renamed the utility and balancing authority service territory tables to better conform to our naming conventions: out_eia861__compiled_geometry_utilities is now out_eia861__yearly_utility_service_territory and out_eia861__compiled_geometry_balancing_authorities is now out_eia861__yearly_balancing_authority_service_territory. See PR #3552.
- All hourly tables are now published only as Apache Parquet files, rather than being written to the main PUDL SQLite database. This reduces the size of the PUDL DB, and also makes accessing these large tables much faster both during data processing and for end users. See PR #3584. Affected tables include:
- core_eia930__hourly_interchange
- core_eia930__hourly_net_generation_by_energy_source
- core_eia930__hourly_operations
- core_eia930__hourly_subregion_demand
- core_epacems__hourly_emissions
- out_ferc714__hourly_estimated_state_demand
- out_ferc714__hourly_planning_area_demand
- out_gridpathratoolkit__hourly_available_capacity_factor
- The FERC-714 hourly demand tables have been removed from the pudl.output.pudltabl.PudlTabl class, which has been deprecated.
- The long derelict core_ferc__codes_accounts table has been removed from the PUDL database. This table contained descriptions of the FERC accounts that were found in the Electric Plant in Service table, but only pertained to a single year, and was not being referenced or maintained elsewhere. See PR #3584.
- Additional columns were added to the core_eia__codes_balancing_authorities table, indicating the timezone associated with each BA's reporting, whether it is a generation-only BA, its date of retirement, and what region it is part of. See PR #3584.
- A new core_eia__codes_balancing_authority_subregions table was added to describe the relationships between BAs and their subregions. See PR #3584.
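Because the hourly tables listed above are no longer in the SQLite database, downstream code has to point at Parquet files instead. The sketch below builds a location for one of those tables under the public S3 prefix given in the resource list of these notes. The helper name is hypothetical, and the assumption that each table is stored as a single `<table_name>.parquet` object directly under that prefix should be checked against the bucket listing.

```python
from __future__ import annotations

RELEASE = "v2024.5.0"
# Public S3 prefix as given in the resource list of these release notes.
S3_PREFIX = f"s3://pudl.catalyst.coop/{RELEASE}/"

# Hourly tables that are published only as Parquet in this release.
HOURLY_TABLES = frozenset(
    {
        "core_eia930__hourly_interchange",
        "core_eia930__hourly_net_generation_by_energy_source",
        "core_eia930__hourly_operations",
        "core_eia930__hourly_subregion_demand",
        "core_epacems__hourly_emissions",
        "out_ferc714__hourly_estimated_state_demand",
        "out_ferc714__hourly_planning_area_demand",
        "out_gridpathratoolkit__hourly_available_capacity_factor",
    }
)


def hourly_table_uri(table_name: str, prefix: str = S3_PREFIX) -> str:
    """Return the assumed Parquet location for a Parquet-only hourly table."""
    if table_name not in HOURLY_TABLES:
        raise ValueError(f"{table_name!r} is not one of the Parquet-only hourly tables")
    # ASSUMPTION: one "<table_name>.parquet" object sits directly under the prefix.
    return f"{prefix}{table_name}.parquet"


uri = hourly_table_uri("out_ferc714__hourly_planning_area_demand")
```

If pandas and an S3-capable filesystem package such as s3fs are installed, the resulting URI could then be passed to `pandas.read_parquet(uri, storage_options={"anon": True})`. Passing a `columns=` list as well keeps memory use down for the larger tables.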
Bug Fixes
Ensure that all columns fed into the harvesting / reconciliation process are encoded before harvesting takes place, improving the consistency of harvested fields. See issue #3542 and PR #3558. This change also simplifies the encoding process in the vast majority of cases, since the same global set of encoders can be used on any dataframe, with every column encoded based on the field definitions and FK constraints associated with the column name.
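The benefit of encoding before harvesting can be shown with a small sketch. Everything in it is illustrative: the encoder contents, the function names and the all-values-agree harvesting rule are assumptions, not PUDL's actual field definitions or reconciliation logic.

```python
from __future__ import annotations

# Hypothetical global encoders, keyed by column name. In PUDL these would be
# derived from field definitions and FK constraints; the values here are made up.
ENCODERS: dict[str, dict[str, str]] = {
    "balancing_authority_code_eia": {"pjm": "PJM", "P.J.M.": "PJM"},
    "operational_status_code": {"op": "OP", "Op": "OP"},
}


def encode_record(record: dict[str, object]) -> dict[str, object]:
    """Apply the global encoders to any record, based only on column names."""
    encoded: dict[str, object] = {}
    for column, value in record.items():
        fixes = ENCODERS.get(column)
        if fixes is not None and isinstance(value, str):
            # Map known non-standard codes to their standard form.
            value = fixes.get(value, value)
        encoded[column] = value
    return encoded


def harvest(records: list[dict[str, object]], column: str) -> object | None:
    """Return the single consistent value for a column, or None if values conflict.

    ASSUMPTION: a simplified rule that requires all non-null values to agree.
    """
    values = {r[column] for r in records if r.get(column) is not None}
    return values.pop() if len(values) == 1 else None


raw = [
    {"plant_id_eia": 1, "balancing_authority_code_eia": "pjm"},
    {"plant_id_eia": 1, "balancing_authority_code_eia": "PJM"},
]
before = harvest(raw, "balancing_authority_code_eia")
after = harvest([encode_record(r) for r in raw], "balancing_authority_code_eia")
```

Here the raw records disagree only in spelling, so harvesting the unencoded values fails while harvesting the encoded ones succeeds. Because the encoders are looked up by column name, the same `encode_record` call works on records from any table.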
CLI Changes
Removed the --clobber option from the ferc_to_sqlite command and associated assets. We rebuild these databases infrequently, and having to either edit the runtime parameters in Dagster's Launchpad or manually remove the existing databases from the filesystem is brittle. Partly in response to issue #3612; see PR #3622.
Other PUDL v2024.5.0 Resources
- PUDL v2024.5.0 Data Dictionary
- PUDL v2024.5.0 Documentation
- PUDL in the AWS Open Data Registry
- PUDL v2024.5.0 in a free, public AWS S3 bucket: s3://pudl.catalyst.coop/v2024.5.0/
- PUDL v2024.5.0 in a requester-pays GCS bucket: gs://pudl.catalyst.coop/v2024.5.0/
- Zenodo archive of the PUDL GitHub repo for this release
- PUDL v2024.5.0 release on GitHub
- PUDL v2024.5.0 package in the Python Package Index (PyPI)
Contact Us
If you're using PUDL, we would love to hear from you! Even if it's just a note to let us know that you exist, and how you're using the software or data. Here's a bunch of different ways to get in touch:
- Follow us on GitHub
- Use the PUDL GitHub issue tracker to let us know about any bugs or data issues you encounter
- GitHub Discussions is where we provide user support.
- Watch our GitHub Project to see what we're working on.
- Email us at hello@catalyst.coop for private communications.
- On Mastodon: @CatalystCoop@mastodon.energy
- On BlueSky: @catalyst.coop
- On Twitter: @CatalystCoop
- Play with our data and notebooks on Kaggle
- Combine our data with ML models on HuggingFace
- Learn more about us on our website: https://catalyst.coop
- Subscribe to our announcements list for email updates.
Files
(9.4 GB)
| MD5 checksum | Size |
|---|---|
| 560862b6eda63dd9c99034ec4995cf14 | 6.4 MB |
| a4bea8119e67502dfdf503f953fc8179 | 506.7 MB |
| fa4f3586790ed4438b7e3acc61df8f78 | 67.1 MB |
| c63b3e9c43574e2497b027c65f397f73 | 64.9 MB |
| cc6cf41a9e8a93d425d0ea3712b61539 | 79.4 MB |
| 7f4d5ff59151fc95fd61fc6cfc54025f | 9.8 MB |
| bf5f4213416070970d1df4e497b5e622 | 5.4 GB |
| ea6dea30c134d4ee702a7350534cc5aa | 275.5 MB |
| a607d95b3e90a7adbbe0f60f08dc82da | 97.2 MB |
| e96db21413b81ea068ec44ac6c42b6fb | 1.7 MB |
| 026ad62c418e5e8aab4a85e6a68d628a | 7.3 MB |
| a59aedfeb1d2e0498786680fc3e61bba | 74.5 MB |
| 1536e1eec1ebf2a1c28de6188c23da38 | 13.8 MB |
| efcd21f96a10fb21e4c62c671f6371f0 | 2.0 MB |
| fbbd750509118029d7d675e573f6ad5a | 7.1 MB |
| f4512fd566a296a1905872ce29752492 | 2.9 MB |
| 04569ee928ec487858b448dd1dd7d652 | 2.3 MB |
| a3cdc95139ea96e46b4b2a231ef785a7 | 748.9 kB |
| 0cc4fca785314082b8f514bae888600e | 1.9 MB |
| 0cc8f68819a45378e4c9c0ecf884de78 | 43.9 MB |
| 571b1e3649011d4acf3f1475a4d09966 | 10.6 MB |
| 74fa9bde52a973826ecef44c65249e7a | 1.1 MB |
| 2c85dae41667448ea32066726f76ae5e | 2.9 MB |
| 73f2028f0d495d13e287287ee49473a7 | 102.2 MB |
| 1243f03c0121d5a7993696ef9afca655 | 59.8 kB |
| 9fef935f9a970839319ada082d6c9672 | 192.4 kB |
| c6b3276d15f1a5fa75c58565dda138d2 | 105.1 MB |
| 31c234dda989ccdfb474af27fe6034e8 | 61.1 MB |
| 51c4824c58b0273cc2cb91e31a605fcc | 55.4 MB |
| 7ee6dce8806629d7a3ab3c4e992373a1 | 2.4 GB |