Dataset Open Access

# PUDL Data Release v1.1.0

Selvans, Zane A.; Gosnell, Christina M.

### DataCite XML Export

<?xml version='1.0' encoding='utf-8'?>
<resource xmlns="http://datacite.org/schema/kernel-4">
<identifier identifierType="DOI">10.5281/zenodo.3672068</identifier>
<creators>
<creator>
<creatorName>Selvans, Zane A.</creatorName>
<givenName>Zane A.</givenName>
<familyName>Selvans</familyName>
<nameIdentifier nameIdentifierScheme="ORCID" schemeURI="http://orcid.org/">0000-0002-9961-7208</nameIdentifier>
<affiliation>Catalyst Cooperative</affiliation>
</creator>
<creator>
<creatorName>Gosnell, Christina M.</creatorName>
<givenName>Christina M.</givenName>
<familyName>Gosnell</familyName>
<affiliation>Catalyst Cooperative</affiliation>
</creator>
</creators>
<titles>
<title>PUDL Data Release v1.1.0</title>
</titles>
<publisher>Zenodo</publisher>
<publicationYear>2020</publicationYear>
<subjects>
<subject>electricity</subject>
<subject>EIA 860</subject>
<subject>EIA 923</subject>
<subject>FERC Form 1</subject>
<subject>EPA CEMS</subject>
<subject>energy</subject>
<subject>utility</subject>
<subject>Environmental Protection Agency</subject>
<subject>Federal Energy Regulatory Commission</subject>
<subject>FERC</subject>
<subject>EPA</subject>
<subject>EIA</subject>
</subjects>
<dates>
<date dateType="Issued">2020-02-18</date>
</dates>
<language>en</language>
<resourceType resourceTypeGeneral="Dataset"/>
<alternateIdentifiers>
<alternateIdentifier alternateIdentifierType="url">https://zenodo.org/record/3672068</alternateIdentifier>
</alternateIdentifiers>
<relatedIdentifiers>
<relatedIdentifier relatedIdentifierType="DOI" relationType="IsCompiledBy" resourceTypeGeneral="Software">10.5281/zenodo.3671600</relatedIdentifier>
<relatedIdentifier relatedIdentifierType="URL" relationType="IsCompiledBy" resourceTypeGeneral="Software">https://github.com/catalyst-cooperative/pudl/tree/v0.3.2</relatedIdentifier>
<relatedIdentifier relatedIdentifierType="DOI" relationType="IsSupplementedBy" resourceTypeGeneral="Dataset">10.5281/zenodo.3677548</relatedIdentifier>
<relatedIdentifier relatedIdentifierType="DOI" relationType="IsVersionOf">10.5281/zenodo.3653158</relatedIdentifier>
<relatedIdentifier relatedIdentifierType="URL" relationType="IsPartOf">https://zenodo.org/communities/catalyst-cooperative</relatedIdentifier>
</relatedIdentifiers>
<version>1.1.0</version>
<rightsList>
<rights rightsURI="info:eu-repo/semantics/openAccess">Open Access</rights>
</rightsList>
<descriptions>
<description descriptionType="Abstract">&lt;p&gt;&lt;strong&gt;PUDL Data Release 1.1.0&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the second data release from the &lt;a href="https://catalyst.coop/pudl"&gt;Public Utility Data Liberation (PUDL) project&lt;/a&gt;. It can be referenced &amp;amp; cited using &lt;a href="https://doi.org/10.5281/zenodo.3672068"&gt;https://doi.org/10.5281/zenodo.3672068&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;For more information about the free and open source software used to generate this data release, see &lt;a href="https://github.com/catalyst-cooperative/pudl"&gt;Catalyst Cooperative&amp;#39;s PUDL repository on Github&lt;/a&gt;, and the associated &lt;a href="https://catalystcoop-pudl.readthedocs.io/en/v0.3.2/"&gt;documentation on Read The Docs&lt;/a&gt;. This data release was generated using v0.3.2 of the &lt;code&gt;catalystcoop.pudl&lt;/code&gt; python package.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Included Data Packages&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This release consists of three tabular data packages, conforming to the standards published by &lt;a href="https://frictionlessdata.io"&gt;Frictionless Data&lt;/a&gt; and the &lt;a href="https://okfn.org"&gt;Open Knowledge Foundation&lt;/a&gt;. The data are stored in CSV files (some of which are compressed using gzip), and the associated metadata is stored as JSON. These tabular data can be used to populate a relational database.&lt;/p&gt;
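
&lt;p&gt;As a minimal sketch of how the JSON metadata can be inspected without any PUDL tooling, the Python snippet below lists the tabular resources declared in a &lt;code&gt;datapackage.json&lt;/code&gt; file using only the standard library. It assumes the descriptor follows the Frictionless Data Package layout, with a top-level &lt;code&gt;resources&lt;/code&gt; array whose entries carry &lt;code&gt;name&lt;/code&gt; and &lt;code&gt;path&lt;/code&gt; keys. The function name &lt;code&gt;list_resources&lt;/code&gt; is hypothetical and not part of PUDL.&lt;/p&gt;

```python
import json
from pathlib import Path


def list_resources(datapackage_json):
    """Return (name, path) pairs for each resource in a data package descriptor.

    Assumes the Frictionless Data Package layout: a top-level "resources"
    list whose items have "name" and "path" keys. Paths are returned
    exactly as written in the JSON (relative to the descriptor).
    """
    descriptor = json.loads(Path(datapackage_json).read_text(encoding="utf-8"))
    pairs = []
    for resource in descriptor.get("resources", []):
        path = resource.get("path")
        # The spec allows "path" to be a single string or a list of strings.
        if isinstance(path, list):
            path = ", ".join(path)
        pairs.append((resource.get("name"), path))
    return pairs
```

&lt;p&gt;Once a data package has been extracted, calling &lt;code&gt;list_resources("datapkg/pudl-data-release/pudl-ferc1/datapackage.json")&lt;/code&gt; should show which CSV file backs each table.&lt;/p&gt;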

&lt;p&gt;&lt;strong&gt;&lt;code&gt;pudl-eia860-eia923&lt;/code&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Data originally collected and published by the &lt;a href="https://www.eia.gov/"&gt;US Energy Information Administration&lt;/a&gt; (US EIA) in their &lt;a href="https://www.eia.gov/electricity/data/eia860/"&gt;Form 860&lt;/a&gt; and &lt;a href="https://www.eia.gov/electricity/data/eia923/"&gt;Form 923&lt;/a&gt;, covering the years 2009-2018. A large majority of the data published in the original data sources has been included, but some parts, such as fuel stocks on hand and EIA 923 schedules 6, 7, &amp;amp; 8, have not yet been integrated.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;pudl-eia860-eia923-epacems&lt;/code&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This data package contains all of the same data as the &lt;code&gt;pudl-eia860-eia923&lt;/code&gt; package above, as well as the Hourly Emissions data from the US Environmental Protection Agency&amp;#39;s (EPA&amp;#39;s) &lt;a href="https://www.epa.gov/emc/emc-continuous-emission-monitoring-systems"&gt;Continuous Emissions Monitoring System&lt;/a&gt; (CEMS) from 1995-2018. The EPA CEMS data covers thousands of power plants at hourly resolution for decades, and contains close to a billion records.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;pudl-ferc1&lt;/code&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Seven data tables from &lt;a href="https://www.ferc.gov/docs-filing/forms/form-1/data.asp"&gt;FERC Form 1&lt;/a&gt; are included, primarily relating to individual power plants, and covering the years 1994-2018 (the entire span of time for which FERC provides this data).&lt;/p&gt;

&lt;p&gt;These tables are the only ones which have been subjected to any cleaning or organization for programmatic use within PUDL. The complete, raw FERC Form 1 database contains 116 different tables with many thousands of columns of mostly financial data. We will archive a complete copy of the multi-year FERC Form 1 Database as a file-based SQLite database at Zenodo, independent of this data release. It can also be re-generated using the &lt;code&gt;catalystcoop.pudl&lt;/code&gt; Python package and the original source data files archived as part of this data release.&lt;/p&gt;

&lt;p&gt;If you&amp;#39;re using PUDL, we would love to hear from you! Even if it&amp;#39;s just a note to let us know that you exist, and how you&amp;#39;re using the software or data. You can also:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use the &lt;a href="https://github.com/catalyst-cooperative/pudl/issues"&gt;Github issue tracker&lt;/a&gt; to file bugs, suggest improvements, or ask for help.&lt;/li&gt;
&lt;li&gt;Email the project team at &lt;a href="mailto:pudl@catalyst.coop"&gt;pudl@catalyst.coop&lt;/a&gt; for private communications.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Using the Data&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The data packages are just CSVs (data) and JSON (metadata) files. They can be used with a variety of tools on many platforms. However, the data is organized primarily with the idea that it will be loaded into a relational database, and the PUDL Python package that was used to generate this data release can facilitate that process. Once the data is loaded into a database, you can access that DB however you like.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Make sure &lt;code&gt;conda&lt;/code&gt; is installed&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;None of these commands will work without the &lt;code&gt;conda&lt;/code&gt; Python package manager installed, either via Anaconda or &lt;code&gt;miniconda&lt;/code&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.anaconda.com/distribution/"&gt;Install Anaconda&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.conda.io/en/latest/miniconda.html"&gt;Install miniconda&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;First download the files from the Zenodo archive into a new empty directory. &lt;strong&gt;A couple of them are very large (5-10 GB)&lt;/strong&gt;, and depending on what you&amp;#39;re trying to do you may not need them.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If you don&amp;#39;t want to recreate the data release from scratch by re-running the entire ETL process yourself, and you don&amp;#39;t want to create a full clone of the original FERC Form 1 database, including all of the data that has not yet been integrated into PUDL, then you don&amp;#39;t need to download &lt;code&gt;pudl-input-data.tgz&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;If you don&amp;#39;t need the EPA CEMS Hourly Emissions data, you do not need to download &lt;code&gt;pudl-eia860-eia923-epacems.tgz&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Load All of PUDL in a Single Line&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Use &lt;code&gt;cd&lt;/code&gt; to get into your new directory at the terminal (in Linux or Mac OS), or open up an Anaconda terminal in that directory if you&amp;#39;re on Windows.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you have downloaded all of the files from the archive&lt;/strong&gt;, and you want it all to be accessible locally, you can run a single shell script, called &lt;code&gt;load-pudl.sh&lt;/code&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;./load-pudl.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This will do the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Load the FERC Form 1, EIA Form 860, and EIA Form 923 data packages into an SQLite database which can be found at &lt;code&gt;sqlite/pudl.sqlite&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Convert the EPA CEMS data package into an Apache Parquet dataset which can be found at &lt;code&gt;parquet/epacems&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Clone all of the FERC Form 1 annual databases into a single SQLite database which can be found at &lt;code&gt;sqlite/ferc1.sqlite&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Create the PUDL &lt;code&gt;conda&lt;/code&gt; Environment&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This installs the PUDL software locally, and a couple of other useful packages:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;conda create --yes --name pudl --channel conda-forge \
--strict-channel-priority \
python=3.7 catalystcoop.pudl=0.3.2 dask jupyter jupyterlab seaborn pip
conda activate pudl
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;strong&gt;Create a PUDL data management workspace&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Use the PUDL setup script to create a new data management environment inside this directory. After you run this command you&amp;#39;ll see some other directories show up, like &lt;code&gt;parquet&lt;/code&gt;, &lt;code&gt;sqlite&lt;/code&gt;, &lt;code&gt;data&lt;/code&gt; etc.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;pudl_setup ./
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;strong&gt;Extract and load the FERC Form 1 and EIA 860/923 data&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you just want the FERC Form 1 and EIA 860/923 data that has been integrated into PUDL, you only need to download &lt;code&gt;pudl-ferc1.tgz&lt;/code&gt; and &lt;code&gt;pudl-eia860-eia923.tgz&lt;/code&gt;. Then extract them in the same directory where you ran &lt;code&gt;pudl_setup&lt;/code&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;tar -xzf pudl-ferc1.tgz
tar -xzf pudl-eia860-eia923.tgz
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;To make use of the FERC Form 1 and EIA 860/923 data, you&amp;#39;ll probably want to load them into a local database. The &lt;code&gt;datapkg_to_sqlite&lt;/code&gt; script that comes with PUDL will do that for you:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;datapkg_to_sqlite \
datapkg/pudl-data-release/pudl-ferc1/datapackage.json \
datapkg/pudl-data-release/pudl-eia860-eia923/datapackage.json \
-o datapkg/pudl-data-release/pudl-merged/
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Now you should be able to connect to the database (~300 MB) which is stored in &lt;code&gt;sqlite/pudl.sqlite&lt;/code&gt;.&lt;/p&gt;
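
&lt;p&gt;One quick way to confirm that the load worked is to list the tables in the new database. The sketch below uses only the Python standard library. The helper name &lt;code&gt;list_tables&lt;/code&gt; is hypothetical, and the path passed to it is whatever location your database ended up in.&lt;/p&gt;

```python
import sqlite3
from contextlib import closing


def list_tables(db_path):
    """Return the names of all tables in an SQLite database, sorted by name."""
    query = "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    # closing() guarantees the connection is closed; sqlite3's own context
    # manager only commits or rolls back a transaction.
    with closing(sqlite3.connect(db_path)) as conn:
        return [row[0] for row in conn.execute(query)]
```

&lt;p&gt;For example, &lt;code&gt;list_tables("sqlite/pudl.sqlite")&lt;/code&gt; should include tables such as &lt;code&gt;plants_steam_ferc1&lt;/code&gt; and &lt;code&gt;fuel_ferc1&lt;/code&gt;. If pandas is installed, individual tables can then be read into dataframes with &lt;code&gt;pandas.read_sql()&lt;/code&gt;.&lt;/p&gt;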

&lt;p&gt;&lt;strong&gt;Extract EPA CEMS and convert to Apache Parquet&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you want to work with the EPA CEMS data, which is much larger, we recommend converting it to an Apache Parquet dataset with the included &lt;code&gt;epacems_to_parquet&lt;/code&gt; script. Then you can read those files into dataframes directly. In Python you can use the &lt;code&gt;pandas.read_parquet()&lt;/code&gt; function. If you need to work with more data than can fit in memory at one time, we recommend using Dask dataframes. Converting the entire dataset from datapackages into Apache Parquet may take an hour or more:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;tar -xzf pudl-eia860-eia923-epacems.tgz
epacems_to_parquet datapkg/pudl-data-release/pudl-eia860-eia923-epacems/datapackage.json
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;You should find the Parquet dataset (~5 GB) under &lt;code&gt;parquet/epacems&lt;/code&gt;, partitioned by year and state for easier querying.&lt;/p&gt;
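
&lt;p&gt;Because the dataset is partitioned by year and state, it is usually possible to read just the slices you need. The following is a sketch under stated assumptions, not a tested recipe for this release: it assumes the partition columns are literally named &lt;code&gt;year&lt;/code&gt; and &lt;code&gt;state&lt;/code&gt;, and that the installed pandas and pyarrow versions accept a &lt;code&gt;filters&lt;/code&gt; argument. The helper names &lt;code&gt;build_partition_filters&lt;/code&gt; and &lt;code&gt;load_epacems&lt;/code&gt; are hypothetical.&lt;/p&gt;

```python
def build_partition_filters(years, states):
    """Build pyarrow-style row filters for the partition columns.

    Assumes the partition columns are named "year" and "state"; check the
    directory names under parquet/epacems before relying on this.
    """
    return [("year", "in", list(years)), ("state", "in", list(states))]


def load_epacems(years, states, columns=None, path="parquet/epacems"):
    """Read only the requested year/state partitions into a pandas DataFrame."""
    # Imported lazily so the helper above stays usable without pandas.
    import pandas as pd

    return pd.read_parquet(
        path,
        engine="pyarrow",
        columns=columns,
        filters=build_partition_filters(years, states),
    )
```

&lt;p&gt;A call such as &lt;code&gt;load_epacems([2018], ["CO"])&lt;/code&gt; would then read one year of data for one state. For selections too large for memory, &lt;code&gt;dask.dataframe.read_parquet()&lt;/code&gt; is the analogous entry point in Dask.&lt;/p&gt;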

&lt;p&gt;&lt;strong&gt;Clone the raw FERC Form 1 Databases&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you want to access the entire set of original, raw FERC Form 1 data (of which only a small subset has been cleaned and integrated into PUDL) you can extract the original input data that&amp;#39;s part of the Zenodo archive and run the &lt;code&gt;ferc1_to_sqlite&lt;/code&gt; script using the same settings file that was used to generate the data release:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;tar -xzf pudl-input-data.tgz
ferc1_to_sqlite data-release-settings.yml
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;You&amp;#39;ll find the FERC Form 1 database (~820 MB) in &lt;code&gt;sqlite/ferc1.sqlite&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data Quality Control&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We have performed basic sanity checks on much but not all of the data compiled in PUDL to ensure that we identify any major issues we might have introduced through our processing prior to release. These checks have also identified some issues in the originally reported data.&lt;/p&gt;

&lt;p&gt;If you have suggestions for additional types of data quality control and validation tests we would love to hear them, or see them in a pull request!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data Validation Test Cases&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We&amp;#39;ve compiled a collection of data validation test cases which were run against the data in this release prior to publication. For the complete details see the &lt;code&gt;pudl.validate&lt;/code&gt; module and the PyTest routines organized under &lt;code&gt;test/validate&lt;/code&gt; in &lt;a href="https://github.com/catalyst-cooperative/pudl"&gt;the PUDL repository on Github&lt;/a&gt;. Generally these tests include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ensure that there are no entirely NULL columns. This often happens due to a bad merge between dataframes when there&amp;#39;s a misnamed column.&lt;/li&gt;
&lt;li&gt;Make sure that tables have the expected number of records, to +/- a few percent.&lt;/li&gt;
&lt;li&gt;Ensure that tables do not contain duplicate records within specified subsets of columns that should serve as unique keys.&lt;/li&gt;
&lt;li&gt;For reported values that have a physically constrained valid range, do the vast majority of reported records fall within that range? This includes quantities like heat content per unit of fuel delivered/consumed, the sulfur, ash, moisture, chlorine, and mercury content of coal, and plant capacity.&lt;/li&gt;
&lt;li&gt;Do ownership shares of individual generators reported in EIA 860 sum to 100%?&lt;/li&gt;
&lt;li&gt;Are derived IDs that are used to group units of infrastructure together internally self consistent? For example, are there ever cases where a reported EIA generation unit appears in more than one inferred PUDL generation unit?&lt;/li&gt;
&lt;li&gt;For quantities that may not have a physically constrained range of valid values, are annual slices of the data at least statistically consistent with the historical values reported for that quantity? For example, fuel prices per unit delivered and per unit heat content.&lt;/li&gt;
&lt;li&gt;Do the fractions of different types of fuel consumed by FERC plants add up to 100%?&lt;/li&gt;
&lt;li&gt;Is there a strong correlation between total fuel cost and total heat content of reported fuel consumed for large steam plants in FERC 1?&lt;/li&gt;
&lt;li&gt;Are capacity factors generally between 0 and 1?&lt;/li&gt;
&lt;li&gt;Are plant construction years all after 1850?&lt;/li&gt;
&lt;li&gt;Is the fuel consumed for electricity generation always less than the total fuel consumed?&lt;/li&gt;
&lt;li&gt;Do any inferred generation units contain generators with differing primary fuels?&lt;/li&gt;
&lt;li&gt;Are plants reporting more than 8784 hours connected per year?&lt;/li&gt;
&lt;li&gt;Are coal and gas generator capacity factors within expected ranges?&lt;/li&gt;
&lt;li&gt;Are coal and gas generation unit heat rates within expected ranges?&lt;/li&gt;
&lt;/ul&gt;
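
&lt;p&gt;To make a few of the checks above concrete, here is a minimal standard-library sketch of three of them: entirely NULL columns, duplicate keys, and ownership shares that do not sum to 100%. It is an illustration only and is not the code in &lt;code&gt;pudl.validate&lt;/code&gt;. The function names and the column names &lt;code&gt;plant_id&lt;/code&gt;, &lt;code&gt;generator_id&lt;/code&gt;, and &lt;code&gt;fraction_owned&lt;/code&gt; are assumptions made for the example, as is the 1% tolerance.&lt;/p&gt;

```python
from collections import Counter


def all_null_columns(rows):
    """Return the sorted names of columns whose value is None in every row."""
    columns = set()
    for row in rows:
        columns.update(row)
    return sorted(c for c in columns if all(r.get(c) is None for r in rows))


def duplicate_keys(rows, key_columns):
    """Return the key tuples that occur more than once across the rows."""
    counts = Counter(tuple(row[c] for c in key_columns) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


def bad_ownership_totals(rows, tolerance=0.01):
    """Return generator keys whose ownership fractions do not sum to 1.0."""
    totals = {}
    for row in rows:
        # Hypothetical column names; substitute those of the real table.
        key = (row["plant_id"], row["generator_id"])
        totals[key] = totals.get(key, 0.0) + row["fraction_owned"]
    return sorted(k for k, total in totals.items() if abs(total - 1.0) > tolerance)
```

&lt;p&gt;Each function takes a list of dictionaries (one per record) and returns the offending columns or keys, so an empty list means the check passed.&lt;/p&gt;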

&lt;p&gt;&lt;strong&gt;Known Issues&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is probably not an exhaustive list. If you find something wonky, please bring it up in the &lt;a href="https://github.com/catalyst-cooperative/pudl/issues"&gt;Github issue tracker&lt;/a&gt; so we can keep track of it, fix it, or add it to the documentation at least.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Consistency of Harvested Entity Attributes&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;EIA 860 reports the same information about utilities, plants, and generators over the years. Many of the reported attributes (like a plant&amp;#39;s latitude and longitude...) should be constant over time. We associate these attributes with the entity ID and store them in one table. However, in some cases the reported values are not perfectly consistent across all the available years of data. When that happens, PUDL chooses the most consistently reported value, so long as at least 70% of the reported values are identical (or very nearly identical in the case of numeric values). If the reported values are too inconsistent, the field is assigned N/A for that entity.&lt;/p&gt;
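
&lt;p&gt;The 70% rule described above can be sketched in a few lines of Python. This is an illustration of the idea rather than the PUDL implementation: the function name &lt;code&gt;harvest_attribute&lt;/code&gt; is hypothetical, missing reports are ignored when computing the share (an assumption), and the near-equality handling for numeric values is omitted.&lt;/p&gt;

```python
from collections import Counter


def harvest_attribute(reported_values, min_share=0.7):
    """Pick the most consistently reported value, or None if too inconsistent.

    Missing reports (None) are ignored when computing the share, which is an
    assumption of this sketch rather than documented PUDL behavior.
    """
    observed = [v for v in reported_values if v is not None]
    if not observed:
        return None
    value, count = Counter(observed).most_common(1)[0]
    return value if count / len(observed) >= min_share else None
```

&lt;p&gt;For instance, a value reported identically in three of four years (75%) is kept, while a two-versus-two split (50%) yields no value for that entity.&lt;/p&gt;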

&lt;p&gt;&lt;strong&gt;Incomplete Boiler Generator Associations for Gas Plants&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Prior to 2015, EIA did not collect sufficient information to be able to infer complete boiler generator associations (and thus heat rates) for natural gas fired generators. In effect the fuel consumption of combustion turbines and the combustion turbine portions of combined cycle plants was excluded, since they aren&amp;#39;t really &amp;quot;boilers.&amp;quot; In the case of combined cycle plants, this can result in impossibly low heat rates, since only additional fuel injected after the combustion turbine counts as fuel input associated with the power generated by the steam turbine.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Unrealistically High Coal Mercury Content&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In 2012 a significant portion of the coal deliveries reported in the EIA 923 Fuel Receipts and Costs table had mercury content orders of magnitude higher than was possible, and higher than in any other report year. See &lt;a href="https://github.com/catalyst-cooperative/pudl/issues/390"&gt;Github issue 390&lt;/a&gt; for details.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Imperfect FERC Form 1 Plant ID Assignments&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Because FERC does not assign unique identifiers to the individual plants whose data are reported, and FERC Form 1 respondents are free to identify those facilities however they like from year to year, there is no entirely reliable way to link records pertaining to a given plant in one year to records pertaining to the same plant in another year. We use a record linkage algorithm that considers the reported plant names, capacities, years of construction, primary fuels, and other attributes to attempt to associate plant records with each other across years, but the process is imperfect. The &lt;code&gt;plant_id_ferc1&lt;/code&gt; values found in the &lt;code&gt;plants_steam_ferc1&lt;/code&gt; and &lt;code&gt;fuel_ferc1&lt;/code&gt; tables should be considered experimental and used with caution. See &lt;a href="https://github.com/catalyst-cooperative/pudl/issues/144"&gt;Github issue 144&lt;/a&gt; for more on this endless saga.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Non-unique Mappings Between &lt;code&gt;plants_steam_ferc1&lt;/code&gt; &amp;amp; &lt;code&gt;fuel_ferc1&lt;/code&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;While the plant names and utility IDs found in the &lt;code&gt;plants_steam_ferc1&lt;/code&gt; and &lt;code&gt;fuel_ferc1&lt;/code&gt; tables in any given year for a particular plant are guaranteed to be the same, they are not guaranteed to be &lt;strong&gt;unique&lt;/strong&gt;, which means that in a few cases it is not possible to identify exactly how much or what kind of fuel is associated with a particular plant record. This is an issue with the original FERC Form 1 database design which we can&amp;#39;t fix.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Imperfect Data Entry Error Correction&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In some cases obvious errors have been made in data entry or units of measure. We have attempted to fix some of them (e.g. converting heat content reported in BTU per lb of coal into mmbtu per ton) and we are confident that overall these corrections have improved the quality of the dataset, but there are likely a few cases in which they have been applied incorrectly. If you find something off by a factor of 1000, please let us know!&lt;/p&gt;
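
&lt;p&gt;As a worked example of the unit fix mentioned above: one BTU per pound equals 2000 / 1,000,000 = 0.002 mmbtu per ton, so coal reported at 10,000 BTU per lb corresponds to 20 mmbtu per ton. The sketch below encodes that arithmetic. It assumes US short tons of 2000 lb, and the function name is hypothetical rather than taken from PUDL.&lt;/p&gt;

```python
POUNDS_PER_SHORT_TON = 2000  # assumption: US short tons
BTU_PER_MMBTU = 1_000_000


def btu_per_lb_to_mmbtu_per_ton(btu_per_lb):
    """Convert coal heat content from BTU per pound to mmbtu per short ton."""
    return btu_per_lb * POUNDS_PER_SHORT_TON / BTU_PER_MMBTU
```

&lt;p&gt;The two units differ by a factor of 500, so a value entered in the wrong unit tends to fall far outside the plausible range for the column it was reported in.&lt;/p&gt;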

&lt;p&gt;&lt;strong&gt;Imperfect Coding&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;FERC does not restrict the vocabulary respondents may use to describe plant and fuel types, resulting in thousands of different strings being used. We have done our best to identify and categorize them all in the steam plants table, but this process is imperfect.&lt;/p&gt;

&lt;p&gt;Many other tables still have not been similarly coded; the &lt;code&gt;plants_small_ferc1&lt;/code&gt; and &lt;code&gt;purchased_power_ferc1&lt;/code&gt; tables remain especially messy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reproducibility&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It&amp;#39;s our intention that a user should be able to completely reproduce the data processing pipeline that we&amp;#39;ve used to generate this data release, and get the same outputs byte-for-byte, using only resources that are available in curated, long-term archives. The main requirements are a copy of the same original source data (archived as part of this data release), and a specification of the software environment, which can be reconstructed with packages from &lt;code&gt;conda-forge&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Original Source Data&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The original source data as downloaded from the public sources and used by the PUDL software to generate this data release are archived here alongside the outputs in the interest of reproducibility. The publishing agencies do not use version control or provide access to historically published versions, meaning that the same data may not remain available from them going forward. All of the original input data can be found in the &lt;code&gt;pudl-input-data.tgz&lt;/code&gt; compressed archive distributed with this data release. The data it contains were downloaded from FERC, EIA, and EPA between January 31st and February 17th, 2020. A small amount of additional data that we have compiled by hand is distributed as part of the Python package.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Software Environment&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This data release was generated using v0.3.2 of the &lt;code&gt;catalystcoop.pudl&lt;/code&gt; Python package, which is available on the official Python Package Index as well as via &lt;code&gt;conda&lt;/code&gt; using the community maintained &lt;code&gt;conda-forge&lt;/code&gt; channel. It&amp;#39;s also archived in &lt;a href="https://github.com/catalyst-cooperative/pudl/releases/tag/v0.3.2"&gt;the PUDL Github repository&lt;/a&gt; and &lt;a href="https://doi.org/10.5281/zenodo.3671600"&gt;on Zenodo&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;archived-environment.yml&lt;/code&gt; file distributed in this archive describes the &lt;code&gt;conda&lt;/code&gt; software environment in which this data release was generated.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;OS / Hardware&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The data release was generated on a 64-bit, Intel-based Ubuntu Linux 19.10 system. The only specialized external library that was required outside of the &lt;code&gt;conda&lt;/code&gt; framework was &lt;code&gt;libsnappy-dev&lt;/code&gt; version &lt;code&gt;1.1.7-1&lt;/code&gt;. Note that this library should not be required if you use &lt;code&gt;conda&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The data processing pipeline used to generate this data rel</description>
</descriptions>
</resource>
