Published March 14, 2026 | Version 5.0.0
Dataset | Open access

Social Data Commons: Walkability (v5.0.0)

Authors/Creators

  • University of Virginia

Description

Overview

Multi-year Walkability Index derived from land-use entropy (LEHD LODES employment and ACS households), transit proximity (GTFS), and street connectivity (EPA Smart Location Database). Each block group is scored using the formula NatWalkInd = D2A_Ranked/6 + D2B_Ranked/6 + D3B_Ranked/3 + D4C_Ranked/3. In this formula, D2A is employment and household land-use entropy from LODES WAC job counts and ACS household data, D2B is employment-only entropy from LODES WAC job counts, D3B is street intersection density from the EPA SLD, and D4C is distance to the nearest transit stop from GTFS feeds. Each component is ranked into 20 quantile bins before the components are combined. Block group scores are aggregated to Census tracts and counties using population-weighted means. This dataset is produced by the Social Data Commons at the University of Virginia as part of its Walkability Index data pipeline.
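The weighting in the formula above can be checked with a short Python sketch. It assumes each component has already been ranked to an integer bin from 1 to 20. The function name `nat_walk_ind` and the dictionary input are illustrative only and are not taken from the Social Data Commons pipeline.

```python
from fractions import Fraction

# Weights taken from the published formula
# NatWalkInd = D2A_Ranked/6 + D2B_Ranked/6 + D3B_Ranked/3 + D4C_Ranked/3.
# Fractions keep the arithmetic exact; the four weights sum to exactly 1.
WEIGHTS = {
    "D2A_Ranked": Fraction(1, 6),
    "D2B_Ranked": Fraction(1, 6),
    "D3B_Ranked": Fraction(1, 3),
    "D4C_Ranked": Fraction(1, 3),
}


def nat_walk_ind(ranks: dict) -> float:
    """Combine four component ranks (each an integer bin from 1 to 20) into NatWalkInd."""
    for name in WEIGHTS:
        rank = ranks[name]
        if not 1 <= rank <= 20:
            raise ValueError(f"{name} must be between 1 and 20, got {rank}")
    # Weighted average of the ranks, so the result also lies between 1 and 20.
    return float(sum(WEIGHTS[name] * ranks[name] for name in WEIGHTS))


# Hypothetical block group with mid-range component ranks.
print(nat_walk_ind({"D2A_Ranked": 12, "D2B_Ranked": 9, "D3B_Ranked": 15, "D4C_Ranked": 6}))  # 10.5
```

Street connectivity and transit proximity each carry twice the weight of either entropy component, so a block group with strong connectivity and transit access can score highly even with a modest land-use mix.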

Provenance

Derived from the EPA National Walkability Index methodology. Land-use entropy (D2A, D2B) is computed annually from LEHD LODES and ACS data; street connectivity (D3B) comes from EPA SLD V3; transit proximity (D4C) is computed annually from national GTFS feeds. Formula: NatWalkInd = D2A_Ranked/6 + D2B_Ranked/6 + D3B_Ranked/3 + D4C_Ranked/3.
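The entropy components (D2A, D2B) measure how evenly activity is spread across land-use categories. The sketch below assumes the normalized Shannon entropy commonly used for land-use mix, -Σ p·ln(p) / ln(k) over k categories. The exact category definitions and normalization used in this pipeline are not stated here, so treat both the function and its inputs as hypothetical. For example, a D2B-style input could be the five employment tier counts, and a D2A-style input could append an ACS household count as an extra category.

```python
import math


def normalized_entropy(counts: list) -> float:
    """Shannon entropy of activity counts, scaled to [0, 1].

    0 means all activity falls in one category; 1 means it is spread
    evenly across every category supplied.
    """
    if any(c < 0 for c in counts):
        raise ValueError("counts must be non-negative")
    total = sum(counts)
    k = len(counts)
    if k < 2 or total == 0:
        return 0.0  # assumption: no measurable mix is treated as zero diversity
    # Sum -p * ln(p) over categories with non-zero counts (0 * ln 0 is taken as 0).
    h = sum(-(c / total) * math.log(c / total) for c in counts if c > 0)
    return h / math.log(k)  # divide by the maximum possible entropy, ln(k)


# Hypothetical D2B-style input: jobs in the five tiers
# (retail, office, industrial, service, entertainment).
print(normalized_entropy([120, 300, 0, 80, 40]))
```

Normalizing by ln(k) makes scores comparable between inputs with different numbers of categories. Other implementations divide by the log of the number of non-empty categories instead, which changes the results for sparse block groups.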

Coverage

  • Temporal coverage: 2017–2023 (annual, LODES + GTFS)
  • Geographic levels: County, Health District, Tract
  • Coverage areas: National Capital Region (DC metro), United States (national), Virginia (statewide)

Methodology

The Walkability Index is a composite measure that ranks block groups by relative walkability, computed annually for 2017–2023. It combines four components:

  • D2A_Ranked: employment and household land-use entropy from LEHD LODES WAC job counts by NAICS sector and ACS B11001 household counts, measuring the diversity of nearby land uses.
  • D2B_Ranked: employment-only entropy across five tiers (retail, office, industrial, service, entertainment).
  • D3B_Ranked: street intersection density from the EPA Smart Location Database V3, measuring pedestrian-friendly street connectivity.
  • D4C_Ranked: distance to the nearest transit stop, computed from national GTFS feeds via the Mobility Database, measuring transit accessibility.

Each component is ranked into 20 quantile bins across all block groups in the coverage area. The final index is calculated as NatWalkInd = D2A_Ranked/6 + D2B_Ranked/6 + D3B_Ranked/3 + D4C_Ranked/3. Because the weights sum to 1 (1/6 + 1/6 + 1/3 + 1/3), the index is a weighted average of the four ranks, so scores range from 1 (least walkable) to 20 (most walkable). Block group scores are aggregated to Census tracts and counties using population-weighted means.
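The quantile ranking step can be sketched in Python as follows. Each value gets a bin from 1 to 20 based on its position among all block groups. Two details are assumptions rather than documented behavior. First, distance-based D4C is ranked in reverse, so the shortest distance gets the highest bin, as in the EPA index. Second, tied values share the lowest bin of their group. The function name `quantile_bins` is hypothetical.

```python
from bisect import bisect_left


def quantile_bins(values: list, n_bins: int = 20, higher_is_better: bool = True) -> list:
    """Rank values into bins 1..n_bins by their position in the sorted data.

    With higher_is_better=False (e.g. a distance), the smallest value gets
    the highest bin.
    """
    if not values:
        return []
    # Negating values reverses the order for "lower is better" components.
    keyed = list(values) if higher_is_better else [-v for v in values]
    ordered = sorted(keyed)
    n = len(ordered)
    # bisect_left returns the first position of a value, so tied values share a bin.
    return [1 + (n_bins * bisect_left(ordered, v)) // n for v in keyed]


# Hypothetical distances (miles) from three block groups to the nearest transit stop.
print(quantile_bins([0.1, 5.0, 2.0], higher_is_better=False))  # [14, 1, 7]
```

With only a handful of block groups the bins are sparse. Across a full coverage area, each of the 20 bins holds roughly one twentieth of the block groups. A library quantile function such as `pandas.qcut` may treat ties and bin edges differently.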

Source Tables

Variables

  • Walkability Index (custom): NatWalkInd = D2A_Ranked/6 + D2B_Ranked/6 + D3B_Ranked/3 + D4C_Ranked/3

Measures (1)

Note on naming conventions: Measures containing _geo20 are computed using 2020 Census geographic boundaries.

  • walkability_index_geo20: Walkability Index (population-weighted mean; unit: index score). Composite walkability score (1–20) combining land-use entropy, street connectivity, and transit proximity, updated annually from LODES and GTFS data, with street connectivity from the EPA SLD.
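The population-weighted mean used to move from block groups to tracts and counties can be sketched in Python as below. It relies on the standard Census GEOID layout: a 12-digit block group GEOID begins with the 2-digit state, 3-digit county, and 6-digit tract codes. The population source used for the weights is not stated in this record and is treated as an input. The function name and the sample records are hypothetical.

```python
from collections import defaultdict

# Census GEOID prefix lengths: a 12-digit block group GEOID starts with the
# 2-digit state, 3-digit county and 6-digit tract codes.
PREFIX_LEN = {"county": 5, "tract": 11}


def population_weighted_mean(records, level: str = "tract") -> dict:
    """Aggregate (block_group_geoid, score, population) records to a coarser level."""
    width = PREFIX_LEN[level]
    weighted_sum = defaultdict(float)
    total_pop = defaultdict(float)
    for geoid, score, population in records:
        if len(geoid) != 12:
            raise ValueError(f"expected a 12-digit block group GEOID, got {geoid!r}")
        key = geoid[:width]
        weighted_sum[key] += score * population
        total_pop[key] += population
    # Areas with zero population have no defined weighted mean and are skipped.
    return {key: weighted_sum[key] / total_pop[key] for key in weighted_sum if total_pop[key] > 0}


# Hypothetical block groups: two in one tract, one in a neighboring tract.
block_groups = [
    ("510030101001", 10.0, 100),
    ("510030101002", 16.0, 300),
    ("510030102001", 4.0, 50),
]
print(population_weighted_mean(block_groups, "tract"))
# {'51003010100': 14.5, '51003010200': 4.0}
```

Weighting by population keeps a sparsely populated block group from pulling a tract's score as strongly as a dense one. For the first tract in the example, (10 × 100 + 16 × 300) / 400 = 14.5.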

Data Sources

File Format

Data files are provided as CSVs (.csv) with the following columns: geoid, region_type, region_name, year, measure, value, moe (margin of error, where available). Larger files are provided as xz-compressed CSVs (.csv.xz).
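The long-format layout described above can be read with the Python standard library alone. The sketch opens plain `.csv` files with `open` and `.csv.xz` files with `lzma.open`, then keeps the rows for one measure. The encoding of missing values (an empty field or `NA`) and the helper names are assumptions for illustration.

```python
import csv
import lzma
from pathlib import Path

MISSING = {"", "NA"}  # assumption: how empty values are written in the files


def _to_float(text):
    """Convert a numeric field, returning None for missing values."""
    return None if text is None or text.strip() in MISSING else float(text)


def read_measure(path: str, measure: str = "walkability_index_geo20") -> list:
    """Read one measure from a long-format .csv or .csv.xz file into a list of dicts."""
    p = Path(path)
    # xz-compressed files are decompressed transparently as text.
    opener = lzma.open if p.suffix == ".xz" else open
    rows = []
    with opener(p, "rt", encoding="utf-8", newline="") as fh:
        for row in csv.DictReader(fh):
            if row["measure"] != measure:
                continue
            row["year"] = int(row["year"])
            row["value"] = _to_float(row["value"])
            row["moe"] = _to_float(row.get("moe"))
            rows.append(row)
    return rows
```

For example, `read_measure("ncr_cttr_bi_2017_2023_walkability_index.csv")` would return one dictionary per geography and year, keyed by the column names listed above.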

Files

ncr_cttr_bi_2017_2023_walkability_index.csv

Files (178.2 MB)

  • 1.1 MB (md5:eb0344413470051e499108f5dcd6912f)
  • 175.2 MB (md5:01511f244f1653761b56f641edaf7f87)
  • 1.9 MB (md5:4643571a315ebefd546f317d83e14d05)
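Downloads can be checked against the MD5 checksums listed for the files. The Python sketch below hashes a file in chunks, so even the 175.2 MB file does not have to be loaded into memory at once. The function names are illustrative.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in chunks so large files need little memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, listed: str) -> bool:
    """Compare a file against a checksum written as 'md5:<hex digest>'."""
    expected = listed.strip().lower().removeprefix("md5:")
    return md5_of_file(path) == expected
```

Pass the path of a downloaded file together with the `md5:` string listed for that file. A `False` result usually means the download was truncated or corrupted and should be repeated.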

Additional details

Related works

Is supplemented by
https://github.com/dads2busy/sdc (URL)

Dates

Collected: 2017-01-01/2023-12-31 (data coverage period)