YAML / scoping-doc / runtime ordering: consistent (['unspecified_zero', 'estimated', 'imputed', 'reported']).
Stage 4 — era-A direct rows (1975-2009)...
  Stage 4 rows: 680,031  (14.91s)
Stage 5 — era-B per-component rows (2010-2024)...
  Stage 5 rows: 533,268  (46.54s)
Stage 6 — era-B all_source reconstruction...
  Stage 6 rows: 248,795  (0.37s)
Stage 7 — schema assembly (UNION ALL)...
  Assembled panel rows: 1,462,094 (= 680,031 + 533,268 + 248,795)  (0.23s)
Stage 9 — sanity assertions (SQL-level, pre-write)...
  All Stage 9 assertions passed.  (0.28s)
Stage 8 — attribute table (Q4/Q5 pivot)...
  Attribute table rows: 34,362  (19.35s)
Stage 10 — parquet write...
  Wrote data\harmonized\herd_panel.parquet (7,206,884 bytes)
  Wrote data\harmonized\herd_panel_attributes.parquet (151,158 bytes)  (0.57s)
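
The Stage 7 assembly count can be cross-checked by hand. The snippet below is a minimal sketch, not part of the build script: the name `stage_rows` and its keys are hypothetical, and the numbers are copied from the stage lines above.

```python
# Hypothetical cross-check of the Stage 7 row count; values copied from the log.
stage_rows = {
    "stage4_era_a_direct": 680_031,
    "stage5_era_b_components": 533_268,
    "stage6_era_b_all_source": 248_795,
}

# UNION ALL keeps every row, so the assembled panel is the plain sum of the stages.
assembled = sum(stage_rows.values())

# Era-B rows are everything that did not come from Stage 4.
era_b_rows = assembled - stage_rows["stage4_era_a_direct"]

print(f"assembled={assembled:,}  era_b={era_b_rows:,}")
```

Both figures agree with the per-era row counts in the sanity report (680,031 era-A rows and 782,063 era-B rows).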

==============================================================================
Sanity report — herd_panel.parquet + herd_panel_attributes.parquet
==============================================================================

Panel parquet: herd_panel.parquet
  Size: 7,206,884 bytes (6.87 MiB)
Attribute parquet: herd_panel_attributes.parquet
  Size: 151,158 bytes (0.14 MiB)

Panel total rows: 1,462,094

--- Panel schema (column / type) ---
  institution_id                VARCHAR
  fice                          VARCHAR
  ncses_inst_id                 VARCHAR
  ipeds_unitid                  VARCHAR
  inst_name_long                VARCHAR
  year                          INTEGER
  era                           VARCHAR
  discipline_coarse             VARCHAR
  discipline_fine               VARCHAR
  expenditure_type              VARCHAR
  source_class                  VARCHAR
  form_type                     VARCHAR
  value                         DOUBLE
  unit                          VARCHAR
  value_type                    VARCHAR
  quality_flag                  VARCHAR
  source_questionnaire_no       VARCHAR
  source_question_canonical     VARCHAR
  source_question_raw           VARCHAR
  source_file                   VARCHAR
  notes                         VARCHAR

--- Row counts by (era, source_class, expenditure_type, form_type) ---
  era  source_class  expenditure_type  form_type           n
  A    all_source    r&d               standard      380,358
  A    all_source    r&d_equipment     standard      299,673
  B    all_source    r&d               short          18,481
  B    all_source    r&d               standard      248,795
  B    all_source    r&d_equipment     standard      104,551
  B    federal       r&d               standard      178,696
  B    nonfederal    r&d               standard      231,540

--- Distinct discipline_coarse buckets ---
  'All'                             n=   80,713
  'Engineering'                     n=  270,883
  'Geosciences'                     n=  181,031
  'Life sciences'                   n=  239,548
  'Math & CS'                       n=   88,204
  'Non-S&E'                         n=  131,032
  'Other sciences nec'              n=   29,904
  'Physical sciences'               n=  204,458
  'Psychology'                      n=   45,204
  'Social sciences'                 n=  191,117

--- quality_flag distribution by era (% of all panel rows) ---
  era=A  'reported'              n=  390,307  (26.70%)
  era=A  'imputed'               n=  260,067  (17.79%)
  era=A  'estimated'             n=   29,657  ( 2.03%)
  era=B  'reported'              n=  749,665  (51.27%)
  era=B  'imputed'               n=   32,389  ( 2.22%)
  era=B  'unspecified_zero'      n=        9  ( 0.00%)
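
The percentages above are shares of the full panel, not of each era. A minimal sketch that reproduces them, assuming nothing beyond the counts printed in this report (`PANEL_ROWS` and `counts` are hypothetical names):

```python
# Counts copied from the quality_flag table; PANEL_ROWS is the reported panel total.
PANEL_ROWS = 1_462_094
counts = {
    ("A", "reported"): 390_307,
    ("A", "imputed"): 260_067,
    ("A", "estimated"): 29_657,
    ("B", "reported"): 749_665,
    ("B", "imputed"): 32_389,
    ("B", "unspecified_zero"): 9,
}

# Every panel row carries exactly one flag, so the counts must sum to the total.
assert sum(counts.values()) == PANEL_ROWS

# Share of each (era, flag) cell relative to the whole panel, in percent.
shares = {key: 100 * n / PANEL_ROWS for key, n in counts.items()}
for (era, flag), pct in shares.items():
    print(f"era={era}  {flag!r:<20} ({pct:5.2f}%)")
```

Dividing by the era subtotal instead (680,031 for era A) would give a different figure, for example about 57.4% for era-A 'reported' rows.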

--- Free-sum institution-year-total R&D ($M) by year × era ---
  year=1975  era=A  total=$   3,408.7M
  year=1976  era=A  total=$   3,729.0M
  year=1977  era=A  total=$   4,067.0M
  year=1978  era=A  total=$   4,624.7M
  year=1979  era=A  total=$   5,366.1M
  year=1980  era=A  total=$   6,062.8M
  year=1981  era=A  total=$   6,846.9M
  year=1982  era=A  total=$   7,323.7M
  year=1983  era=A  total=$   7,881.8M
  year=1984  era=A  total=$   8,620.4M
  year=1985  era=A  total=$   9,687.1M
  year=1986  era=A  total=$  10,927.7M
  year=1987  era=A  total=$  12,152.7M
  year=1988  era=A  total=$  13,462.9M
  year=1989  era=A  total=$  14,979.2M
  year=1990  era=A  total=$  16,289.5M
  year=1991  era=A  total=$  17,589.4M
  year=1992  era=A  total=$  18,820.7M
  year=1993  era=A  total=$  19,954.1M
  year=1994  era=A  total=$  21,034.9M
  year=1995  era=A  total=$  22,179.0M
  year=1996  era=A  total=$  23,055.1M
  year=1997  era=A  total=$  24,380.0M
  year=1998  era=A  total=$  25,867.2M
  year=1999  era=A  total=$  27,543.9M
  year=2000  era=A  total=$  30,084.1M
  year=2001  era=A  total=$  32,800.5M
  year=2002  era=A  total=$  36,382.6M
  year=2003  era=A  total=$  40,077.4M
  year=2004  era=A  total=$  43,237.8M
  year=2005  era=A  total=$  45,773.9M
  year=2006  era=A  total=$  47,758.7M
  year=2007  era=A  total=$  49,494.9M
  year=2008  era=A  total=$  51,871.8M
  year=2009  era=A  total=$  54,863.0M
  year=2010  era=B  total=$  61,286.6M
  year=2011  era=B  total=$  65,274.4M
  year=2012  era=B  total=$  65,729.0M
  year=2013  era=B  total=$  66,977.6M
  year=2014  era=B  total=$  67,161.4M
  year=2015  era=B  total=$  68,520.0M
  year=2016  era=B  total=$  71,737.2M
  year=2017  era=B  total=$  75,149.4M
  year=2018  era=B  total=$  79,026.0M
  year=2019  era=B  total=$  83,490.4M
  year=2020  era=B  total=$  86,306.1M
  year=2021  era=B  total=$  89,700.9M
  year=2022  era=B  total=$  97,669.6M
  year=2023  era=B  total=$ 108,681.0M
  year=2024  era=B  total=$ 117,554.1M

--- Era-B reconstruction identity check (from disk) ---
  N both-present cells (full panel)      :   215,894
  Min absolute residual                   : 0.0
  Median absolute residual                : 0.0
  Max absolute residual                   : 0.0
  PASS: median = max = 0 (exact arithmetic survives roundtrip).
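
For readers unfamiliar with this kind of check, the sketch below shows one way such residuals could be computed. It assumes the reconstruction identity is all_source = federal + nonfederal on cells where both components are present; that assumption, the cell keys, and the toy values are all hypothetical and not taken from the build script.

```python
from statistics import median

# Hypothetical toy cells keyed by (institution_id, year, discipline_fine).
federal = {("X1", 2010, "chem"): 1.5, ("X2", 2010, "chem"): 0.25}
nonfederal = {("X1", 2010, "chem"): 2.0, ("X2", 2010, "chem"): 0.5, ("X3", 2010, "chem"): 4.0}
all_source = {("X1", 2010, "chem"): 3.5, ("X2", 2010, "chem"): 0.75, ("X3", 2010, "chem"): 4.0}

# Only cells with both components present take part in the identity check.
both_present = sorted(federal.keys() & nonfederal.keys())

# Absolute residual between the stored total and the sum of its components.
residuals = [abs(all_source[k] - (federal[k] + nonfederal[k])) for k in both_present]

print(f"N both-present cells: {len(residuals)}")
print(f"min={min(residuals)}  median={median(residuals)}  max={max(residuals)}")
```

A maximum residual of exactly 0.0 means the stored totals equal the component sums bit for bit, which is what the PASS line above reports for the on-disk panel.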

--- Attribute table summary ---
  Total rows: 34,362
  era=A  n_rows=20,961  with_med_school=    0  with_clinical_trials=    0  with_med_share=    0  with_clinical_share=    0
  era=B  n_rows=13,401  with_med_school=2,307  with_clinical_trials=2,693  with_med_share=2,307  with_clinical_share=2,693

Total HD 2.4 build wall time: 82.57s
