Journal article Open Access

Stacked Ensemble Learning for Propensity Score Methods in Observational Studies

Autenrieth, Maximilian; Levine, Richard A.; Fan, Juanjuan; Guarcello, Maureen A.

DataCite XML Export

<?xml version='1.0' encoding='utf-8'?>
<resource xmlns:xsi="" xmlns="" xsi:schemaLocation="">
  <identifier identifierType="DOI">10.5281/zenodo.5048425</identifier>
      <creatorName>Autenrieth, Maximilian</creatorName>
      <affiliation>Imperial College London</affiliation>
      <creatorName>Levine, Richard A.</creatorName>
      <givenName>Richard A.</givenName>
      <affiliation>San Diego State University</affiliation>
      <creatorName>Fan, Juanjuan</creatorName>
      <affiliation>San Diego State University</affiliation>
      <creatorName>Guarcello, Maureen A.</creatorName>
      <givenName>Maureen A.</givenName>
      <affiliation>San Diego State University</affiliation>
    <title>Stacked Ensemble Learning for Propensity Score Methods in Observational Studies</title>
    <subject>educational data mining</subject>
    <subject>machine learning</subject>
    <subject>ensemble learning</subject>
    <subject>stacked generalization</subject>
    <subject>propensity score estimation</subject>
    <subject>causal inference</subject>
    <date dateType="Issued">2021-06-30</date>
  <resourceType resourceTypeGeneral="JournalArticle"/>
    <alternateIdentifier alternateIdentifierType="url"></alternateIdentifier>
    <relatedIdentifier relatedIdentifierType="URL" relationType="IsCitedBy"></relatedIdentifier>
    <relatedIdentifier relatedIdentifierType="DOI" relationType="IsVersionOf">10.5281/zenodo.5048424</relatedIdentifier>
    <rights rightsURI="">Creative Commons Attribution Non Commercial No Derivatives 4.0 International</rights>
    <rights rightsURI="info:eu-repo/semantics/openAccess">Open Access</rights>
    <description descriptionType="Abstract">Propensity score methods account for selection bias in observational studies. However, the consistency of the propensity score estimators strongly depends on a correct specification of the propensity score model. Logistic regression and, with increasing popularity, machine learning tools are used to estimate propensity scores. We introduce a stacked generalization ensemble learning approach to improve propensity score estimation by fitting a meta learner on the predictions of a suitable set of diverse base learners. We perform a comprehensive Monte Carlo simulation study, implementing a broad range of scenarios that mimic characteristics of typical data sets in educational studies. The population average treatment effect is estimated using the propensity score in Inverse Probability of Treatment Weighting. Our proposed stacked ensembles, especially using gradient boosting machines as a meta learner trained on a set of 12 base learner predictions, led to superior reduction of bias compared to the current state-of-the-art in propensity score estimation. Further, our simulations imply that commonly used balance measures (averaged standardized absolute mean differences) might be misleading as propensity score model selection criteria. We apply our proposed model - which we call GBM-Stack - to assess the population average treatment effect of a Supplemental Instruction (SI) program in an introductory psychology (PSY 101) course at San Diego State University. Our analysis provides evidence that moving the whole population to SI attendance would on average lead to 1.69 times higher odds to pass the PSY 101 class compared to not offering SI, with a 95% bootstrap confidence interval of (1.31, 2.20).</description>
</resource>
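
The abstract describes estimating a population average treatment effect by Inverse Probability of Treatment Weighting (IPTW) and reports the result as an odds ratio. The following is a minimal Python sketch of that weighting step only, assuming the propensity scores have already been estimated by some model (for example a stacked ensemble). The function names `iptw_means` and `iptw_odds_ratio`, the use of normalized (Hajek-style) weights, and the toy data are assumptions made for illustration; they are not taken from the article and may differ from the authors' implementation.

```python
from typing import Sequence, Tuple


def iptw_means(
    treated: Sequence[int],
    outcome: Sequence[float],
    propensity: Sequence[float],
) -> Tuple[float, float]:
    """Weighted mean outcomes under treatment and under control.

    Treated units get weight 1/e and control units get weight 1/(1 - e),
    where e is the estimated propensity score. Weights are normalized
    within each arm (an assumption of this sketch).
    """
    num1 = den1 = num0 = den0 = 0.0
    for t, y, e in zip(treated, outcome, propensity):
        if not 0.0 < e < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        if t == 1:
            w = 1.0 / e
            num1 += w * y
            den1 += w
        else:
            w = 1.0 / (1.0 - e)
            num0 += w * y
            den0 += w
    if den1 == 0.0 or den0 == 0.0:
        raise ValueError("both treated and control units are required")
    return num1 / den1, num0 / den0


def iptw_ate(treated, outcome, propensity) -> float:
    """Average treatment effect as a difference of weighted means."""
    mean1, mean0 = iptw_means(treated, outcome, propensity)
    return mean1 - mean0


def iptw_odds_ratio(treated, outcome, propensity) -> float:
    """Marginal odds ratio for a binary (0/1) outcome.

    Undefined when either weighted proportion equals 0 or 1.
    """
    p1, p0 = iptw_means(treated, outcome, propensity)
    return (p1 / (1.0 - p1)) / (p0 / (1.0 - p0))


if __name__ == "__main__":
    # Hypothetical toy data: 1 = attended / passed, 0 = did not.
    treated = [1, 1, 0, 0]
    outcome = [1, 0, 1, 0]
    propensity = [0.25, 0.5, 0.5, 0.8]
    print(iptw_ate(treated, outcome, propensity))
    print(iptw_odds_ratio(treated, outcome, propensity))
```

Normalizing the weights within each arm keeps the weighted proportions between 0 and 1, which the odds ratio requires. Scores very close to 0 or 1 produce extreme weights, so the quality of the propensity score model directly affects the stability of the estimate.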
                  All versions   This version
Views             94             94
Downloads         61             61
Data volume       69.2 MB        69.2 MB
Unique views      92             92
Unique downloads  59             59

