There is a newer version of the record available.

Published December 7, 2022 | Version 1
Dataset Open

Board Leadership Database (U.S. Public Firms) + ML Script for Scaling Human Coded Data

  • 1. Texas Christian University
  • 2. Indiana University
  • 3. Tilburg University

Description

Files include: (1) an open-source database of CEO duality and board chair orientations (in both .dta and .csv formats), developed by scaling human-coded data with supervised machine learning techniques, and (2) the accompanying training and scoring scripts used to scale the human-coded data.

Users may apply the scoring script to score the same variables from company proxy statements, or adapt the training and scoring scripts and retrain the models to scale human-coded data for other constructs or measures.
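
As a rough illustration of what scaling human-coded data involves, the sketch below trains a classifier on a small hand-labeled sample and then scores unlabeled text. It is not the authors' training or scoring script: the record does not state which model they use, so a small multinomial naive Bayes classifier written with the Python standard library stands in. The example sentences, the 1/0 duality labels, and the function names `tokenize`, `train`, and `score` are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(labeled_texts):
    """Fit per-label word counts from (text, label) pairs coded by humans."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for text, label in labeled_texts:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return {"word_counts": word_counts, "doc_counts": doc_counts, "vocab": vocab}


def score(model, text):
    """Return the most likely label for an uncoded text (add-one smoothing)."""
    tokens = [t for t in tokenize(text) if t in model["vocab"]]
    total_docs = sum(model["doc_counts"].values())
    best_label, best_logp = None, -math.inf
    for label, n_docs in model["doc_counts"].items():
        counts = model["word_counts"][label]
        denom = sum(counts.values()) + len(model["vocab"])
        logp = math.log(n_docs / total_docs)
        for token in tokens:
            logp += math.log((counts[token] + 1) / denom)
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label


# Hypothetical human-coded sample: 1 = CEO also chairs the board, 0 = roles separated.
coded_sample = [
    ("Mr. Smith serves as both Chairman and Chief Executive Officer", 1),
    ("The roles of Chairman and CEO are combined", 1),
    ("The Board has separated the roles of Chairman and CEO", 0),
    ("An independent director serves as Chairman of the Board", 0),
]
model = train(coded_sample)
print(score(model, "Our Chief Executive Officer also serves as Chairman"))
```

In a real application the coded sample would be far larger, and held-out coded data would be needed to check how well the scaled scores agree with human coders.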

We note that early steps in developing our database and scripts required web scraping of company filings from SEC EDGAR and text extraction from the collected filings. We relied on other publicly available scripts to develop our own fetcher and extraction scripts. Users seeking to replicate those parts of the process may benefit from the following resources from Kai Chen and pypi.org:

For resources from Kai Chen, see https://www.kaichen.work/?p=681 and https://www.kaichen.work/?p=946

For resources from pypi.org, see sec-edgar-downloader and sec-api
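
For readers unfamiliar with the text-extraction step, the following is a minimal sketch of pulling plain text out of an HTML filing that has already been downloaded. It is not the authors' extraction script and does not use the packages named above; it relies only on Python's standard `html.parser` module, and the class name `FilingTextExtractor`, the function name `extract_text`, and the sample markup are hypothetical.

```python
import re
from html.parser import HTMLParser


class FilingTextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Join chunks with spaces, then collapse runs of whitespace.
        return re.sub(r"\s+", " ", " ".join(self._chunks)).strip()


def extract_text(html):
    """Return the visible text of an HTML document as one line."""
    parser = FilingTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


sample = (
    "<html><head><style>p {color: red}</style></head><body>"
    "<p>The Board has <b>separated</b> the roles of Chairman &amp; CEO.</p>"
    "</body></html>"
)
print(extract_text(sample))
```

Because chunks are joined with spaces, punctuation that directly follows an inline tag picks up a leading space; that is usually harmless for word-level text classification but matters if exact quotations are needed.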

 

Files

board-leadership-dataset_2022-11-08.csv

Files (102.1 MB)

Checksum                                Size
md5:582ef12c0451d170d607ab91756b1fed    43.1 kB
md5:495d6bf001d2b2146609e4dffd654be8    47.1 MB
md5:b52b7a2d8eb4c70a09bfc62e28a48562    47.3 MB
md5:9a917fabefbcb2a4cd712a69257f27bf    7.3 kB
md5:e4282b4453fab05893e4d5cc31dc4dfe    3.3 MB
md5:a7ba5b1ce7e8dc73f07347c937bcddd7    4.2 MB
md5:94244e040ca33a978437936c91fc4b7a    14.2 kB
md5:11ed512f11d87b4d9db399b9873a4359    18.4 kB
md5:537959017cae25c69049b1c9375a0674    15.3 kB
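
The MD5 checksums listed above can be used to confirm that a downloaded file arrived intact. The sketch below shows one way to do that with Python's standard `hashlib` module; the function names `md5_hex` and `matches_listing` and the path in the usage comment are hypothetical, and each file must be paired with its own checksum as shown on the record page.

```python
import hashlib
from pathlib import Path


def md5_hex(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file against a listed checksum such as 'md5:582e...'."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_hex(path) == expected


# Usage (hypothetical local path; take the checksum from the matching row):
# matches_listing("downloads/some-file-from-the-record", "md5:<checksum from the listing>")
```

A mismatch usually indicates an incomplete or corrupted download, in which case fetching the file again is the simplest fix.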