Board Leadership Database (U.S. Public Firms) + ML Script for Scaling Human Coded Data
- 1. Texas Christian University
- 2. Indiana University
- 3. Tilburg University
Description
Files include: (1) an open-source database of CEO duality and board chair orientations, developed by scaling human-coded data with supervised machine learning techniques (in both .dta and .csv formats), and (2) the accompanying training and scoring scripts used to scale the human-coded data.
Users may apply the scoring script to score the same variables from company proxy statements, or may adapt the training and scoring scripts and retrain the models to scale human-coded data for other constructs or measures.
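The train-then-score workflow described above can be illustrated with a minimal sketch. This is not the authors' script: the model (a toy multinomial Naive Bayes written with the standard library only), the function names `train` and `score`, and the example passages and labels are all assumptions made for illustration. The actual training and scoring scripts in this record may use different features, models, and label definitions.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a passage and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(labeled_passages):
    """Fit a toy multinomial Naive Bayes model on (text, label) pairs coded by humans."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labeled_passages:
        label_counts[label] += 1
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocabulary.add(token)
    return label_counts, word_counts, vocabulary


def score(model, text):
    """Return the most probable label for a passage that has not been hand coded."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_logp = None, -math.inf
    for label, n_docs in label_counts.items():
        logp = math.log(n_docs / total_docs)  # class prior
        denom = sum(word_counts[label].values()) + len(vocabulary)
        for token in tokenize(text):
            if token in vocabulary:  # ignore words never seen in training
                # Add-one (Laplace) smoothed word likelihood.
                logp += math.log((word_counts[label][token] + 1) / denom)
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label


# Hypothetical human-coded passages (1 = CEO also chairs the board, 0 = roles separated).
human_coded = [
    ("Mr. Smith serves as both Chairman and Chief Executive Officer", 1),
    ("our Chief Executive Officer also serves as Chairman of the Board", 1),
    ("the roles of Chairman and Chief Executive Officer are separated", 0),
    ("an independent director serves as Chairman of the Board", 0),
]
model = train(human_coded)
predicted = score(model, "the board has an independent Chairman")
```

Retraining for another construct would, under these assumptions, amount to calling `train` on a different set of human-coded passages and then applying `score` to the uncoded text.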
We note that early steps in developing our database and scripts required web scraping of company filings from SEC EDGAR and text extraction from the collected filings. We relied on other publicly available scripts to develop our own fetcher and extraction scripts. Users seeking to replicate those parts of the process may benefit from the following resources from Kai Chen and pypi.org:
For resources from Kai Chen, see https://www.kaichen.work/?p=681 and https://www.kaichen.work/?p=946
For resources from pypi.org, see sec-edgar-downloader and sec-api
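The text-extraction step mentioned above is not included in this record, so the following is only a rough sketch of what such a step could look like, assuming the collected filings are HTML documents. The class `FilingTextExtractor` and the function `extract_text` are hypothetical names, and the approach (Python's standard `html.parser`) is an assumption rather than the method the authors used.

```python
from html.parser import HTMLParser


class FilingTextExtractor(HTMLParser):
    """Collect visible text from an HTML filing, skipping script and style blocks."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are outside script/style content.
        if self._skip_depth == 0:
            self._parts.append(data)

    def text(self):
        # Join the fragments and collapse runs of whitespace to single spaces.
        return " ".join(" ".join(self._parts).split())


def extract_text(html):
    """Return the whitespace-normalized visible text of an HTML string."""
    parser = FilingTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Real filings vary widely in markup (older ones are plain text, newer ones may embed inline XBRL), so a production extractor would likely need more handling than this sketch provides.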
Files
board-leadership-dataset_2022-11-08.csv
Total size: 102.1 MB

| MD5 checksum | Size |
|---|---|
| 582ef12c0451d170d607ab91756b1fed | 43.1 kB |
| 495d6bf001d2b2146609e4dffd654be8 | 47.1 MB |
| b52b7a2d8eb4c70a09bfc62e28a48562 | 47.3 MB |
| 9a917fabefbcb2a4cd712a69257f27bf | 7.3 kB |
| e4282b4453fab05893e4d5cc31dc4dfe | 3.3 MB |
| a7ba5b1ce7e8dc73f07347c937bcddd7 | 4.2 MB |
| 94244e040ca33a978437936c91fc4b7a | 14.2 kB |
| 11ed512f11d87b4d9db399b9873a4359 | 18.4 kB |
| 537959017cae25c69049b1c9375a0674 | 15.3 kB |
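The MD5 checksums listed for the files can be used to confirm that a download is intact. A minimal sketch using Python's standard `hashlib` follows; the helper names `md5_of_file` and `verify_download` are hypothetical, and the listing here does not show which file name pairs with which checksum, so that pairing has to be taken from the record itself.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Return True if the file's MD5 digest matches the published checksum."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

For example, `verify_download("board-leadership-dataset_2022-11-08.csv", expected)` would return `True` when `expected` holds the checksum published for that file and the download completed without corruption. Reading in chunks keeps memory use low for the larger files.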