********************************************************************************************
***DISCERN/ ASHISH ARORA, SHARON BELENZON, LIA SHEER (DUKE UNIVERSITY) / DECEMBER 2020***
********************************************************************************************
********************************************************************************************
*This is the main do file. It calls all the other do files in the "programs" folder.
*Users should read the Online Data Appendix before using the data.
*Due to IP restrictions, users cannot run every program; the code for those programs is included for reference only.
*Light run: tested with Stata/MP 15 (64-bit) on a personal laptop running Windows 10 Pro with 24 GB of RAM.
clear all
set more off
cd "/*enter your directory here*/"
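
*Optional check (a minimal sketch, not part of the original DISCERN code):
*stops early with a clear message if the working directory set above does not contain the "programs" folder.
capture confirm file "./programs/compustat_do.do"
if _rc != 0 {
	display as error "Cannot find ./programs/compustat_do.do; check the directory passed to cd"
	exit 601
}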

*1) Compiling Compustat financial data:
*based on North American Compustat records obtained through WRDS in August 2018
*users should obtain the North American Compustat data file before running "compustat_do.do" 

do "./programs/compustat_do.do"

*2) Compiling patent data: 
*OUTPUT: flow and stock variables, including dynamic reassignment of patents: "data/pat_per_year_permno_adj.dta" and "data/pat_stock_permno_adj.dta"
*The main patent dataset is also compiled here, but it is finalized in step 5 below.

do "./programs/patent_do.do"

*3) Compiling publication data: 
*OUTPUT: flow and stock variables, including dynamic reassignment of publications
*This do file is provided for reference only and cannot be run by users. Due to IP restrictions, publication data are available only at the aggregate permno_adj-year level:
*"data/pub_per_year_permno_adj.dta" and "data/pub_stock_permno_adj.dta"

do "./programs/pub_do.do"

*4) Compiling NPL citation data: 
*OUTPUT: NPL citations received per permno_adj and patent grant year, classified as internal vs. external corporate citations
*This do file is provided for reference only and cannot be run by users. Due to IP restrictions, NPL data are available only at the aggregate permno_adj-year level:
*"data/corp_NPL_cite_per_year_firm_80_15.dta"

do "./programs/npl_do.do"

*5) Compiling accounting data panel file: 
*OUTPUT: panel file: "./output_files/DISCERN_Panel_Data_1980_2015.dta"
*"permno_adj_long" is the unique firm id
*The last part of this program also finalizes the main patent dataset: "./output_files/DISCERN_patent_database_1980_2015_final1"

do "./programs/panel_do.do"
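
*Optional sanity check (a minimal sketch, not part of the original DISCERN code):
*checks whether the panel written by step 5 is uniquely identified by firm and year.
*ASSUMPTION: the year variable is named "year"; only "permno_adj_long" is documented above.
capture confirm file "./output_files/DISCERN_Panel_Data_1980_2015.dta"
if _rc == 0 {
	use "./output_files/DISCERN_Panel_Data_1980_2015.dta", clear
	capture confirm variable permno_adj_long year
	if _rc == 0 {
		*isid exits with an error when the listed variables do not uniquely identify observations
		capture isid permno_adj_long year
		if _rc == 0 {
			display as text "Panel is uniquely identified by permno_adj_long and year"
		}
		else {
			display as error "Panel is NOT uniquely identified by permno_adj_long and year"
		}
	}
	else {
		display as text "Skipping uniqueness check: expected variables not found"
	}
}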


*EXTRA PROGRAMS:
*The file "programs/NPL_cleaning_exp.do" provides sample code for cleaning NPL citations.
*The file "programs/NAME_STD.do" provides sample code for standardizing the name lists.
