Replication package for Liu, Jessie (Zhenqi). 2025. “How Allowing a Little Bit of Dissent Helps Control Social Media: Impact of Market Structure on Censorship Compliance”. Journal of the European Economic Association

## Replication package structure:

1. data (contains all raw data required for replication; intermediate datasets are also saved here)
2. code (contains all code required for replication)
3. output (initially empty; all figures and tables are saved here)

## Replication Instructions

To replicate the results, follow one of the two methods outlined below:

### Automated Replication
This package includes a bash script (`run_all.sh`) to automate the replication process.

1. Navigate to the root directory in your terminal.
2. Run the bash script:
   ```
   ./run_all.sh
   ```
3. Check the `output` folder for the generated tables and figures. The following items are not based on data analysis and are therefore not produced by the scripts:

   - Figure 1: screenshot
   - Table 1: example categories and keywords
   - Figure 3: LaTeX (TikZ) figure
   - Figure 6: LaTeX (TikZ) figure
   - Appendix A, Figure 8: screenshot
   - Appendix I, Table 14: based on the sources provided in Tables 15 and 16

### Manual Replication
Set your root directory by changing and uncommenting the line:
```
global dir "CHANGE TO YOUR OWN PATH"
```
at the beginning of each do-file before executing it. To produce results manually, run the following scripts in exactly this order:

1. Heatmap.py
2. Datasets.do
3. EventStudy.do
4. Estimation.m
5. MonteCarlo.m
6. AppendixB.m
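
The run order above can also be sketched as a small Python driver. This is only an illustration, not the contents of `run_all.sh`: the function names `build_commands` and `run_all` are hypothetical, and it assumes that `python3`, `stata-se`, and `matlab` are available on your PATH, that the scripts are launched from the `code` folder, and that the `global dir` line in each do-file has already been set.

```python
from pathlib import Path
import subprocess

# Run order copied from the list above.
SCRIPTS = [
    "Heatmap.py",
    "Datasets.do",
    "EventStudy.do",
    "Estimation.m",
    "MonteCarlo.m",
    "AppendixB.m",
]

# Assumed launchers; adjust the executable names to your installation.
LAUNCHERS = {
    ".py": lambda p: ["python3", p.name],
    ".do": lambda p: ["stata-se", "-b", "do", p.name],  # Stata batch mode
    ".m": lambda p: ["matlab", "-batch", p.stem],       # MATLAB R2019a or later
}


def build_commands(scripts=SCRIPTS):
    """Return one command (as an argument list) per script, in run order."""
    return [LAUNCHERS[Path(s).suffix](Path(s)) for s in scripts]


def run_all(code_dir="code"):
    """Run every script in order and stop at the first non-zero exit code."""
    for cmd in build_commands():
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, cwd=code_dir, check=True)


# Call run_all() from the package root to execute the full sequence.
```

Stata in batch mode may not report do-file errors through its exit code, so it is worth inspecting the generated log files after `Datasets.do` and `EventStudy.do` finish.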


**********************************************************************************************
List of files:

1. data:

Raw data included in folder:

SVP_keywords.dta - data on blacklisted keywords.
Sourced from: Knockel, J., Crete-Nishihata, M., Ng, J. Q., Senft, A., & Crandall, J. R. (2015). Every rose has its thorn: Censorship and surveillance on social video platforms in China. In 5th USENIX Workshop on Free and Open Communications on the Internet (FOCI 15). USENIX Association. Retrieved from GitHub: https://github.com/citizenlab/chat-censorship/tree/master/livestream.

SVP_siteRankData.dta - data on daily Alexa rank.
Sourced from: Alexa Rank. (2017). Alexa Rank Data. Data accessed on September 30, 2017. The original URL is no longer retrievable.

SVP_rank2traffic.dta - data on bi-monthly traffic. 
Sourced from: Rank2Traffic. (2017). Rank2Traffic Data. Data accessed on September 30, 2017. The original URL is no longer retrievable.

SVP_eventdate.dta - a list of 30 unexpected events with their names and dates (details in Appendix I)

AppAnnie.dta - data on daily download rank of Sina Show. 
Sourced from: App Annie. (2016). App Annie Store Stats. Data accessed on January 30, 2017. The original URL is no longer retrievable.

QuestMobile.dta - data on Daily Active Users of YY. 
Sourced from: QuestMobile. (2024). Daily Active User Data (YY). Data accessed on June 11, 2024. To obtain access for purchase, please contact QuestMobile (Beijing Guishi Information Technology Limited).
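
Before running anything, it can help to confirm that the raw files listed above are present. The check below is a minimal sketch: the function name `missing_raw_files` is hypothetical, and it assumes the script is run from the package root so that `data` resolves to the data folder.

```python
from pathlib import Path

# Raw data files named in the list above.
RAW_FILES = [
    "SVP_keywords.dta",
    "SVP_siteRankData.dta",
    "SVP_rank2traffic.dta",
    "SVP_eventdate.dta",
    "AppAnnie.dta",
    "QuestMobile.dta",
]


def missing_raw_files(data_dir="data", expected=RAW_FILES):
    """Return the expected file names that are not found in data_dir."""
    root = Path(data_dir)
    return [name for name in expected if not (root / name).is_file()]


missing = missing_raw_files()
if missing:
    print("Missing raw data files:", ", ".join(missing))
else:
    print("All raw data files found.")
```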

Datasets constructed using Datasets.do:

wordcount.dta - event-firm panel of daily keyword counts  
eventstudy.dta - event-week-firm panel for event study analysis 
estimation.csv - event-firm panel of (binary) firm action and traffic for structural estimation

2. code:

Executable scripts for generating outputs:  

Heatmap.py - create keyword heatmaps (Figure 2)
Datasets.do - construct intermediate datasets for analysis 
EventStudy.do - create summary statistics and perform event study analysis 
Estimation.m - generate estimation results 
MonteCarlo.m - generate Monte Carlo results 
AppendixB.m - create Figures 9-11 

Other scripts included in the folder:
K3.m
MMSEobjFunc.m
NestedMLE.m
NestedMLE_JO.m
TwoStepMMSE.m
logLikelihood.m
logLikelihood_JO.m
merger.m
merger_JO.m
objFunc.m
objFunc_JO.m
parfor_progress.m
shutdown.m
shutdown_JO.m
simulateData.m
solveEq.m
solveEqNoInter.m
solveEq_JO.m
uniEqConstraint.m
uniEqConstraint_JO.m


3. output:
This folder is initially empty and will store all figures and tables generated during replication.

**********************************************************************************************

Software requirements 

Stata/SE 18.0 
Stata packages: 
estout
coefplot
runtest
labmask
reghdfe

Python 3.9.6
Python packages:
pandas
seaborn
matplotlib
scikit-learn
unidecode
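
A quick way to check that the Python packages listed above are installed is to test whether each one can be imported. This is a sketch rather than part of the package; `missing_packages` is a hypothetical helper name. Note that `scikit-learn` is imported under the module name `sklearn`.

```python
import importlib.util

# Maps each install name listed above to the module name used on import.
REQUIRED = {
    "pandas": "pandas",
    "seaborn": "seaborn",
    "matplotlib": "matplotlib",
    "scikit-learn": "sklearn",
    "unidecode": "unidecode",
}


def missing_packages(required=REQUIRED):
    """Return the install names whose modules cannot be located."""
    return [
        name
        for name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All listed Python packages are importable.")
```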

MATLAB (R2022b)
"Global Optimization Toolbox" and "Parallel Computing Toolbox" are required.  
The MATLAB scripts use parfor loops to distribute computations across multiple cores. For instance, on a 12-core macOS machine, the estimated run times are approximately:
- 10 minutes for "Estimation.m"
- 180 minutes for "MonteCarlo.m"



