The Economics of Research Reproducibility

Empirical evidence documents a relatively low level of research reproducibility in economics. In this paper, we investigate why this is the case and what can be done to move out of this low-reproducibility equilibrium. We study the supply and demand for research reproducibility, provide empirical evidence on authors' preferences for reproducibility, and estimate the cost of verifying reproducibility. We theoretically show that competition between journals to attract authors leads to a suboptimally low level of reproducibility. Leading journals with sufficient market power can set higher reproducibility standards, which is consistent with recent changes in data availability policies.


Introduction
Like many scientific disciplines, economics has experienced a data and computational revolution.
Today, most academic papers are empirical and rely on complex scripts analysing rich and sizable datasets. As computational results now account for a large part of the output and contribution of an academic paper, it is more important than ever that empirical results are reproducible, that is, that one can check whether "same data + same code = same results" (Buckheit and Donoho (1995); Barba (2018)). 1 However, in contrast to the swift evolution of research practices and the growing importance of reproducibility in science, economics journals have until recently evolved slowly on this front.
In this paper, we aim to rationalize, and potentially improve, this situation by conducting a comprehensive study of the economics of research reproducibility. Our study first encapsulates the existing literature in a simple theoretical framework that clarifies the economic determinants of the current level of reproducibility in economic journals. We then fill important gaps in the literature. We provide evidence on authors' preferences for reproducibility and estimates of the costs of reproducing empirical results based on our experience at cascad, a reproducibility verification agency. 2 We also show that in our theoretical framework competition between journals leads to a suboptimally low level of reproducibility, thus calling for remedial actions.
To understand the economics of research reproducibility, it is useful to consider that the path towards publishing reproducible research follows three key stages. Stage 1 is the default situation of no reproducibility policy. Stage 2, or unverified reproducibility, is reached when the journal introduces its first Data Availability Policy (DAP). Stage 3, or verified reproducibility, requires conducting a
1 In contrast to reproducibility, replicability refers to the ability of a researcher to duplicate the results by implementing the same methodology in another context or time period ("same code + new data = same results") or a different methodology to the same data ("new code + same data = same results") (Peng et al. (2006)).
2 cascad stands for Certification Agency for Scientific Code And Data and its website is www.cascad.tech. It is a non-profit scientific organization promoting and verifying research reproducibility in economics and business. It is funded by the French National Center for Scientific Research (CNRS) along with several research institutions and universities.
results has remained surprisingly low in economics (McCullough et al. (2008); Chang and Li (2017); Gertler et al. (2018)). Economic journals' DAPs have been quite vague and only partly enforced in practice (Duvendack et al. (2017)). The result is that for many papers numerical resources are unavailable, improperly documented, or of insufficient quality, so that well-trained economists are often unable to reproduce the results of papers published in leading economics journals, even when code and data are publicly available.
The solution to this problem is to move to Stage 3, verified reproducibility. The verification process takes place before the final acceptance of a manuscript and it follows two steps. The first step consists in checking whether the authors have complied with the guidelines aiming to ease the duplication of the findings. These guidelines cover the presentation and structure of code and data and aim to make these resources findable, interpretable, and reusable. 5 The second step aims at ensuring that the numerical results (tables and figures) included in the scientific article correspond to the numerical results generated by the computer code and data of the authors. The two-step verification is conducted by a reproducibility referee under the supervision of a data editor or reproducibility editor. The tasks of the reproducibility referee are to check that both the code and data comply with the guidelines, to execute the code, to compare the output with the results presented in the tables and figures of the article, and to list any potential discrepancy.
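The second verification step described above, comparing the output of the authors' code with the published tables and listing discrepancies, can be illustrated with a minimal sketch. The function name, the labels, the tolerances, and all numbers below are hypothetical and chosen for illustration only; this is not the procedure of any specific journal or verification agency.

```python
from math import isclose


def find_discrepancies(reported, reproduced, rel_tol=1e-6, abs_tol=5e-4):
    """Compare published values with values regenerated from the authors' code.

    Both arguments map a label (e.g. "Table 1, beta") to a number.
    Returns (label, reported, reproduced) for every entry that is missing
    from the regenerated output or differs beyond the tolerance.
    """
    discrepancies = []
    for label, published_value in reported.items():
        regenerated_value = reproduced.get(label)
        if regenerated_value is None:
            # The code did not produce this result at all.
            discrepancies.append((label, published_value, None))
        elif not isclose(published_value, regenerated_value,
                         rel_tol=rel_tol, abs_tol=abs_tol):
            discrepancies.append((label, published_value, regenerated_value))
    return discrepancies


# Hypothetical values, for illustration only.
reported = {"Table 1, beta": 0.125, "Table 1, t-stat": 2.31, "Table 2, N": 500}
reproduced = {"Table 1, beta": 0.1251, "Table 1, t-stat": 2.13, "Table 2, N": 500}

for label, published_value, regenerated_value in find_discrepancies(reported, reproduced):
    print(f"{label}: reported {published_value}, reproduced {regenerated_value}")
```

The absolute tolerance is an assumption meant to absorb rounding in published tables; a reproducibility referee would presumably choose it according to the number of digits reported.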
To date, only a handful of economics journals have moved to Stage 3 reproducibility, a significant exception being the AEA journals. We argue in this paper that this situation is best understood as an equilibrium phenomenon. We propose a conceptual framework for the demand and supply of research reproducibility. Following the existing literature (e.g., Jeon and Rochet (2010)), we model journals as platforms intermediating between authors (the suppliers of reproducible research) and readers (the consumers).
An explanation often given for the low level of reproducibility is the high cost that reproducibility imposes on authors. However, to our knowledge, the only evidence on these costs is based on surveys. To shed light on these costs, we analyze the propensity of authors of papers accepted in the Journal of Financial Economics to share their code and data. In particular, we do not find any evidence that authors are more reluctant to share their code when they are more senior, more cited, or affiliated with more prestigious universities. This evidence does not support the received idea that journals should not increase their reproducibility standards because doing so would discourage the best authors from submitting their work (see, e.g., Harvey (2014)).
We then use our theoretical framework to study the competition between two academic journals that choose submission fees, subscription fees, and a reproducibility level. We show that in equilibrium this competition leads to a suboptimally low level of reproducibility. The reason is that journals are competing fiercely to attract authors who form a "competitive bottleneck" (Armstrong, 2006), and lowering reproducibility is a way to lure authors from competing journals. Hence, we argue that the low level of reproducibility observed in most economic journals cannot be assumed to optimally balance the supply of and demand for reproducibility.
Finally, we discuss different paths to be explored to move out of a low reproducibility equilibrium: imposition of new "reproducibility standards" by journals with sufficient market power, appointment of data editors, recourse to third party verification services, and lowering the reproducibility costs for journals. On the last point, we use our experience at cascad to provide concrete estimates of the costs of verifying the reproducibility of empirical research in economics. We show that these costs exhibit very significant economies of scale, due in particular to the large costs incurred by the verification team when accessing new commercial or administrative datasets. In particular, we estimate that fully exploiting these economies of scale would lead to an average cost of around USD 350 per paper. We conclude from this exercise that reaching Stage 3 reproducibility is achievable in economics but will require optimizing the verification process.

Benefits and Costs of Reproducibility
In this section, we present a simple conceptual framework to think about the demand and the supply of reproducibility in academic research. We then discuss the costs and also the potential benefits of reproducibility for authors and for readers.

Setup
In order to understand the economics of reproducibility, we rely on the literature conceptualizing academic journals as platforms intermediating between authors and readers, in particular Jeon and Rochet (2010). For the moment, we consider the case of a single journal. A journal publishing $n^A$ articles chooses a level of reproducibility $q \in \mathbb{R}_+$. The three stages of reproducibility defined above can be thought of as three discrete values of $q$, but we allow for a continuum between those stages. The journal charges a fee $p^A$ to each author upon submission and a fee $p^R$ to each reader for subscribing to the journal. As in Jeon and Rochet (2010), we can have $p^A \le 0$ (authors being paid to publish) but not $p^R < 0$. 6
There is a continuum $[0, 1]$ of articles indexed by $i$; each article is characterized by a vector of characteristics $X_i$ (e.g., topic, quality, authors). We abstract away from the issue of selecting papers through the work of referees and editors, and assume that all articles in $[0, 1]$ are good enough to pass the journal's screening process. 7
There is a continuum $[0, +\infty)$ of potential readers, indexed by $j$. Each reader is characterized by some characteristics $Y_j$ (e.g., different tastes for different types of articles). Each reader chooses whether to subscribe to the journal.
6 A journal may want to charge negative subscription fees to attract more readers and hence citations. However, the journal cannot verify that a reader who gets the subsidy indeed reads the journal, so that negative subscription fees would lead to having many "fake" readers.
7 The growing theoretical literature on academic journals has focused on screening, see McCabe and Snyder (2005), Jeon and Rochet (2010), Wang (2018), and Gehrig and Stenbacka (2021). While screening and reproducibility can both be seen as quality variables, a crucial difference is that a high level of reproducibility imposes a cost on authors and not only on the journal.
The author of a published article receives a payoff that depends on the readers of the journal and their characteristics, the reproducibility level $q$, the article's characteristics $X_i$, and the submission fee paid $p^A$. Namely, if $S \subset \mathbb{R}_+$ is the set of readers who subscribe to the journal, we assume the payoff of article $i$'s authors to be:
For the moment, we simply assume that $u^A$ is always positive: all else equal, authors prefer to be published in journals with a larger number of readers. We discuss in Section 1.3 below how $u^A$ could also depend on $q$ and $X_i$, and how this affects the economics of reproducibility. $C^A$ is the cost of reaching reproducibility level $q$ for the authors of article $i$, independently of who reads the journal.
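The displayed payoff of the authors is not legible in this version of the text. One form consistent with the verbal description, assuming that the benefit is additive across subscribing readers, is sketched below. The arguments of $u^A$ and $C^A$ follow the way these functions are referred to later in the text; the label $U^A_i$ is an assumption introduced for this sketch.

```latex
% Sketch under stated assumptions: benefit summed over subscribers in S,
% minus the author's reproducibility cost, minus the submission fee.
U^A_i \;=\; \int_{j \in S} u^A(q, Y_j, X_i)\, dj \;-\; C^A(q, X_i) \;-\; p^A
```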
A reader who subscribes to a journal receives a payoff that depends on the number and characteristics of the published papers, the reproducibility level $q$, and the subscription fee $p^R$. Namely, denoting $P \subset [0, 1]$ the set of published articles, we assume this payoff to be:
We assume that $u^R$ is always positive and increasing in $q$: all else equal, readers prefer journals with more articles and a higher level of reproducibility.
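The displayed payoff of a reader is likewise not legible in this version of the text. A form consistent with the description, again assuming additivity across published articles, could be written as follows. The label $U^R_j$ and the ordering of the arguments of $u^R$ are assumptions made for this sketch.

```latex
% Sketch under stated assumptions: benefit summed over published articles in P,
% minus the subscription fee.
U^R_j \;=\; \int_{i \in P} u^R(q, X_i, Y_j)\, di \;-\; p^R
```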
The journal serves as a platform for authors and readers, and attracts an endogenous number $n^A$ of articles and $n^R$ of readers. Reaching reproducibility level $q$ has a cost $C^J(q, n^A)$ for the journal, with $C^J_1 \ge 0$. We will assume throughout the paper that the journal aims at maximizing citations, 8 and that citations are proportional to the number of readers $n^R$. The journal faces a break-even constraint:
In this framework, the readers express a demand for reproducibility, which is costly to journals and to authors. The prices $p^A$ and $p^R$ can be used to compensate the authors and the journal for these costs. The equilibrium level of reproducibility will then depend on the balance between supply and demand, and hence on the utility functions $u^A$ and $u^R$.
8 See Card and DellaVigna (2020) for evidence on editors' objectives.
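The break-even constraint itself is not legible in this version of the text; it is referred to later as constraint (3). Under the assumption that the journal's only revenues are the two fees and its only cost is $C^J$, it could be sketched as:

```latex
% Sketch under stated assumptions: fee revenue from authors and readers
% must cover the journal's cost of reaching reproducibility level q.
n^A p^A + n^R p^R \;\ge\; C^J(q, n^A)
```

Written this way, a journal that maximizes $n^R$ and sets $p^R = 0$ must recover the whole cost of reproducibility from authors through $p^A$.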

Reproducibility for Readers
To better understand the demand for reproducibility, we discuss why the readers of academic
(2019) show that a mandatory data-disclosure policy increases the replication probability by six percentage points. They conclude that replication efforts could be incentivized by promoting data disclosure and hence reducing the cost of replication.
Second, reproducibility serves as a check that the results reported in an article come from the methodology described. In their survey of transparency and reproducibility in economics research, Christensen and Miguel (2018) claim that this basic standard should be expected of all published economics research, as it is the first step toward a more thorough assessment of the validity of a scientific claim. In particular, reproducibility makes it possible to conduct an in-depth analysis of the code and data, which allows one to spot coding errors (see Herndon et al. (2013) on Reinhart and Rogoff (2010)), cases of specification searching and p-hacking (Harvey (Forthcoming); Christensen et al. (2019); Brodeur et al. (2020)), or possible cases of data fraud (Simonsohn (2013)). Moreover, given the high reputation cost for a researcher whose publication is found to be erroneous, requiring data and code to be publicly available should encourage researchers to exert more effort in detecting such errors in the first place.
Third, reproducibility can stimulate further studies by communicating more information to the academic community on how exactly to conduct a given analysis. This generates economies of scale as different authors working on the same data do not have to repeat time-consuming procedures, for instance those necessary to clean up the data. 9

Theory
While unambiguously desirable from the perspective of the readers, reproducibility imposes costs on authors. In a survey, Stodden (2010) reports that almost half of the respondents state that the lack of incentives and direct benefits is an important reason for researchers not to make their computer code publicly available. We briefly survey different costs, but also benefits, for authors, and how they relate to our framework.
The first and perhaps main cost is the opportunity cost of the time spent cleaning data, documenting code, and providing technical support to other researchers using the shared material (Harvey, Forthcoming). The pressure to publish is indeed ranked first among all the reasons put forward by scientists when surveyed by Nature about the impediments to reproducible research (Baker (2016)). These costs are independent of the journal's readership and are captured by $C^A(q, X_i)$.
9 Examples of popular datasets and code include the Fama and French portfolios used in empirical finance (https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html) and the GARCH toolbox provided by Kevin Sheppard (https://www.kevinsheppard.com/code/matlab/ucsd-garch/).
A second cost is reputational risk. Making one's code and data available makes it easier for others to spot errors, which is socially beneficial but privately costly. This cost could depend on the journal's readership and enter $u^A(q, Y_j, X_i)$.
A third cost is that sharing one's code and data will make the authors face more competition when writing follow-up papers. These costs depend again on an article's characteristics $X_i$. Some types of research may be particularly discouraged by a tough reproducibility policy, in particular research using proprietary datasets (Harvey, 2014). The costs could also be larger for more productive authors and/or more innovative papers. 10
Reproducibility may also have benefits for researchers. A first benefit is that journals with a stricter reproducibility policy may attract more readers, and hence publishing in such a journal may attract more citations: in our framework, journals with a higher $q$ may endogenously have a higher $n^R$ and this benefits authors. 11 A second benefit is that a high level of reproducibility may signal the high quality of a paper and its authors to the relevant audience, in the present case peers, universities, and research funding agencies. Lerner and Tirole (2002) argue that this type of signaling motive is an important driver of the open-source software development community. Theoretically, $C^A(q, X_i)$ could thus decrease in $q$, at least for certain paper characteristics $X_i$.

Empirics
We are not aware of empirical evidence other than surveys on the benefits for authors of making their research reproducible. To fill this gap, we analyze authors' decisions to voluntarily make their paper reproducible, and how this decision correlates with various author characteristics. Our
10 The Journal of Financial Economics recently implemented a new policy to mitigate this concern: authors have to disclose code and data but can optionally choose to keep both hidden for up to two years after publication (Whited, 2021). See also Hill and Stein (2020) for empirical evidence that the race to publish first can lead to lower quality research.
between January 2010 and September 2020, which corresponds to all issues between the first one of volume 95 and the third one of volume 137. 12 We end up with a total of 1,347 papers written by 2,231 authors.
We focus on the JFE for two reasons. First, over the past decade, this journal has encouraged authors to share code and data associated with their papers but never made it mandatory. Hence, by studying the decision of whether to share numerical resources, we can identify researchers' characteristics that act as an impetus or impediment to research reproducibility. Second, by focusing on a single journal, we neutralize the effect of the reputation of a journal on an author's decision to make a given paper reproducible. That is, different decisions can only come from the articles'
We collected information about available code and data from the JFE data and program webpage. 13 Over our sample period, 67 published papers, or 4.97% of all published papers, are open, i.e., they have code or data, or both, available for download from the JFE website. 14 This very low percentage is surprising given the broad definition we use to identify open papers, but it is in line with previous evidence showing that sharing code and data is not widespread in economics (see Section 2.1). In order to mitigate any imbalanced-dataset problem that may arise when studying rare events, we also consider a subsample that only includes papers published in monthly issues containing at least one paper with code and/or data available. In this subsample, there are 544 papers and the fraction of open papers is 12.13%.
12 With a 5.731 impact factor, JFE is ranked in the top-3 in finance and in the top-10 in economics (2019 SCImago Journal Rank). Schwert (2021) reports that 88% of the papers published during our sample period were empirical. Moreover, even theoretical papers commonly use computer code to solve numerical problems or produce theoretical results.
13 http://jfe.rochester.edu/data.htm
14 Information about code and data was retrieved on October 7, 2020. We excluded ten papers with sharable material but published prior to 2010 and did not include ten forthcoming papers with available sharable material but scheduled to be published after the end of our sample period. As we did not check the personal website of all 2,231 authors or other data repositories for downloadable material associated with JFE papers, the reported 4.97% frequency should be seen as a lower bound.
We estimate the following logistic regression model:
We define the different variables in Table 1. Panel A gives summary statistics for the four explanatory variables, and Panel B reports the results of different regression specifications.
and reputational costs may not be first order, as we would expect them to be larger for "better" researchers. 16 Third, in specifications (1)-(4), we also find that authors affiliated with universities located outside of North America are significantly more likely to share code or data. This finding is consistent with the signaling benefit: new entrants (i.e., international researchers) send an additional costly signal to a predominantly North American research community. 17 According to Harvey (2014), finance journals were reluctant to adopt mandatory data sharing policies for fear of losing submissions from the most senior and/or productive authors. Our evidence does not support this view. 18 In columns (5) and (6), we check directly whether the best-cited authors are more reluctant to share their data or code, and again find no effect. Finance journals may thus have less to lose from mandating a higher level of reproducibility than previously thought.
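The logistic specification mentioned above is not legible in this version of the text. As a rough illustration of this type of model, the sketch below fits a logit of an "open paper" indicator on four covariates by plain gradient ascent. The covariate names (seniority, citations, prestige, outside North America) are assumptions based on the characteristics discussed in the text, the data are synthetic, and the estimation routine is a teaching device rather than the authors' procedure.

```python
import math
import random


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(beta, row):
    """P(open = 1 | row); beta[0] is the intercept."""
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], row)))


def mean_log_likelihood(beta, X, y):
    """Average Bernoulli log-likelihood of the sample."""
    total = 0.0
    for row, target in zip(X, y):
        p = predict(beta, row)
        total += math.log(p) if target == 1 else math.log(1.0 - p)
    return total / len(X)


def fit_logit(X, y, learning_rate=0.5, n_iter=1000):
    """Maximum likelihood by gradient ascent (point estimates only)."""
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for row, target in zip(X, y):
            err = target - predict(beta, row)
            grad[0] += err
            for j, v in enumerate(row):
                grad[j + 1] += err * v
        beta = [b + learning_rate * g / n for b, g in zip(beta, grad)]
    return beta


# Synthetic data for illustration only; these are NOT the JFE data.
random.seed(0)
names = ["intercept", "seniority", "log_citations", "prestige", "outside_north_america"]
true_beta = [-2.0, 0.0, 0.0, 0.0, 1.0]  # hypothetical data-generating values
X, y = [], []
for _ in range(300):
    row = [random.gauss(0, 1), random.gauss(0, 1), random.gauss(0, 1),
           1.0 if random.random() < 0.3 else 0.0]
    X.append(row)
    y.append(1 if random.random() < predict(true_beta, row) else 0)

beta_hat = fit_logit(X, y)
for name, b in zip(names, beta_hat):
    print(f"{name}: {b:.3f}")
```

In practice one would use a statistical package that also reports standard errors; the sketch only shows how the probability of sharing is linked to author characteristics through the logistic function.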

The Unverified Reproducibility Equilibrium
Having shed some light on the demand for reproducibility by readers and the supply of reproducibility by authors, we turn to the role of journals in matching the two. We summarize the existing evidence about the level of reproducibility in economic journals. We show that, until the recent introduction of systematic pre-publication reproduction of the results by some journals (Vilhuber, 2019), the level of reproducibility has generally been low in economics. We then show in the context of our conceptual framework that competition between journals can be expected to lead to a suboptimally low level of reproducibility.
15 The results on seniority and citations are in line with the survey evidence reported in Swanson et al. (2020).
16 Instead of the mean across coauthors, we also used the median, minimum, and maximum values. We also contrasted researchers with tenure (i.e., Seniority > 6 years) and without tenure. The Citation variable was used alternatively with and without log transformation. We also estimated the regression for North American researchers only. In all cases, results remained qualitatively unchanged.
17 Schwert (2021) indicates that over the past decade the percentage of US authors (respectively referees) at JFE was greater than 65% (85%). He also shows in the context of a logit model that US authors have, all else equal, a higher acceptance rate than their international peers.
18 A limitation of our analysis is that we only observe published papers. It could be for instance that a paper is accepted because the data and code are available and despite its characteristics being associated with fewer citations. While we think this effect is unlikely to be first-order, we cannot rule it out based on our data.

Empirical Evidence
When they exist, data policies have been only partially enforced. The study of McCullough et al. (2008) on several economics journals with compulsory data-sharing policies reveals that the fraction of the papers that (1) are concerned by the policy (i.e., papers not using confidential data) and (2) actually have a companion data file available in the journal archive ranges from 12% for the Economic Journal to close to 100% for the Journal of Applied Econometrics. These rates also vary over time. For instance, the rate for the Federal Reserve Bank of St. Louis Review drops from 100% when the policy was first introduced in 1993 to 50% ten years later. 19
Is the unverified-reproducibility policy sufficient to guarantee reproducibility, or do journals need to move to Stage 3, i.e., verified reproducibility? This can be tested by checking what fraction of articles in Stage 2 journals can actually be reproduced from the numerical resources available in the journal's archives. McCullough et al. (2006)
(2018) draws much more negative conclusions from this replication experiment, noting that Glandon's study only investigates nine papers, of which only five have been fully reproduced. Chang and Li (2017) study a broader set of 67 articles published in top economics journals. They were able to reproduce the results for one-third of these papers from the code and data available on the journals' repositories. For another 10% of these articles, the results were reproduced with some help from the original authors.
All these results suggest that an unverified reproducibility policy is insufficient, so that there is a case for economics journals to engage in reproducibility verification, i.e., Stage 3. More generally, an important question, which we address in the next section, is whether all journals should increase their level of reproducibility, and whether they have an incentive to do so.

Is Low Reproducibility Socially Optimal?
The observation of a low level of reproducibility in economics research does not necessarily imply that it is inefficient. Indeed, competition between academic journals may lead to the level of reproducibility that equates readers' demand with authors' supply.
We show in this section that the two-sided nature of competition between academic journals does not guarantee that the equilibrium level of reproducibility corresponds to a social optimum.
To emphasize this point, we will show a theoretical example in which journals are competitive and choose a suboptimally low level of reproducibility. Hence, the low level of reproducibility observed in economics may be due to a market failure and call for corrective actions.
We enrich the setup of Section 1.1 with a model of competition between two academic journals, indexed by $k \in \{1, 2\}$. Each journal aims at maximizing its impact, represented by the number of readers $n^R_k$, subject to the break-even constraint (3). 22 The game is played as follows:
Step 1: Each journal $k$ simultaneously chooses the authors' fee $p^A_k$, readers' fee $p^R_k \ge 0$, and reproducibility level $q_k \ge 0$.
Step 2: The authors and readers observe $(p^A_1, p^R_1, q_1)$ and $(p^A_2, p^R_2, q_2)$. Each author chooses whether to submit to journal 1, journal 2, or to no journal (a paper cannot be submitted to two journals). Each reader chooses whether to subscribe to journal 1, journal 2, both journals, or no journal. All players make their decisions simultaneously.
We provide a formal analysis of this game and the proofs of our main results in the appendix to this paper. To keep the model tractable, and in line with the literature on two-sided markets, 23 we reduce the articles' characteristics to a single dimension $x_i \sim U([0, 1])$. More specifically, we assume that the authors of an article $i$ published in journal $k$ obtain:
This assumption corresponds to a linear Hotelling specification, with some authors having a preference for journal 1 and others for journal 2. A higher $t$ implies that the two journals are more differentiated and have more market power over the authors closer to them.
22 In our setup maximizing the number of readers is equivalent to maximizing the welfare of the journal's readers. While other specifications are possible, we are adopting the one that seems the least likely to bias the outcome towards a low level of reproducibility.
23 Most notably Armstrong (2006). This section can be seen as an extension of Section 5 in his paper, with reproducibility as an additional "quality" variable. See also Armstrong (2015) for an application to academic journals. Also note that quality is a choice variable for the journal. If it were a characteristic of authors, journals would face the problem of excluding some types (as in Hagiu (2009)) or sorting them with prices (as in Damiano and Li (2008)).
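To illustrate why a higher $t$ gives each journal more market power over authors, the sketch below computes the share of authors submitting to journal 1 in a textbook linear Hotelling model with full coverage. The payoff form (a journal-specific value net of fees and reproducibility costs, minus a linear "distance" cost) is an assumption standing in for the specification not legible above, and the function and argument names are hypothetical.

```python
def hotelling_cutoff(value_1, value_2, t):
    """Share of authors submitting to journal 1 under linear Hotelling preferences.

    An author located at x in [0, 1] is assumed to get value_1 - t * x from
    journal 1 and value_2 - t * (1 - x) from journal 2, where value_k is the
    payoff offered by journal k net of fees and reproducibility costs.
    Full coverage is assumed: every author submits to one of the two journals.
    """
    if t <= 0:
        raise ValueError("t must be strictly positive")
    # Indifferent author solves: value_1 - t * x = value_2 - t * (1 - x).
    cutoff = 0.5 + (value_1 - value_2) / (2.0 * t)
    # Outside [0, 1], one journal attracts all authors.
    return min(1.0, max(0.0, cutoff))


# Hypothetical numbers: journal 1 offers authors slightly less, for instance
# because it imposes a stricter reproducibility requirement.
for t in (0.5, 1.0, 2.0):
    share = hotelling_cutoff(0.9, 1.0, t)
    print(f"t = {t}: share of authors choosing journal 1 = {share:.3f}")
```

With these assumed values, the share lost by the journal offering authors the lower payoff shrinks as $t$ grows, which is the sense in which differentiation relaxes competition for authors.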
Symmetrically, readers' characteristics are reduced to a single dimension $y_j \in \mathbb{R}_+$, distributed such that $y$ readers have $y_j \le y$. Subscribing to journal $k$ gives reader $j$ the payoff:
Finally, we use a simple quadratic specification for the reproducibility cost faced by journals:
We make the following assumptions on the parameters:
Assumption 1 is a stability condition, standard in the literature, that ensures that both journals are sufficiently differentiated to coexist in equilibrium. The right-hand side of Assumption 2 means that reproducibility is desirable at least if the journal's costs are null. The left-hand side means that reproducibility is not "too desirable", which helps reduce the number of cases to consider. Assumption 3 means that the journal's costs are sufficiently high relative to the benefits of reproducibility for readers, which also helps rule out some possible equilibrium configurations.
24 The critical assumption here is that the submission decisions of authors are not driven by characteristics that readers also care about. Note that our empirical analysis in Section 1.3.2 does not reject this assumption. Assuming instead for instance that authors with potentially more cited papers are also more sensitive to the level of reproducibility should intuitively reinforce the market failure exhibited in this section.
We look for a subgame perfect Nash equilibrium. Authors and readers choose a journal so as to maximize their utility, rationally anticipating the behavior of other players. The two journals choose their fees and reproducibility levels so as to maximize their readership, rationally anticipating the future behavior of authors and readers. In addition, journals have to break even. Finally, we restrict our attention to symmetric equilibria with full coverage: in such an equilibrium $(p^A_1, p^R_1, q_1) = (p^A_2, p^R_2, q_2)$ and all authors submit to a journal.
A critical feature of this market is that a given article can only be published in one journal at most ("single-homing"). Hence, an article $i$ is published in journal 1 if and only if its authors obtain a higher payoff from journal 1 than from journal 2, and symmetrically for journal 2. In contrast, readers can subscribe to different journals ("multi-homing"). Reader $j$ subscribes to journal $k$ if and only if $u^R(n^A_k, q_k, y_j) - p^R_k \ge 0$. Since we focus on equilibria with full coverage, the number of articles submitted to each journal for given prices and reproducibility levels is determined by solving for the cutoff type
Is the equilibrium level of reproducibility socially optimal? To answer this question, we consider the program of a social planner who would choose $p^A_k$, $p^R_k$, $q_k$ in order to maximize the total number of readers across both journals (or, equivalently, who would maximize the total welfare of readers), under the constraints that both journals break even and all authors submit their paper (full coverage). We obtain the following solution:
Proposition 2. For any $t \le \bar{t}$, the social planner implements $(p^{A**}, p^{R**}, q^{**})$ in both journals, with $p^{R**} = 0$, $p^{A**} = C^J(q^{**}, 1)$, and
We can now compare the level of reproducibility achieved under competition and with a social planner:
Proposition 3. For any $t \le \bar{t}$, $q^*$ increases in $t$ and $q^{**}$ decreases in $t$. Moreover, $q^* \le q^{**}$, with equality at $t = \bar{t}$.
25 In this equilibrium, readers can subscribe to each journal for free ("open access"), and the costs of reproducibility are fully borne by the authors. This result is due to the assumption that $\beta < a/b$ and is not generic. McCullough (2009b) observes that open access journals are less advanced than others in promoting reproducibility. Our model highlights that this may be due to the greater necessity for these journals to attract authors.
Hence, we obtain that the social planner always chooses a higher level of reproducibility than the one we obtain under competition, as illustrated by Figure 2. The intuition is the following.
Because the authors are "single-homing", they form what Armstrong (2006) calls a "competitive bottleneck": the journals compete to attract the marginal article, whereas for a given number of readers the demand of readers for a journal does not depend on the strategy of the other journal. Since reproducibility is costly to authors, the journals reduce their reproducibility level to attract authors. The social planner instead does not face the "competitive bottleneck" problem and does not have to leave any surplus to the author with $x_i = 1/2$. 26 As $t$ increases, the social planner needs to choose a lower level of reproducibility in order to keep all authors submitting, hence $q^{**}$ decreases. On the contrary, under competition, as $t$ increases the journals are more and
26 The result can also be compared to the case of competition between platforms owned by associations in Rochet and Tirole (2003), which does not lead to the first-best due to a business-stealing effect.

Removing Barriers to Verified Reproducibility
As of today (July 2021), only a handful of economics journals have moved to verified reproducibility. The previous section showed that this may correspond to a suboptimal equilibrium of the competition between journals. This section discusses several avenues to move out of this equilibrium.

Changing Journals' Incentives
The effect of competition between journals is similar to situations in which competition between producers leads to a suboptimally low quality (e.g., Kranton (2003)). Classical solutions to this problem are the establishment and enforcement of industry standards, in this case a common reproducibility policy across journals, or an initiative to increase reproducibility taken by a journal with sufficient market power (when t is large in our model). As we will now discuss, the top 3 finance journals illustrate the first possibility, and the American Economic Review the second.
As explained in Harvey (2014), an initiative to increase reproducibility at the top 3 finance journals emerged in 2010. The idea was to adopt a common reproducibility policy at all journals simultaneously, akin to fixing a common level q in both journals in our model. Ultimately this initiative was not adopted. Among the different reasons mentioned in Harvey (2014), an important one is competition. In particular, the top 3 finance journals do not have as much market power over authors, as they need to compete with top economics journals to attract the best finance papers.
In economics journals instead, the marginal benefit of publishing a paper in a top-5 journal is so large (Heckman and Moktan (2020)) that researchers have strong incentives to comply with any standard or disclosure requirement imposed by these journals. 27 There is of course competition among top-5 journals. Despite this competitive pressure, the market power of the American Economic Review seems to have been high enough for it to keep its ambitious 2004 data policy, and even to take the next step of verified reproducibility in 2018 for conditionally accepted papers (as shown in Figure 2). Moreover, the 2004 data policy became a standard that was adopted by the other journals. Indeed, all top-5 journals now have a similar data policy, and two of them (Journal of Political Economy and Quarterly Journal of Economics) explicitly mention that they adopted the AER's 2004 policy.

Lowering Costs for Journals
Verified reproducibility is easier to achieve if it is conducted by people or organizations with the right expertise and incentives. In theory, three different models could be envisioned: A first possibility could be to add verification to the tasks of the regular editors and referees of the journal.
However, editors and referees may not have the time, expertise, and data access to check the reproducibility of all accepted papers. 28 Importantly, they also do not have the right incentives. In the suboptimal equilibrium discussed in Section 2.2, even if both journals announce a minimum level of reproducibility q, each journal has an incentive to renege on this level ex post. Concretely, one can imagine the situation of an editor and referees who have accepted a promising paper for publication after multiple rounds of revision. If at this stage the editorial team discovers that the data policy is not strictly adhered to by the authors, there seems to be a large cost and little benefit in stopping the publication process. On the contrary, the editorial team may rightly consider that the benefit for the journal of publishing an impactful paper will be larger than the cost of not fully enforcing the data policy.
27 See Ductor et al. (2020) for a theory of why generalist journals became dominant in economics. Note that in their paper, there are externalities only among authors publishing papers in the same journal, whereas in our model there are also externalities between readers and authors.
A second possibility is for the journal to appoint a special editor in charge of implementing the verification policy, thus avoiding the conflict of objectives that arises when the same editor is in charge of selecting impactful papers and verifying reproducibility. 29 For instance, in 2017 the AEA appointed Lars Vilhuber from Cornell University as the data editor for all the journals operated by the Association. On its website, the AEA defines the role of the data editor as follows: "[The data editor] will work with the AEA journal editors and Executive Committee to develop and implement methods to maximize replicability and reproducibility of research findings published in AEA journals.
Such methods may involve some pre-publication verification of materials provided by authors but will also encourage incorporating basic principles of replicability into researchers' workflows and address the increasing reliance on restricted-access data". Since then, similar positions have been created at the Review of Economic Studies, the Economic Journal, and Management Science.
A third possibility for a journal is to use the services of a trusted third party dedicated to verifying research reproducibility. The latter can either complement a journal's internal replication team for some types of verification or replace it. In any case, internal and external validators conduct similar tasks: they check the submitted material, rerun the code, contrast the results obtained with those in the paper, and write a reproducibility report presenting all the steps and any discrepancies. An example is the cascad certification agency, which conducted 21 verifications for journals managed by the American Economic Association (Vilhuber, 2021). Another interesting example is the partnership between the American Journal of Political Science and the University of North Carolina's Odum Institute. This institute provides the journal with a specialized data review service to guarantee the quality of replication datasets (Christian et al., 2018). 30
28 Leek and Peng (2015) observe that editors and reviewers at medical and scientific journals often lack the training and time to rerun a data analysis. This problem is compounded by the fact that datasets and data analyses are becoming increasingly complex and the number of submissions to journals continues to increase.
29 To the best of our knowledge, the first journal to follow such a policy was Biostatistics (Peng (2009); Peng (2011)). For more examples of journal policies to verify computational research, see Willis and Stodden (2020).

Allowing Verification of Confidential Data
The use of confidential data is often mentioned as a major impediment to the implementation of reproducible research (Christensen and Miguel, 2018). Without a solution to handle papers using confidential data, only two outcomes are possible: (1) Exclusion: the journal publishes only papers based on non-confidential data, and may have to pass on many interesting papers; (2) Exemption: papers using confidential data are exempted from the DAP, which may lead to many articles being non-reproducible, and may even encourage authors to work on confidential data so as to bypass the policy.
As an example of exclusion, the DAP of PLOS states that whenever the data cannot be accessed by other researchers, the manuscript must include an additional analysis based on public data that validates the conclusions so that others can reproduce the results. 31 Exemption is the most common policy among economics journals. Among the 49 DAPs considered by Vlaeminck and Herrmann (2015), 34 offer exemptions to the policy for confidential datasets. Christensen and Miguel (2018) show that the share of empirical papers published in the American Economic Review that fall under these exemptions rose sharply from 10% to around 40% between 2005 and 2015.
30 See Willis and Stodden (2020) for more examples.
31 See the unacceptable data access restrictions considered in PLOS data policy, https://journals.plos.org/plosone/s/data-availability.
These numbers show that exclusion is not a realistic possibility in economics. Confidential data on consumers or firms allow researchers to address new research questions or provide innovative answers to traditional ones. Access to such data provides a comparative advantage and greatly increases the chances of publishing in top journals. Moreover, the extension of the legal frameworks protecting confidentiality implies that a growing fraction of the data used in economics now has to be treated as confidential. 32 If the exclusion of papers using confidential data imposes too high a cost on the progress of economics, making 40% of published articles non-reproducible does not seem satisfactory either.
Hence there is a need to find a reproducibility solution for papers using confidential data. In principle, editors or referees of a journal could access such data for the purpose of verifying reproducibility by obtaining accreditation from each confidential-data provider. Journals typically do not use this option because each of them would have to go through an often long and tedious accreditation process with every provider.
Here the use of a third-party verification team can prove particularly efficient: once accredited by a confidential-data provider, the third party can use this accreditation for all papers using the data, regardless of the journal they are published in. As an example, in 2018 cascad partnered with the Centre d'Accès Sécurisé aux Données (CASD), a public research infrastructure enabling researchers to access individual data from the French Institute of Statistics and Economic Studies (INSEE), and from various French public administrations and ministries (Pérignon et al., 2019). In total, CASD hosts data from 378 sources and offers a data provider service to 742 user institutions. In October 2018, the French Statistical Secrecy Committee granted cascad a permanent accreditation for all its verification reviewers.
32 A recent example is the EU General Data Protection Regulation enforced in 2018. Unlike the European Union, the US does not have a single law on data protection but instead a system of federal and state laws and regulations, including, among others, the Federal Trade Commission Act, the Financial Services Modernization Act, the Health Insurance Portability and Accountability Act, and the Electronic Communications Privacy Act.

Achieving Economies of Scale
An obvious way of favoring the move towards verified reproducibility is to decrease its cost, represented by C J (q, n A ) in our theoretical framework. In this section, we summarize some quantitative information regarding this cost and discuss the implications for how to best organize the verification of reproducibility.
We consider a given level of reproducibility $\bar{q}$, corresponding to Stage 3 on the reproducibility scale displayed in Figure 1. We take as given the number $n_A$ of articles to verify, and the number $n_D$ of distinct non-public databases to access. The total cost can be represented as:
$$\bar{C}(n_A, n_D) = c_F + (c_L + c_C)\, n_A + c_D\, n_D.$$
We briefly discuss and give a tentative estimate of each cost. Our estimates are based on the actual experience of the cascad verification agency. However, these estimates are necessarily quite rough and are only provided to give an order of magnitude of the costs of verifying reproducibility.
-Fixed costs (c F ) reflect the cost of setting up a Swiss-army-knife IT infrastructure allowing the replicating team to run any code provided by an author. Based on the actual expenses faced by cascad, we set c F to 50,000 euros. This includes expenses related to dedicated hardware, storage capacity, cloud resources, software, etc. In addition, the costs include building an online platform allowing data editors to manage manuscripts and to communicate with reviewers and researchers, and covering legal and administrative costs, as well as a fraction of the salary of the two reproducibility editors.
-Labor costs (c L ) reflect the compensation of the technical staff in charge of checking the compliance of the submitted material to the guidelines, running the code, comparing the results with the ones in the paper, and writing an execution/reproducibility report to be provided to the data editor. By looking at the actual time spent by the reproducibility reviewers in 2020, we set the average number of hours per verification to 10 hours. 33 Given the salaries actually paid by cascad in 2020, we use an hourly rate of 15 euros. Hence c L is approximately equal to 150 EUR on average.
-Computing costs (c C ) need to be paid when the code is run on a commercial cloud. Our estimate of c C is 5 EUR per article.
-The cost of accessing data (c D ) varies a lot across databases. Many commercial databases are already available "for free" via a campus license. Finding the fee for other commercial databases is easy, but providing the fair monetary cost of establishing a partnership with a restricted-data access center like CASD is almost impossible. Yet, not including it would lead to a massive underestimate of the cost of setting up a verification service. Averaging across commercial and administrative databases, we use c D = 5,000 EUR as a rough (and conservative) estimate. 34
We now have estimates for all the parameters of the function $\bar{C}(n_A, n_D)$. Given the large share of data costs in the total reproducibility costs, a critical issue is how many new databases become necessary as the number of papers to be verified grows. As in Maurice Allais' famous "Calais traveller metaphor", the marginal cost of a new paper using a dataset that is not currently available to reviewers is much higher than the cost of a paper using already available datasets. Furthermore, bringing in a new data source enriches the data portfolio of the reviewing team, which makes it less likely that an additional data source will be needed for the next papers to be verified (as the pool of available data sources is now larger).
33 Our estimate is higher than the 5 hours reported by Vilhuber (2019) at Cornell University. We believe the reason the reviewing time is on the high side at cascad is that (1) the proportion of papers using confidential data is larger at cascad and (2) the level of compliance of the submitted material with the guidelines remains moderate.
34 In the case of CASD, access to the data requires formal approval from the French Statistical Secrecy Committee, which is a 3-6 month process. We found 134 articles published over 2016-2020 and acknowledging using CASD data, in 91 different academic journals. In order to verify the reproducibility of all these articles without cascad, a total of 91 different journals would have had to go through the same lengthy accreditation process. In some cases, access to the data by academic journals would have been simply impossible, as access is restricted to users based in France (e.g., data from the French Tax authority).
To obtain a rough estimate of how the number of necessary datasets grows with the number of papers, we use the following model. Assume there is a total of $\bar{n}_D$ non-public datasets that can be used in economic research. A fraction 1 − θ of papers use public data or no data, and a fraction θ pick one dataset at random among the $\bar{n}_D$ non-public datasets. We show in the Appendix that the expected number of distinct non-public datasets that will be used by $n_A$ papers is equal to:
$$\bar{n}_D \left[ 1 - \left( \frac{\bar{n}_D - 1}{\bar{n}_D} \right)^{\theta n_A} \right]. \quad (11)$$
In our quantification exercise below, we use θ = 0.4 (estimate of Christensen and Miguel (2018) for the American Economic Review) and tentatively set $\bar{n}_D = 50$. We finally arrive at the following cost function:
$$C(n_A) = c_F + (c_L + c_C)\, n_A + c_D\, \bar{n}_D \left[ 1 - \left( \frac{\bar{n}_D - 1}{\bar{n}_D} \right)^{\theta n_A} \right].$$
Finally, we take into account that there might be multiple verification teams. If there are $n_C$ such teams and they share the $n_A$ articles to be verified equally, we can compute the average cost per article $AC(n_A, n_C)$ as:
$$AC(n_A, n_C) = \frac{n_C}{n_A}\, C\!\left( \frac{n_A}{n_C} \right).$$
EUR for $n_C = 2$, hence economies of scale are still very significant. Assume we go further and the verification team also verifies the papers from the four other top-5 journals. Together, these four journals published 261 papers in 2019. Multiplying by three, we reach a total of $n_A = 1{,}815$ for three years. At this level, the average cost per paper falls to 320 EUR for $n_C = 1$, and 485 EUR for $n_C = 2$. Obviously the costs fall further if one adds other journals or if the fixed costs can be amortized over more years, and the average cost reaches 155 EUR per article in the limit.
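The quantification above can be checked numerically. The following is a minimal Python sketch, assuming that the total cost equals the fixed cost, plus labor and computing costs per article, plus the data-access cost times the expected number of distinct non-public datasets, and that teams split the articles equally. The function names are illustrative and are not taken from the paper.

```python
def expected_datasets(n_a, theta=0.4, n_d_bar=50):
    """Expected number of distinct non-public datasets used by n_a papers."""
    return n_d_bar * (1.0 - ((n_d_bar - 1) / n_d_bar) ** (theta * n_a))


def total_cost(n_a, c_f=50_000, c_l=150, c_c=5, c_d=5_000,
               theta=0.4, n_d_bar=50):
    """Assumed cost form: fixed + per-article + data-access costs (EUR)."""
    per_article = (c_l + c_c) * n_a
    data_access = c_d * expected_datasets(n_a, theta, n_d_bar)
    return c_f + per_article + data_access


def average_cost(n_a, n_c=1):
    """Average cost per article when n_c teams share n_a articles equally."""
    return n_c * total_cost(n_a / n_c) / n_a


if __name__ == "__main__":
    for n_c in (1, 2):
        print(n_c, round(average_cost(1815, n_c)))
```

Under these assumptions, the sketch gives about 320 EUR per article for one team and about 485 EUR for two teams at 1,815 articles, and the average cost approaches 150 + 5 = 155 EUR as the number of articles grows, in line with the figures quoted in the text.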
Assuming there is only one verification team, how large is the cost of 320 EUR, or 358 USD (2019), per article? One way to answer this question is to estimate by how much different sources of income for the AEA would have to be increased to compensate for the costs of verifying the reproducibility of all articles published by AEA journals. 35 One possibility is to ask the authors of the papers reproduced to pay the cost. This would increase the submission fees for authors of accepted papers from 200 USD to 558 USD, a 179% increase. A second possibility would be to increase the submission fees for all papers submitted. Given a 7% acceptance rate for AEA journals on average, one would need to raise submission fees by 0.07 × 358 = 25.06 USD, a 12.5% increase from the current submission fees. Finally, a third possibility would be to raise the fees paid by readers. According to the financial statements of the AEA, the AEA earned 5.931 million USD in licensing fees and subscription fees for its journals in 2019. In order to absorb an extra cost of 344 × 358 = 123,152 USD, these fees would have to increase by 2.07%.
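The three financing options can be recomputed from the figures quoted in this paragraph. The short Python sketch below uses only those figures; the variable names are illustrative.

```python
# Figures quoted in the text (USD, 2019).
cost_per_article = 358        # verification cost per published article
submission_fee = 200          # current submission fee
acceptance_rate = 0.07        # average acceptance rate at AEA journals
papers_per_year = 344         # articles published by AEA journals in 2019
reader_revenue = 5_931_000    # licensing and subscription fees

# Option 1: authors of accepted papers pay the full verification cost.
new_fee_accepted = submission_fee + cost_per_article
increase_accepted_pct = 100 * cost_per_article / submission_fee

# Option 2: the cost is spread over all submitted papers.
extra_fee_all = acceptance_rate * cost_per_article
increase_all_pct = 100 * extra_fee_all / submission_fee

# Option 3: the cost is passed on to readers.
total_verification_cost = papers_per_year * cost_per_article
increase_readers_pct = 100 * total_verification_cost / reader_revenue

if __name__ == "__main__":
    print(new_fee_accepted, increase_accepted_pct)
    print(extra_fee_all, increase_all_pct)
    print(total_verification_cost, increase_readers_pct)
```

The third percentage comes out slightly below 2.08%, which the text reports as 2.07%.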
The lesson we draw from this quantitative exercise is that the costs of verified reproducibility are far from negligible, but still manageable if one strives to reduce implementation costs and finds the right economic model. Very practically, our estimates mean that the total cost of verifying the reproducibility of all articles published in the top-5 economics journals and other AEA journals would be less than USD 220,000 per year, close to the average annual salary of one full professor in economics (Scott and Siegfried, 2021). However, this relatively low number assumes operating at scale. This is certainly a factor in the success of the policy adopted by the AEA journals.
Achieving the same outcome will be more difficult for standalone journals, unless they are able to pool resources or resort to third-party verification.

Externalities and Government Intervention
In our framework, the journals internalize the demand for reproducibility coming from "readers", which could be the subscribers of the journals (who pay subscription fees) and/or academics who cite the articles published by the journal in their research. However, an article i published in a journal may have a social utility, denoted u S (q, X i ), in addition to the readers' utility. Even a monopoly journal maximizing the utility of readers will fail to take this additional social utility into account. If u S is increasing in q, then this is another channel through which the equilibrium level of reproducibility might be suboptimally low.
A traditional externality of research is its application to create new products or processes and generate economic growth. Reproducibility of research may strengthen such spillovers: the availability of the code and data used by academics may greatly facilitate the transformation of scientific discoveries into economic forces. This echoes the "stimulation" function of reproducibility of Ragnar Frisch described in Section 1.2.
In recent years, the demand for "evidence-based" policymaking has also greatly increased (Foun- However well-intended these policies might be, centralized authorities are likely to be in a worse position than academics themselves to evaluate the costs of reproducibility on researchers and strike the right trade-off. 37 Scientific associations and journals may be seen as "self-regulatory organizations" that set the rules of research independently of the government, but with government intervention remaining a last resort. This implies that academics have a collective interest in setting an appropriate level of reproducibility in research, in order to make additional government intervention unnecessary.
36 In 2014, the NSF proposed a framework to improve the reproducibility and replicability in funded research, including data sharing policy and data management plans (National Science Foundation, 2014). In the European Union's Horizon 2020 funding program, research data underlying a publication has to be made available, in addition to the requirement to create a data management plan (European Commission, 2020).
37 See for instance the critical discussion in Armstrong (2021)

Refereeing
Our finding that competition between journals may lead to a suboptimally low level of reproducibility stands in sharp contrast with the view expressed by Ellison (2002a) and Ellison (2002b) that there exists a "race to the top" in the requirements imposed by referees on authors, leading to longer delays in publication, more robustness checks, longer appendices, etc. However, there is no contradiction between these two mechanisms. The "race to the top" in Ellison (2002a) stems from the behavioral biases of referees, who mistakenly infer from their own submission history that journals require a very high quality of execution. The requirements on reproducibility are for the moment not in the hands of referees but of editors, and it is hard to see how the mechanism on which Ellison (2002a) relies could be transposed at this level.
Conversely, our "competitive bottleneck" mechanism could imply that editors may care less about quality of execution (which may be another interpretation of q in the model) and more about the idea of a paper. This view is summarized by Matthew Spiegel, reflecting on his editorship of the Review of Financial Studies, as "reviewing less and progressing more" (Spiegel (2012)).

Conclusion
Writing empirical papers in economics takes a great deal of time (literally years of data cleaning, merging, coding, debugging, analyzing and re-analyzing), yet, for decades, data and code played no role in the peer review process. When receiving a new manuscript to review, editors and referees had to assume that the results outlined in the paper actually resulted from running the researchers' computer code on their data. Over the past 15 years, economics journals have introduced and then gradually enriched their reproducibility policies: first inviting and then requiring authors to share their code and data. There is now growing empirical evidence that this period, which we call the unverified reproducibility policy stage, led to low compliance rates with DAPs, low quality of the shared resources, and in turn low reproducibility rates.
In this paper, we show that this unfortunate situation can be the suboptimal outcome of two-sided competition between economics journals. If so, it is possible to improve the situation by increasing the level of reproducibility. Finally, we show that one way to mitigate this market failure is to conduct a systematic verification of the results prior to publication (verified reproducibility stage).
Such pre-publication verification could be conducted either internally by journals or outsourced to a trusted third party. Interestingly, this is the strategy followed by the American Economic Association, as well as by a handful of other leading actors of the scientific publishing industry.
While the focus of this paper is on economics research, the situation described and some of the solutions put forward are by no means specific to our field. In particular, our analysis resonates well with current discussions in medical research in the context of the Covid crisis. Indeed, both economics and medical research are extremely data and computationally intensive, focus on causality, make extensive use of confidential data, and play a vital role in informing the policy and societal debates. The publication in May 2020 in The Lancet of an article on the effects of hydroxychloroquine in the treatment of Covid-19, followed by its quick retraction, illustrates both the high societal demand for reproducibility and the difficulties journals face in supplying it. The recent policy changes and initiatives discussed in this paper could be sources of inspiration for the medical community.

A.1 Proof of Proposition 1
We first solve for the equilibrium numbers of authors and readers on each journal for given fees and quality levels. Given our assumptions, these numbers have to satisfy the following system: Solving for this system yields the following equilibrium allocation of authors and readers, for given prices and reproducibility levels: Note that Assumption 1 ensures that n A k and n R k are both decreasing in p A k and p R k , while Assumption 2 ensures that all else equal n A k and n R k are both increasing in q k . We can now solve for the equilibrium in Step 1. Taking (p A 2 , p R 2 , q 2 ) as given, we write the following Lagrangian for journal 1: We then differentiate with respect to p A 1 , p R 1 , and q 1 : In a symmetric equilibrium, these derivatives have to be zero at the equilibrium prices and reproducibility levels p A 1 = p A 2 = p A , p R 1 = p R 2 = p R , and q 1 = q 2 = q. We obtain: In addition, we have the constraints λ ≥ 0, µ ≥ 0, ν ≥ 0, n A (p A − C J (q)) + n R p R ≥ 0, p R ≥ 0, q ≥ 0, λ[n A (p A − C J (q)) + n R p R ] = 0, µp R = 0 and νq = 0. Finally, we need to check that the solution satisfies the assumption of full coverage, meaning that an article with x i = 1/2 gives positive surplus to its authors, which gives: We are going to show that under our assumptions any solution to this problem has λ > 0, µ > 0, and ν = 0. We then solve analytically for this equilibrium, derive the expression of $\bar{t}$ and show that our equilibrium holds if and only if $t \le \bar{t}$.
Note first that it immediately follows from (A.14) that the budget constraint is binding and λ > 0.

(A.19)
This quantity is necessarily negative, which is a contradiction. Hence, we cannot have q = 0.
Step 2 - µ > 0: Assume a solution with µ = 0 and ν = 0. We then solve for the system of equations formed by (A.14), (A.15), (A.16), and the binding budget constraint, to be solved in p A , p R , q, λ. In particular, we obtain: 38 We need both q and p R to be positive. Given Assumption 3 this is equivalent to having: However, it is easily shown that the left-hand side term is lower than the right-hand side term if and only if βb ≥ a, which violates Assumption 2. Hence, under our parametric assumptions we cannot have a solution with µ = 0.
Step 3 - Candidate solution: we now consider the only remaining candidate solution, which is to have λ > 0, µ > 0, and ν = 0. We set p R = 0 and ν = 0. The budget constraint then gives us p A = C J (q). Replacing p A with C J (q), we then solve for the system of equations formed by (A.14), (A.15), (A.16), to be solved in q, λ, µ.
(A.14) immediately gives: We then plug this value of λ into (A.16) and obtain: Finally, we replace λ and q in (A.15) to obtain:
$$\mu = \frac{2t(\kappa - 2b^2) + 2\beta b(a + \alpha\beta) - \kappa\beta(\alpha + \beta)}{2\kappa(t - \alpha\beta)}.$$ (A.26)
We need µ > 0, which is equivalent to $t > \underline{t}$, with: The term in brackets is positive due to Assumptions 2 and 3. Hence, $\underline{t} < \alpha\beta$. Assumption 1 thus guarantees that $t > \underline{t}$.
38 There is another solution to the system, in which q has the same expression with a negative coefficient in front of √ X. q is then obviously negative, so that this solution can be discarded.
Step 4 - Full coverage: The last point to check is the assumption of full coverage. We need to have $t \le \bar{t}$, where $\bar{t}$ is such that the author of an article with x i = 1/2 makes zero surplus by submitting to a journal. This gives: The unique positive root to this equation gives us:
$$\bar{t} = \alpha\beta + \beta\, \frac{\sqrt{(4b(a - b\alpha))^2 + 32 b^2 \kappa B + \beta^2 \kappa^2} - \beta\kappa}{8b^2} > \alpha\beta.$$ (A.30)
In particular, we have $\bar{t} > \alpha\beta$ so that the range of values of t such that a symmetric equilibrium with full coverage exists is always non-empty.

A.2 Proof of Proposition 2
The social planner chooses p A , p R , q symmetrically for both journals under the constraint that an article with x i = 1/2 gets submitted. Since the two journals are symmetric, we can write the planner's Lagrangian as: L = n R + λ[n A (p A − C J (q)) + n R p R ] + µp R + νq (A.31) with n A = 1/2 (A.33) We then differentiate with respect to p A , p R , and q to get: Assumption 2 and condition (A.37) immediately give us q > 0 and hence ν = 0. Using (A.37) again gives us λ > 0 and hence ρ > 0 using (A.35).

(A.38)
Under this form, we immediately see that Assumptions 2 and 3 imply that p R < 0, a contradiction.
The only possible solution to the planner's program thus involves µ > 0 and p R = 0. We then need to solve (A.35), (A.36), (A.37), the budget constraint, and the constraint that all articles are submitted, in p A , q, λ, µ, ρ. We obtain: shows that q * = q * * .

A.4 Proof of Equation (11)
For a given $n_A$, denote X the number of different databases that the $n_A$ papers will use. Denote $L_i$ the event "database i is used by at least one paper". We have $E[X] = \bar{n}_D E[L_1]$ and:
$$E[L_1] = 1 - \Pr(\text{"no paper uses database 1"}) = 1 - \left( \frac{\bar{n}_D - 1}{\bar{n}_D} \right)^{\theta n_A}.$$
We then obtain equation (11).
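As an informal check of this result, the closed form can be compared with a simulation. The Python sketch below assumes that exactly θ n_A papers (rounded to an integer) each pick one of the non-public datasets uniformly at random and independently, which is how we read the model in the main text. The function names and the parameter values used in the example are illustrative.

```python
import random


def expected_distinct(n_a, theta, n_d_bar):
    """Closed form: expected number of distinct non-public datasets used."""
    return n_d_bar * (1.0 - ((n_d_bar - 1) / n_d_bar) ** (theta * n_a))


def simulate_distinct(n_a, theta, n_d_bar, n_runs=5000, seed=0):
    """Monte Carlo average of the number of distinct datasets drawn."""
    rng = random.Random(seed)
    n_users = round(theta * n_a)  # papers relying on non-public data
    total = 0
    for _ in range(n_runs):
        # Each such paper picks one dataset uniformly at random.
        drawn = {rng.randrange(n_d_bar) for _ in range(n_users)}
        total += len(drawn)
    return total / n_runs


if __name__ == "__main__":
    print(expected_distinct(100, 0.4, 50))
    print(simulate_distinct(100, 0.4, 50))
```

With 100 papers, θ = 0.4 and 50 datasets, the two numbers should agree up to simulation noise.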