1 Introduction

This file includes the code for a chapter to be submitted to the Routledge Handbook of Philosophy of Economics (edited by Conrad Heilmann and Julian Reiss).

1.1 Set up

Loading packages:

require(data.table)
require(ggplot2)
require(ggrepel)
require(tidyr)
require(igraph)
require(RMySQL)
require(bibliometrix)
require(dplyr)
require(stringr)
require(tm)
require(RColorBrewer)
require(janeaustenr)
require(tidytext)
require(knitr)
require(zoo)
require(viridis)
require(tools)
require(xtable)
require(DT)

Loading project-specific functions:

source("FCT_util.R")

We load and transform some smaller objects that will be central to the workflow. The bigger objects will be loaded in individual code chunks (and then removed at the end of the chunk) to avoid caching too much data.

# Loading discipline info:
discipline_info <- readRDS("/projects/digital_history/interdisciplinarity/data/discipline_info.rds")

# We want philosophy and science studies together
discipline_info[Code_Discipline %in% c(126, 139), discipline:= "Philosophy and Science Studies" ]



# When we start
first_y <- 1990

# When we stop for JEL metho corpus (see below for the explanation)
last_y_metho <- 2018


#To be sure that the tf-idf graph and the topic_thru_time graph have the same colors, we define them here.
colors_cluster <- viridis(9)
names(colors_cluster) <- c("Moral\nPhilosophy", "Big M", "Political\nEconomy",
                           "Decision\nTheory", "History of\nEconomics", "Small m","Critical\nRealism",
                           "Institutional\nEconomics", "Behavioral\nEconomics")

#Our cleaning process produces some spelling errors, so we correct them with this conversion table.
conversion_table <- c("terence hutchison" = "Terence Hutchison",
                     "data mining" = "data-mining",
                     "mccloskey" = "McCloskey",
                     "datamining" = "data-mining",
                     "adam smith" = "Adam Smith",
                     "amartya sens" = "Amartya Sen",
                     "tony lawson" = "Tony Lawson",
                     "post keynesian" = "Post Keynesian",
                     "blaug" = "Blaug",
                     "john" = "John",
                     "mill" = "Mill",
                     "nobel" = "Nobel",
                     "alfred marshall" = "Alfred Marshall",
                     "lionel robbins" = "Lionel Robbins",
                     "veblen" = "Veblen",
                     "friedman" = "Friedman",
                     "friedman methodology" = "Friedman methodology",
                     "lakatos" = "Lakatos",
                     "boland" = "Boland",
                     "cambridge" = "Cambridge",
                     "cambridge controversy" = "Cambridge controversy",
                     "evidencebased" = "evidence-based",
                     "sens" = "Sen's",
                     "symposium amartya" = "symposium Amartya",
                     "sens philosophy" = "Sen's philosophy",
                     "mises" = "Mises",
                     "weintraubs" = "Weintraub's",
                     "cook" = "Cook",
                     "coase theorem" = "Coase theorem",
                     "soros" = "Soros",
                     "keyness" = "Keynes's",
                     "george" = "George",
                     "coats" = "Coats",
                     "hausman" = "Hausman",
                     "american" = "American",
                     "austrian" = "Austrian",
                     "kuhns paradigm" = "Kuhn's paradigm",
                     "kuhnian perspective" = "Kuhnian perspective",
                     "zillak" = "Zillak",
                     "darwinism" = "Darwinism",
                     "hayeks" = "Hayek's",
                     "coases" = "Coase's",
                     "postkeynesian" = "Post-Keynesian",
                     "keynesian economics" = "Keynesian economics",
                     "friedrich hayek" = "Friedrich Hayek",
                     "marxist" = "Marxist",
                     "american school" = "American school",
                     "industrialrelations" = "industrial relations",
                     "economicthought action" = "economic-thought action",
                     "lucas" = "Lucas",
                     "shiller" = "Shiller",
                     "stuart mill" =  "Stuart Mill",
                     "ricardos method" = "Ricardo's method",
                     "malthus" = "Malthus",
                     "john stuart" = "John Stuart",
                     "italian" = "Italian",
                     "schumpeter" = "Schumpeter", 
                     "pareto" = "Pareto",
                     "german" = "German")

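How the conversion table is applied is not shown in this section. A minimal sketch, assuming it is used as an exact-match lookup on cleaned terms (the helper `apply_conversion` and the three-entry excerpt are illustrative, not part of the original workflow):

```r
# Excerpt of the conversion table defined above (three entries only).
conversion_excerpt <- c("mccloskey" = "McCloskey",
                        "datamining" = "data-mining",
                        "kuhns paradigm" = "Kuhn's paradigm")

# Hypothetical helper: replace a term only when it matches a table name exactly;
# terms that are not in the table are returned unchanged.
apply_conversion <- function(terms, table) {
  hit <- terms %in% names(table)
  terms[hit] <- unname(table[terms[hit]])
  terms
}

cleaned <- apply_conversion(c("mccloskey", "rhetoric", "kuhns paradigm"),
                            conversion_excerpt)
print(cleaned)
```

Exact matching sidesteps the overlap between entries such as "sens" and "sens philosophy": each term is looked up as a whole, so the order of the entries does not matter.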
#For the tf-idf graph, we hard-code the order in which the topics appear. We want the same order as in the topic-through-time graph for readability. Since the order of disciplines in the topic-through-time graph is computed after the tf-idf graph is made, we hard-code it here rather than change the order in which the graphs appear in the markdown (the present order makes more sense).
order_disc_philo <- c("Moral\nPhilosophy", "Behavioral\nEconomics", "Big M", "Small m", "Decision\nTheory")

order_disc_metho <- c("Institutional\nEconomics","Critical\nRealism", "Political\nEconomy", "Big M", "Small m", "History of\nEconomics")


JEL_doc_topic_map  <- data.table(document = 1:6, Topic = c("Big M","Political\nEconomy","History of\nEconomics","Critical\nRealism", "Institutional\nEconomics", "Small m"))

2 Constructing the corpora

We have two corpora coming from distinct bibliometric sources. Although we cannot give the data away (because of license restrictions), this section provides much of the pretreatment code we used. Anyone who fetches the same data from the bibliometric databases should be able to replicate our results.

2.1 Specialized philosophy of economics

Our corpus capturing the field of specialized philosophy of economics is composed of two journals:

Economics & Philosophy (E&P, since 1985)
Journal of Economic Methodology (JEM, since 1994; it was preceded by Methodus, but the few issues of that journal are not in Scopus)

Although our team typically gets its bibliometric data from the Observatoire des sciences et technologies' version of the Web of Science (WoS), WoS contains data for JEM only since 2013. To have a complete corpus, we thus turned to Scopus. Data were retrieved in early 2020, so we have complete records for the journals up to and including 2019.

The R package Bibliometrix makes it easy to load the data and generate initial descriptive results.

2.1.1 Economics & Philosophy

dt_one_j <- convert2df(file = 
                       "/projects/digital_history/philo_and_economics/data/BibTeX files/Corpus A - Core Phil Eco Journals and Books/Eco & Phil Vols 1-35 (1985-2019).bib",
                     dbsource= "scopus", format = "bibtex")

## 
## Converting your scopus collection into a bibliographic dataframe
## 
## Done!
## 
## 
## Generating affiliation field tag AU_UN from C1:  Done!

# print(paste("There are",nrow(dt_one_j), "documents from this journal."))

print("Automated analysis from the bibliometrix package.")

## [1] "Automated analysis from the bibliometrix package."

results_one_j <- biblioAnalysis(dt_one_j, sep = ";")


# summary of the analysis (error with the function)
summary(object = results_one_j, k = 10, pause = FALSE)

## 
## 
## MAIN INFORMATION ABOUT DATA
## 
##  Timespan                              1985 : 2019 
##  Sources (Journals, Books, etc)        1 
##  Documents                             614 
##  Average years from publication        16.2 
##  Average citations per documents       14.38 
##  Average citations per year per doc    0.8024 
##  References                            15728 
##  
## DOCUMENT TYPES                     
##  article               422 
##  article in press      9 
##  conference paper      31 
##  editorial             10 
##  erratum               4 
##  letter                2 
##  note                  10 
##  review                126 
##  
## DOCUMENT CONTENTS
##  Keywords Plus (ID)                    45 
##  Author's Keywords (DE)                388 
##  
## AUTHORS
##  Authors                               512 
##  Author Appearances                    758 
##  Authors of single-authored documents  346 
##  Authors of multi-authored documents   166 
##  
## AUTHORS COLLABORATION
##  Single-authored documents             497 
##  Documents per Author                  1.2 
##  Authors per Document                  0.834 
##  Co-Authors per Documents              1.23 
##  Collaboration Index                   1.42 
##  
## 
## Annual Scientific Production
## 
##  Year    Articles
##     1985       19
##     1986       11
##     1987       19
##     1988       16
##     1989       13
##     1990       13
##     1991       16
##     1992       13
##     1993       17
##     1994       19
##     1995       16
##     1996        9
##     1997       16
##     1998        8
##     1999       14
##     2000       15
##     2001       13
##     2002       11
##     2003       19
##     2004       20
##     2005       24
##     2006       18
##     2007       22
##     2008       26
##     2009       18
##     2010       12
##     2011       13
##     2012       16
##     2013       20
##     2014       23
##     2015       20
##     2016       18
##     2017       15
##     2018       33
##     2019       39
## 
## Annual Percentage Growth Rate 2.137593 
## 
## 
## Most Productive Authors
## 
##    Authors        Articles Authors        Articles Fractionalized
## 1     HAUSMAN DM        11  NA NA                            11.0
## 2     NA NA             11  HAUSMAN DM                       10.0
## 3     SUGDEN R          11  BROOME J                          8.5
## 4     BROOME J           9  SUGDEN R                          8.5
## 5     FLEURBAEY M        7  QIZILBASH M                       6.0
## 6     LIST C             7  SEN A                             6.0
## 7     MONGIN P           6  FLEURBAEY M                       5.5
## 8     QIZILBASH M        6  MONGIN P                          5.5
## 9     SEN A              6  GUSTAFSSON JE                     5.0
## 10    VOORHOEVE A        6  CARTER I                          4.5
## 
## 
## Top manuscripts per citations
## 
##                     Paper           TC TCperYear
## 1  BINMORE K, 1987, ECON PHILOS    292      8.59
## 2  MORRIS S, 1995, ECON PHILOS     174      6.69
## 3  SUGDEN R, 2000, ECON PHILOS     157      7.48
## 4  BUCHANAN JM, 1991, ECON PHILOS  153      5.10
## 5  HIRSCHMAN AO, 1985, ECON PHILOS 139      3.86
## 6  ETZIONI A, 1986, ECON PHILOS    138      3.94
## 7  FLEURBAEY M, 1995, ECON PHILOS  137      5.27
## 8  NUSSBAUM MC, 2001, ECON PHILOS  136      6.80
## 9  DONALDSON T, 1995, ECON PHILOS  117      4.50
## 10 STALNAKER R, 1996, ECON PHILOS  111      4.44
## 
## 
## Corresponding Author's Countries
## 
##           Country Articles   Freq SCP MCP MCP_Ratio
## 1  USA                  74 0.4966  71   3    0.0405
## 2  UNITED KINGDOM       23 0.1544  21   2    0.0870
## 3  FRANCE                8 0.0537   8   0    0.0000
## 4  GERMANY               7 0.0470   7   0    0.0000
## 5  CANADA                5 0.0336   4   1    0.2000
## 6  SWEDEN                5 0.0336   5   0    0.0000
## 7  NETHERLANDS           4 0.0268   4   0    0.0000
## 8  BELGIUM               3 0.0201   2   1    0.3333
## 9  NORWAY                3 0.0201   3   0    0.0000
## 10 AUSTRALIA             2 0.0134   1   1    0.5000
## 
## 
## SCP: Single Country Publications
## 
## MCP: Multiple Country Publications
## 
## 
## Total Citations per Country
## 
##      Country      Total Citations Average Article Citations
## 1  USA                       1218                     16.46
## 2  UNITED KINGDOM             212                      9.22
## 3  SWEDEN                     191                     38.20
## 4  NORWAY                      80                     26.67
## 5  ISRAEL                      50                     25.00
## 6  FRANCE                      45                      5.62
## 7  SWITZERLAND                 43                     21.50
## 8  GERMANY                     29                      4.14
## 9  BELGIUM                     24                      8.00
## 10 AUSTRALIA                   17                      8.50
## 
## 
## Most Relevant Sources
## 
##             Sources        Articles
## 1 ECONOMICS AND PHILOSOPHY      614
## 
## 
## Most Relevant Keywords
## 
##    Author Keywords (DE)      Articles Keywords-Plus (ID)     Articles
## 1      DECISION THEORY              7  ABORTION                     6
## 2      PRIORITARIANISM              6  BEHAVIOR                     6
## 3      FAIRNESS                     5  FAMILY PLANNING              6
## 4      NUDGE                        5  ECONOMICS                    5
## 5      EGALITARIANISM               4  CRITIQUE                     4
## 6      EXPLOITATION                 4  ECONOMIC FACTORS             4
## 7      BEHAVIOURAL ECONOMICS        3  FERTILITY CONTROL            4
## 8      CLIMATE CHANGE               3  INDUCED                      4
## 9      DELIBERATION                 3  POSTCONCEPTION               4
## 10     DISTRIBUTIVE JUSTICE         3  PSYCHOLOGICAL FACTORS        4
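The "Annual Percentage Growth Rate" in the summary above can be checked by hand. A minimal sketch, assuming the figure is a compound annual growth rate between the first and last year of the timespan (this definition is an assumption; the formula used by bibliometrix may differ between versions):

```r
# Yearly document counts for Economics & Philosophy, copied from the summary above
first_year <- 1985
n_first    <- 19
last_year  <- 2019
n_last     <- 39

# Assumed definition: compound annual growth rate between the two end points
n_intervals <- last_year - first_year
growth_rate <- ((n_last / n_first)^(1 / n_intervals) - 1) * 100
print(round(growth_rate, 2))
```

Under this assumption the result rounds to the 2.14 reported above. Because only the two end points enter the calculation, the figure is sensitive to an unusually small or large count in the first or last year.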

# Multiple graphs:
plot(x = results_one_j, k = 10, pause = FALSE)

# print("Getting the most-cited references")
CR <- citations(dt_one_j, field = "article", sep = ";")
kable(cbind(CR$Cited[1:10]), caption = "Most-cited references")

Most-cited references
RAWLS, J., (1971) A THEORY OF JUSTICE, , HARVARD UNIVERSITY PRESS	24
RAWLS, J., (1971) A THEORY OF JUSTICE, , CAMBRIDGE, MA: HARVARD UNIVERSITY PRESS	17
BROOME, J., (2004) WEIGHING LIVES, , OXFORD: OXFORD UNIVERSITY PRESS	15
COHEN, G.A., ON THE CURRENCY OF EGALITARIAN JUSTICE (1989) ETHICS, 99, PP. 906-944	15
ANDERSON, E., WHAT IS THE POINT OF EQUALITY? (1999) ETHICS, 109, PP. 287-337	11
BROOME, J., (1991) WEIGHING GOODS, , OXFORD: BLACKWELL	11
KAHNEMAN, D., TVERSKY, A., PROSPECT THEORY: AN ANALYSIS OF DECISION UNDER RISK (1979) ECONOMETRICA, 47, PP. 263-291	10
PARFIT, D., (1984) REASONS AND PERSONS, , OXFORD UNIVERSITY PRESS	10
DWORKIN, R., WHAT IS EQUALITY? PART 2: EQUALITY OF RESOURCES (1981) PHILOSOPHY AND PUBLIC AFFAIRS, 10, PP. 283-345	9
HARSANYI, J.C., CARDINAL WELFARE, INDIVIDUALISTIC ETHICS, AND INTERPERSONAL COMPARISONS OF UTILITY (1955) JOURNAL OF POLITICAL ECONOMY, 63, PP. 309-321	9

# print("Getting the most-cited first author")
CR <- citations(dt_one_j, field = "author", sep = ";")
kable(cbind(CR$Cited[1:10]), caption = "Most-cited first authors")

Most-cited first authors
SEN A	404
SUGDEN R	192
BROOME J	178
RAWLS J	158
SEN A K	156
KAHNEMAN D	146
TVERSKY A	119
FLEURBAEY M	102
HAYEK F A	101
COHEN G A	99

rm(dt_one_j,results_one_j)

2.1.2 Journal of Economic Methodology

dt_one_j <- convert2df(file = 
                       "/projects/digital_history/philo_and_economics/data/BibTeX files/Corpus A - Core Phil Eco Journals and Books/JEM Vols 1-26 (1994-2019).bib",
                     dbsource= "scopus", format = "bibtex")

## 
## Converting your scopus collection into a bibliographic dataframe
## 
## 
## Warning:
## In your file, some mandatory metadata are missing. Bibliometrix functions may not work properly!
## 
## Please, take a look at the vignettes:
## - 'Data Importing and Converting' (https://cran.r-project.org/web/packages/bibliometrix/vignettes/Data-Importing-and-Converting.html)
## - 'A brief introduction to bibliometrix' (https://cran.r-project.org/web/packages/bibliometrix/vignettes/bibliometrix-vignette.html)
## 
## 
## Missing fields:  IDDone!
## 
## 
## Generating affiliation field tag AU_UN from C1:  Done!

# print(paste("There are",nrow(dt_one_j), "documents from this journal."))

print("Automated analysis from the bibliometrix package.")

## [1] "Automated analysis from the bibliometrix package."

results_one_j <- biblioAnalysis(dt_one_j, sep = ";")


# summary of the analysis (error with the function)
summary(object = results_one_j, k = 10, pause = FALSE)

## 
## 
## MAIN INFORMATION ABOUT DATA
## 
##  Timespan                              1994 : 2019 
##  Sources (Journals, Books, etc)        1 
##  Documents                             611 
##  Average years from publication        12.2 
##  Average citations per documents       9.142 
##  Average citations per year per doc    0.6864 
##  References                            22574 
##  
## DOCUMENT TYPES                     
##  article               495 
##  conference paper      34 
##  editorial             25 
##  erratum               4 
##  letter                2 
##  note                  14 
##  review                37 
##  
## DOCUMENT CONTENTS
##  Keywords Plus (ID)                    0 
##  Author's Keywords (DE)                1564 
##  
## AUTHORS
##  Authors                               507 
##  Author Appearances                    779 
##  Authors of single-authored documents  323 
##  Authors of multi-authored documents   184 
##  
## AUTHORS COLLABORATION
##  Single-authored documents             487 
##  Documents per Author                  1.21 
##  Authors per Document                  0.83 
##  Co-Authors per Documents              1.27 
##  Collaboration Index                   1.48 
##  
## 
## Annual Scientific Production
## 
##  Year    Articles
##     1994       19
##     1995       16
##     1996       15
##     1997       16
##     1998       11
##     1999       21
##     2000       18
##     2001       29
##     2002       17
##     2003       28
##     2004       23
##     2005       29
##     2006       23
##     2007       25
##     2008       18
##     2009       25
##     2010       27
##     2011       26
##     2012       29
##     2013       34
##     2014       25
##     2015       31
##     2016       26
##     2017       22
##     2018       22
##     2019       36
## 
## Annual Percentage Growth Rate 2.589274 
## 
## 
## Most Productive Authors
## 
##    Authors        Articles Authors        Articles Fractionalized
## 1    DAVIS JB           14   DAVIS JB                       10.78
## 2    MKI U              11   MKI U                          10.33
## 3    MAYER T            10   HAUSMAN DM                      9.00
## 4    SUGDEN R           10   REISS J                         9.00
## 5    BACKHOUSE RE        9   MAYER T                         8.33
## 6    HAUSMAN DM          9   BACKHOUSE RE                    8.00
## 7    REISS J             9   NA NA                           8.00
## 8    ROSS D              9   ROSS D                          7.50
## 9    HOOVER KD           8   SUGDEN R                        6.67
## 10   NA NA               8   HOOVER KD                       6.50
## 
## 
## Top manuscripts per citations
## 
##                       Paper           TC TCperYear
## 1  SUGDEN R, 2000, J ECON METHODOL-a 170      8.10
## 2  HODGSON GM, 2007, J ECON METHODOL 164     11.71
## 3  WITT U, 2004, J ECON METHODOL     103      6.06
## 4  SCHRAM A, 2005, J ECON METHODOL    95      5.94
## 5  MKI U, 2005, J ECON METHODOL       87      5.44
## 6  MORGAN MS, 2001, J ECON METHODOL   85      4.25
## 7  MORGAN MS, 2005, J ECON METHODOL   78      4.88
## 8  CHICK V, 2005, J ECON METHODOL     76      4.75
## 9  WOODWARD J, 2006, J ECON METHODOL  74      4.93
## 10 READ D, 2005, J ECON METHODOL      71      4.44
## 
## 
## Corresponding Author's Countries
## 
##           Country Articles   Freq SCP MCP MCP_Ratio
## 1  USA                  69 0.2738  66   3    0.0435
## 2  UNITED KINGDOM       52 0.2063  48   4    0.0769
## 3  NETHERLANDS          26 0.1032  22   4    0.1538
## 4  FRANCE               13 0.0516  13   0    0.0000
## 5  GERMANY              13 0.0516  12   1    0.0769
## 6  FINLAND              12 0.0476  10   2    0.1667
## 7  ITALY                10 0.0397   9   1    0.1000
## 8  SOUTH AFRICA          7 0.0278   4   3    0.4286
## 9  GEORGIA               6 0.0238   4   2    0.3333
## 10 CANADA                5 0.0198   5   0    0.0000
## 
## 
## SCP: Single Country Publications
## 
## MCP: Multiple Country Publications
## 
## 
## Total Citations per Country
## 
##      Country      Total Citations Average Article Citations
## 1  NETHERLANDS                573                     22.04
## 2  UNITED KINGDOM             560                     10.77
## 3  USA                        503                      7.29
## 4  GERMANY                    233                     17.92
## 5  FINLAND                     89                      7.42
## 6  FRANCE                      77                      5.92
## 7  ITALY                       59                      5.90
## 8  SOUTH AFRICA                35                      5.00
## 9  GEORGIA                     34                      5.67
## 10 PORTUGAL                    30                     10.00
## 
## 
## Most Relevant Sources
## 
##                    Sources        Articles
## 1 JOURNAL OF ECONOMIC METHODOLOGY      611

# Multiple graphs:
plot(x = results_one_j, k = 10, pause = FALSE)

# print("Getting the most-cited references")
CR <- citations(dt_one_j, field = "article", sep = ";")
kable(cbind(CR$Cited[1:10]), caption = "Most-cited references")

Most-cited references
HAUSMAN, D., (1992) THE INEXACT AND SEPARATE SCIENCE OF ECONOMICS, , CAMBRIDGE: CAMBRIDGE UNIVERSITY PRESS	33
HAUSMAN, D.M., (1992) THE INEXACT AND SEPARATE SCIENCE OF ECONOMICS, , CAMBRIDGE: CAMBRIDGE UNIVERSITY PRESS	33
LAWSON, T., (1997) ECONOMICS AND REALITY, , LONDON: ROUTLEDGE	26
HANDS, D.W., (2001) REFLECTION WITHOUT RULES: ECONOMIC METHODOLOGY AND CONTEMPORARY SCIENCE THEORY, , CAMBRIDGE: CAMBRIDGE UNIVERSITY PRESS	23
KAHNEMAN, D., TVERSKY, A., PROSPECT THEORY: AN ANALYSIS OF DECISION UNDER RISK (1979) ECONOMETRICA, 47, PP. 263-291	15
BLAUG, M., (1980) THE METHODOLOGY OF ECONOMICS, , CAMBRIDGE: CAMBRIDGE UNIVERSITY PRESS	13
HUTCHISON, T.W., (1938) THE SIGNIFICANCE AND BASIC POSTULATES OF ECONOMIC THEORY, , LONDON: MACMILLAN	12
KEYNES, J.M., (1936) THE GENERAL THEORY OF EMPLOYMENT, INTEREST AND MONEY, , LONDON: MACMILLAN	12
LAWSON, T., (2003) REORIENTING ECONOMICS, , LONDON: ROUTLEDGE	11
SMITH, V.L., MICROECONOMIC SYSTEMS AS AN EXPERIMENTAL SCIENCE (1982) AMERICAN ECONOMIC REVIEW, 72, PP. 923-955	11

# print("Getting the most-cited first author")
CR <- citations(dt_one_j, field = "author", sep = ";")
kable(cbind(CR$Cited[1:10]), caption = "Most-cited first authors")

Most-cited first authors
MKI U	241
SUGDEN R	198
SEN A	193
KAHNEMAN D	170
FRIEDMAN M	164
TVERSKY A	151
HAYEK F A	141
LOEWENSTEIN G	129
GIGERENZER G	127
LAWSON T	127

rm(dt_one_j,results_one_j)

2.1.3 Both journals together

Now, we combine the two journals in a single corpus, keep only articles and reviews, and drop documents published prior to 1990.

#Loading corpus
df_CorpusA <- list(convert2df(file = "/projects/digital_history/philo_and_economics/data/BibTeX files/Corpus A - Core Phil Eco Journals and Books/Eco & Phil Vols 1-35 (1985-2019).bib",dbsource= "scopus", format = "bibtex"),
              convert2df(file = "/projects/digital_history/philo_and_economics/data/BibTeX files/Corpus A - Core Phil Eco Journals and Books/JEM Vols 1-26 (1994-2019).bib",dbsource= "scopus", format = "bibtex")) %>% rbindlist(fill = TRUE)

## 
## Converting your scopus collection into a bibliographic dataframe
## 
## Done!
## 
## 
## Generating affiliation field tag AU_UN from C1:  Done!
## 
## 
## Converting your scopus collection into a bibliographic dataframe
## 
## 
## Warning:
## In your file, some mandatory metadata are missing. Bibliometrix functions may not work properly!
## 
## Please, take a look at the vignettes:
## - 'Data Importing and Converting' (https://cran.r-project.org/web/packages/bibliometrix/vignettes/Data-Importing-and-Converting.html)
## - 'A brief introduction to bibliometrix' (https://cran.r-project.org/web/packages/bibliometrix/vignettes/bibliometrix-vignette.html)
## 
## 
## Missing fields:  IDDone!
## 
## 
## Generating affiliation field tag AU_UN from C1:  Done!

#We keep only articles and reviews published in 1990 or later
dt_CorpusArticles <- df_CorpusA[DT %in% c("ARTICLE", "REVIEW") & PY>=first_y]

print("Automated analysis from the bibliometrix package.")

## [1] "Automated analysis from the bibliometrix package."

results_cA <- biblioAnalysis(dt_CorpusArticles, sep = ";")


# summary of the analysis (error with the function)
summary(object = results_cA, k = 10, pause = FALSE)

## 
## 
## MAIN INFORMATION ABOUT DATA
## 
##  Timespan                              1990 : 2019 
##  Sources (Journals, Books, etc)        2 
##  Documents                             1007 
##  Average years from publication        13.3 
##  Average citations per documents       11.35 
##  Average citations per year per doc    0.7613 
##  References                            33760 
##  
## DOCUMENT TYPES                     
##  article      846 
##  review       161 
##  
## DOCUMENT CONTENTS
##  Keywords Plus (ID)                    45 
##  Author's Keywords (DE)                1749 
##  
## AUTHORS
##  Authors                               836 
##  Author Appearances                    1259 
##  Authors of single-authored documents  550 
##  Authors of multi-authored documents   286 
##  
## AUTHORS COLLABORATION
##  Single-authored documents             808 
##  Documents per Author                  1.2 
##  Authors per Document                  0.83 
##  Co-Authors per Documents              1.25 
##  Collaboration Index                   1.44 
##  
## 
## Annual Scientific Production
## 
##  Year    Articles
##     1990       13
##     1991       16
##     1992       13
##     1993       17
##     1994       38
##     1995       32
##     1996       23
##     1997       29
##     1998       19
##     1999       32
##     2000       32
##     2001       38
##     2002       27
##     2003       34
##     2004       27
##     2005       44
##     2006       35
##     2007       35
##     2008       40
##     2009       41
##     2010       37
##     2011       36
##     2012       44
##     2013       46
##     2014       42
##     2015       45
##     2016       32
##     2017       30
##     2018       43
##     2019       67
## 
## Annual Percentage Growth Rate 5.817198 
## 
## 
## Most Productive Authors
## 
##    Authors        Articles Authors        Articles Fractionalized
## 1    SUGDEN R           20    HAUSMAN DM                    15.00
## 2    HAUSMAN DM         16    SUGDEN R                      14.17
## 3    GUALA F            10    MKI U                         10.00
## 4    MKI U              10    GUALA F                        9.00
## 5    DAVIS JB            9    REISS J                        8.00
## 6    MAYER T             9    ROSS D                         7.50
## 7    ROSS D              9    MAYER T                        7.33
## 8    GOLDFARB RS         8    HANDS DW                       7.00
## 9    REISS J             8    QIZILBASH M                    7.00
## 10   BACKHOUSE RE        7    DAVIS JB                       6.78
## 
## 
## Top manuscripts per citations
## 
##                       Paper           TC TCperYear
## 1  MORRIS S, 1995, ECON PHILOS       174      6.69
## 2  SUGDEN R, 2000, J ECON METHODOL-a 170      8.10
## 3  HODGSON GM, 2007, J ECON METHODOL 164     11.71
## 4  SUGDEN R, 2000, ECON PHILOS       157      7.48
## 5  BUCHANAN JM, 1991, ECON PHILOS    153      5.10
## 6  FLEURBAEY M, 1995, ECON PHILOS    137      5.27
## 7  NUSSBAUM MC, 2001, ECON PHILOS    136      6.80
## 8  DONALDSON T, 1995, ECON PHILOS    117      4.50
## 9  STALNAKER R, 1996, ECON PHILOS    111      4.44
## 10 CUBITT RP, 2003, ECON PHILOS       96      5.33
## 
## 
## Corresponding Author's Countries
## 
##           Country Articles   Freq SCP MCP MCP_Ratio
## 1  USA                 113 0.3343 109   4    0.0354
## 2  UNITED KINGDOM       63 0.1864  59   4    0.0635
## 3  NETHERLANDS          27 0.0799  24   3    0.1111
## 4  FRANCE               19 0.0562  19   0    0.0000
## 5  GERMANY              18 0.0533  17   1    0.0556
## 6  ITALY                11 0.0325  10   1    0.0909
## 7  FINLAND              10 0.0296   9   1    0.1000
## 8  CANADA                9 0.0266   9   0    0.0000
## 9  SWEDEN                7 0.0207   6   1    0.1429
## 10 BELGIUM               6 0.0178   5   1    0.1667
## 
## 
## SCP: Single Country Publications
## 
## MCP: Multiple Country Publications
## 
## 
## Total Citations per Country
## 
##      Country      Total Citations Average Article Citations
## 1  USA                       1218                     10.78
## 2  UNITED KINGDOM             686                     10.89
## 3  NETHERLANDS                493                     18.26
## 4  SWEDEN                     185                     26.43
## 5  GERMANY                    159                      8.83
## 6  FRANCE                     113                      5.95
## 7  NORWAY                      93                     23.25
## 8  FINLAND                     86                      8.60
## 9  ITALY                       59                      5.36
## 10 BELGIUM                     51                      8.50
## 
## 
## Most Relevant Sources
## 
##                    Sources        Articles
## 1 JOURNAL OF ECONOMIC METHODOLOGY      532
## 2 ECONOMICS AND PHILOSOPHY             475
## 
## 
## Most Relevant Keywords
## 
##    Author Keywords (DE)      Articles Keywords-Plus (ID)     Articles
## 1     METHODOLOGY                  48  ABORTION                     6
## 2     ECONOMIC METHODOLOGY         21  BEHAVIOR                     6
## 3     RATIONALITY                  20  FAMILY PLANNING              6
## 4     MODELS                       18  ECONOMICS                    5
## 5     EXPERIMENTAL ECONOMICS       16  CRITIQUE                     4
## 6     EXPLANATION                  16  ECONOMIC FACTORS             4
## 7     EXPERIMENTS                  13  FERTILITY CONTROL            4
## 8     BEHAVIORAL ECONOMICS         12  INDUCED                      4
## 9     ECONOMICS                    12  POSTCONCEPTION               4
## 10    GAME THEORY                  12  PSYCHOLOGICAL FACTORS        4

# Multiple graphs:
plot(x = results_cA, k = 10, pause = FALSE)

# print("Getting the most-cited references")
CR <- citations(dt_CorpusArticles, field = "article", sep = ";")
kable(cbind(CR$Cited[1:10]), caption = "Most-cited references")

Most-cited references
HAUSMAN, D., (1992) THE INEXACT AND SEPARATE SCIENCE OF ECONOMICS, , CAMBRIDGE: CAMBRIDGE UNIVERSITY PRESS	32
HAUSMAN, D.M., (1992) THE INEXACT AND SEPARATE SCIENCE OF ECONOMICS, , CAMBRIDGE: CAMBRIDGE UNIVERSITY PRESS	32
KAHNEMAN, D., TVERSKY, A., PROSPECT THEORY: AN ANALYSIS OF DECISION UNDER RISK (1979) ECONOMETRICA, 47, PP. 263-291	24
RAWLS, J., (1971) A THEORY OF JUSTICE, , HARVARD UNIVERSITY PRESS	24
HANDS, D.W., (2001) REFLECTION WITHOUT RULES: ECONOMIC METHODOLOGY AND CONTEMPORARY SCIENCE THEORY, , CAMBRIDGE: CAMBRIDGE UNIVERSITY PRESS	22
RAWLS, J., (1971) A THEORY OF JUSTICE, , CAMBRIDGE, MA: HARVARD UNIVERSITY PRESS	22
LAWSON, T., (1997) ECONOMICS AND REALITY, , LONDON: ROUTLEDGE	21
BROOME, J., (2004) WEIGHING LIVES, , OXFORD: OXFORD UNIVERSITY PRESS	15
CAMERER, C., LOEWENSTEIN, G., PRELEC, D., NEUROECONOMICS: HOW NEUROSCIENCE CAN INFORM ECONOMICS (2005) JOURNAL OF ECONOMIC LITERATURE, 43, PP. 9-64	15
COHEN, G.A., ON THE CURRENCY OF EGALITARIAN JUSTICE (1989) ETHICS, 99, PP. 906-944	15

# print("Getting the most-cited first author")
CR <- citations(dt_CorpusArticles, field = "author", sep = ";")
kable(cbind(CR$Cited[1:10]), caption = "Most-cited first authors")

Most-cited first authors
SEN A	476
SUGDEN R	358
KAHNEMAN D	289
MKI U	255
TVERSKY A	236
HAYEK F A	217
FRIEDMAN M	210
BROOME J	185
SEN A K	181
LOEWENSTEIN G	169

dt_CorpusArticles$ID <- 1:nrow(dt_CorpusArticles)


save(dt_CorpusArticles,file = "/projects/digital_history/philo_and_economics/data/dt_ScopusCorpus_PhiEcon.RData")

rm(df_CorpusA)

2.1.4 Cleaning up references

The descriptive results above show that some cleaning is necessary on the references. For instance, Hausman shows up as 'HAUSMAN, D.', but also as 'HAUSMAN, D.M.' The following code chunk is far from being optimized for speed. It was run once and its output was saved.

dt_refs <-  data.table(ID=numeric(),order= numeric(), refs = character())

for(i in unique(dt_CorpusArticles$ID)){
  
  if (dt_CorpusArticles[ID==i]$CR != "" & !is.na(dt_CorpusArticles[ID==i]$CR))
  {
    refs <-  dt_CorpusArticles[ID==i]$CR %>%  strsplit("; ") %>%  unlist()
  
    dt_refs <- rbind(dt_refs,data.table(ID=i,order= 1:length(refs), refs = refs))
  }
  
}


dt_refs[,Year :=  str_extract(refs, '(?<=\\()[0-9-]+(?=\\))')]
#Adding info from citing doc (journal + citing year)
dt_refs <- merge(dt_refs, dt_CorpusArticles[,list(ID, SO, PY)], by="ID", all.x = TRUE)

#Finding duplicates with authors name
dt_good_format_refs <- dt_refs[!is.na(Year) & str_length(Year) == 4]
refs_splits <- strsplit(dt_good_format_refs$refs, ",")
dt_good_format_refs$Author <- lapply(refs_splits, function(l) l[[1]])
dt_good_format_refs$Author <- as.character(dt_good_format_refs$Author) #There are 8212 unique authors

setkey(dt_good_format_refs, Author)
dt_before <- dt_good_format_refs # to look what sources were changed by the algorithm
h = 0; old_perc =0 #to keep track of where we're at in the loop
n_aut = length(unique(dt_good_format_refs$Author))
for (author in unique(dt_good_format_refs$Author))
{
  h = h + 1
  n_author_refs <- nrow(dt_good_format_refs[Author == author])
  for (i in seq_len(n_author_refs - 1))
  {
    for (j in (i+1):n_author_refs) # parentheses are needed: i+1:n is parsed as i+(1:n) and runs past the last row
    {
      dt <- duplicatedMatching(dt_good_format_refs[Author == author][c(i,j)], Field = "refs", tol = 0.85)
      if (nrow(dt) == 1)
      {
        dt_good_format_refs[Author == author][c(i,j)]$refs <- dt$refs
      }
    }
  }
  perc = round(h/n_aut*100) # conditional print function to keep track of each step in the percentage
  if(perc > old_perc){  
    print(paste(round(h/n_aut*100), "% of authors completed at", Sys.time()))
    old_perc = perc
  }
}
#looking at what was changed
dt_difference <- merge(dt_good_format_refs[,list(ID, order, refs)], dt_before[,list(ID, order, refs)], by = c("ID", "order"))
dt_refs <- dt_good_format_refs
save(dt_refs, file = "/projects/digital_history/philo_and_economics/data/dt_refs.RData")
rm(dt_refs,dt_good_format_refs,dt_before)
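
The reference table at the start of the chunk above is grown with rbind inside a loop, which is one reason the chunk is slow. The following is a minimal sketch of a vectorised alternative, not the code that was actually run. It uses base R only; the toy `articles` table stands in for dt_CorpusArticles, and the regular expression keeps only four-digit years, which is an assumption that mirrors the later filter on str_length(Year) == 4.

```r
# Toy stand-in for dt_CorpusArticles: one row per article, references in CR
articles <- data.frame(
  ID = 1:3,
  CR = c("HAUSMAN, D., (1992) THE INEXACT AND SEPARATE SCIENCE OF ECONOMICS; RAWLS, J., (1971) A THEORY OF JUSTICE",
         "",
         "HAUSMAN, D.M., (1992) THE INEXACT AND SEPARATE SCIENCE OF ECONOMICS"),
  stringsAsFactors = FALSE
)

# Keep only the articles that actually have a reference string
has_refs <- !is.na(articles$CR) & articles$CR != ""
ref_list <- strsplit(articles$CR[has_refs], "; ", fixed = TRUE)

# One row per (article, reference), with the position of the reference in the list
refs_long <- data.frame(
  ID    = rep(articles$ID[has_refs], lengths(ref_list)),
  order = sequence(lengths(ref_list)),
  refs  = unlist(ref_list),
  stringsAsFactors = FALSE
)

# Four-digit year between parentheses; NA when no such year is found
hit <- regexpr("(?<=\\()[0-9]{4}(?=\\))", refs_long$refs, perl = TRUE)
refs_long$Year <- NA_character_
refs_long$Year[hit > 0] <- regmatches(refs_long$refs, hit)

print(refs_long)
```

The two spellings of Hausman survive this step as distinct strings, which is exactly what the duplicate matching is meant to resolve.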

source("/home/olivier/my_projects/central_banks/functions/functions_for_bibliometrics.R")
load("/projects/digital_history/philo_and_economics/data/dt_refs.RData")

#The previous chunk of code got rid of duplicates, but we still have references to different editions of the same work.
#This is problematic because when we list the most-cited references, the same source sometimes appears twice under different editions.
#Our goal here is to create a unique identifier for references.
#To do so, we calculate the Levenshtein distance between the first characters of references from the same author.
#The title of a reference usually comes first, and the edition comes after.
#The chunk size and the distance threshold were initially fixed at 50 and 10; the code below instead scales them
#with the length of the two strings (a quarter of their combined length, and 10% of that chunk size).

if(!"ref_id" %in% colnames(dt_refs))
{
  #We give an ID to references
  identifying_refs <- dt_refs[,list(refs)] %>% unique()
  identifying_refs[,ref_id := c(1:.N)]
  dt_refs <- merge(dt_refs, identifying_refs, by = "refs", all.x = TRUE)
}


#We don't want to run this twice if it was already done
if (!"unique_ref_id" %in% colnames(dt_refs))
{
  lv_distance_trhld <- 10
  size_of_ref_chunk <- 50
  #Since the goal of the procedure is to get top citations, we only try to find duplicates for the top 500
  #authors 
  top_authors <- dt_refs[, .N, by = Author][order(-N)] %>% head(500)
  
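  #First we need ids for references. We reuse the ref_id already attached to dt_refs,
  #so that the merge on ref_id at the end of this chunk refers to the same numbering
  ref_ids <- dt_refs[,list(refs, ref_id)] %>% unique()
  setkey(ref_ids, refs)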
  
  #Then we only take references from the top 500 authors
  dt <- dt_refs[Author %in% top_authors$Author, list(Author, refs)] %>% unique()
  dt <- merge(dt, ref_ids, by = "refs") #merging ref_id
  
  #Creating all combinations of references per author
  dt_refs_duplicates <- dt[,list(Target = rep(refs[1:(length(refs)-1)],(length(refs)-1):1),
                                               Source = rev(refs)[sequence((length(refs)-1):1)]),
                                         by= Author]
  
  #Merging back ids
  dt_refs_duplicates <- merge(dt_refs_duplicates, dt[,list(refs, ref_id)], by.x = "Target", by.y="refs")
  setnames(dt_refs_duplicates, "ref_id", "Target_ID")
  dt_refs_duplicates <- merge(dt_refs_duplicates, dt[,list(refs, ref_id)], by.x = "Source", by.y="refs")
  setnames(dt_refs_duplicates, "ref_id", "Source_ID")
  
  #Cleaning the references to get rid of special characters and numbers (we don't want the year to matter, only the title)
  dt_refs_duplicates[,`:=`(Target = fct_clear_string_for_fuzzy_match(Target) %>% removeNumbers(),
                           Source = fct_clear_string_for_fuzzy_match(Source) %>% removeNumbers())]
  
  dt_refs_duplicates[, size_of_ref_chunk := ((str_length(Source) + str_length(Target)) / 4) %>% round()]
  dt_refs_duplicates[, lv_distance_trhld := (size_of_ref_chunk * 0.1) %>% round()]
  #Calculating levenshtein distance
  dt_refs_duplicates[, distance := stringdist::stringdist(substr(Target, 1, size_of_ref_chunk), 
                                                          substr(Source, 1, size_of_ref_chunk))]
  
  #Only keeping pairs of refs whose distance is below the threshold computed above
  dt_refs_ids <- dt_refs_duplicates[distance < lv_distance_trhld]
  
  #Sorting in order to have ref_ids that are only in the Source column
  dt_refs_ids[, `:=`(Target = c(Target,Source) %>% sort() %>%  first(), Source = c(Target,Source) %>% sort() %>% last())
                                                 ,by = .(Target,Source)]
  
  #Taking only the rows of the sources_ID that are not in the Target column
  dt_refs_ids <- dt_refs_ids[!Source_ID %in% Target_ID, list(ID = Source_ID, Source_ID, Target_ID)]
  
  #Binding the Source and the Target ids together
  dt_refs_ids <- rbind(dt_refs_ids[,list(ref_id = Target_ID, unique_ref_id = ID)], 
                       dt_refs_ids[,list(ref_id = Source_ID, unique_ref_id = ID)]) %>% unique()
  
  #In some rare cases, some ref_ids have more than one unique_ref_id. We take the unique_ref_id that is the most frequent
  dt_refs_ids[,N := .N, by = unique_ref_id]
  setorder(dt_refs_ids, -N)
  dt_refs_ids <- dt_refs_ids[,head(.SD, 1), by = ref_id]
  
  #We merge back the unique_ref_ids on the dt_refs table
  #dt_refs <- merge(dt_refs, ref_ids, by = "refs")
  dt_refs <- merge(dt_refs, dt_refs_ids[,list(ref_id, unique_ref_id)], by = "ref_id", all.x = TRUE)
  
  #Taking care of references that don't have unique_ref_ids
  dt_refs[is.na(unique_ref_id), unique_ref_id := ref_id]
  
  save(dt_refs, file = "/projects/digital_history/philo_and_economics/data/dt_refs.RData")
  
  rm(dt_refs_duplicates); rm(dt_refs_ids)
}

rm(dt_refs)
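
To make the matching rule of the chunk above concrete, here is a minimal sketch on three references by the same author. It is an illustration under stated assumptions rather than our pipeline: `clean()` is a hypothetical stand-in for fct_clear_string_for_fuzzy_match() followed by removeNumbers(), and base R's adist() replaces stringdist::stringdist(). The chunk size and threshold follow the formulas used above.

```r
refs <- c(
  "RAWLS, J., (1971) A THEORY OF JUSTICE, , HARVARD UNIVERSITY PRESS",
  "RAWLS, J., (1971) A THEORY OF JUSTICE, , CAMBRIDGE, MA: HARVARD UNIVERSITY PRESS",
  "RAWLS, J., (1993) POLITICAL LIBERALISM, , NEW YORK: COLUMBIA UNIVERSITY PRESS"
)

# Hypothetical cleaning step: keep letters and spaces only,
# so that years and punctuation do not influence the distance
clean <- function(x) gsub(" +", " ", gsub("[^A-Z ]", "", toupper(x)))
cleaned <- clean(refs)

# All pairs of references for this author
pairs <- t(combn(seq_along(refs), 2))
a <- cleaned[pairs[, 1]]
b <- cleaned[pairs[, 2]]

# Compare only the beginning of each string (where the title usually is)
chunk     <- round((nchar(a) + nchar(b)) / 4)
threshold <- round(chunk * 0.1)
distance  <- mapply(function(x, y, n) adist(substr(x, 1, n), substr(y, 1, n))[1, 1],
                    a, b, chunk)

# Pairs below the threshold are treated as the same work
same_work <- unname(distance < threshold)
print(data.frame(first = pairs[, 1], second = pairs[, 2], chunk, threshold,
                 distance = unname(distance), same_work))
```

In this toy case only the two editions of the same book fall below the threshold; the third reference, a different book by the same author, does not.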

2.2 JEL Code 'Economic Methodology'

This corpus is composed of articles classified as "economic methodology" by economists, but not published in the main specialized philosophy of economics journals. Creating the corpus involves a few steps:

#Loading xml file
data <- xml2::read_xml("/projects/digital_history/philo_and_economics/data/2020-09-21_JEL-Methodology.xml")
#This line creates a kind of list of all entries
data <- xml2::xml_find_all(data, "//rec")
#Extracting data
title <- xml2::xml_text(xml2::xml_find_first(data, ".//atl"))
year <- data.table(xml2::xml_text(xml2::xml_find_first(data, ".//dt")))
year[,V1 := substr(V1,1,4)]
journal <- xml2::xml_text(xml2::xml_find_first(data, ".//jtl"))
vol <- xml2::xml_text(xml2::xml_find_first(data, ".//vid"))
no <- xml2::xml_text(xml2::xml_find_first(data, ".//iid"))
pubType <- xml2::xml_text(xml2::xml_find_first(data, ".//pubtype"))
pages <- xml2::xml_text(xml2::xml_find_first(data, ".//pages"))


dt_JEL <- data.table(Title = title, Year = year$V1, Journal = journal, Vol = vol, No = no, Pages = pages, PubType = pubType)

# get all revues!
dt_revue <- fread("/projects/digital_history/interdisciplinarity/data/revues.txt") 

# names of journals from JEL
dt_JEL[,Journal := toupper(Journal)]

# merge
# creating cleaner strings to merge
dt_JEL[,match_field := Journal %>% str_replace_all(" AND ", " & ") %>% 
         str_replace_all("[:punct:] "," ") %>% str_replace_all("  "," ")]
dt_revue[,match_field := Revue %>% str_replace_all(" AND ", " & ") %>% 
         str_replace_all("[:punct:] "," ") %>% str_replace_all("  "," ")]
dt_JEL <- merge(dt_JEL[,], dt_revue, by= "match_field", all.x=TRUE, all.y=FALSE) # actual merge


# Manual fix for journal mismatches
j_no_match <- dt_JEL[PubType == "Journal Article"  & is.na(Code_Revue),.N,by=Journal][N>4][order(-N)]

dt_JEL[grepl("JOURNAL OF ECONOMIC ISSUES",Journal), Code_Revue := 8600]
dt_JEL[Journal == "JOURNAL OF INSTITUTIONAL AND THEORETICAL ECONOMICS", Code_Revue := 9055]
dt_JEL[Journal == "JOURNAL OF THE HISTORY OF ECONOMIC THOUGHT (CAMBRIDGE UNIVERSITY PRESS)", Code_Revue := 18851]
dt_JEL[Journal == "REVUE D'ECONOMIE POLITIQUE", Code_Revue := 14216]
dt_JEL[grepl("RECHERCHES ECONOMIQUES DE LOUVAIN",Journal), Code_Revue := 13831]
dt_JEL[grepl("ZEITSCHRIFT FUR WIRTSCHAFTS-[ ]?UND SOZIALWISSENSCHAFTEN",Journal), Code_Revue := 18816]
dt_JEL[grepl("CANADIAN JOURNAL OF DEVELOPMENT STUDIES",Journal), Code_Revue := 2850]
dt_JEL[grepl("OXFORD ECONOMIC PAPERS",Journal), Code_Revue := 12488]
dt_JEL[Journal == "ECONOMICS: THE OPEN-ACCESS, OPEN-ASSESSMENT E-JOURNAL", Code_Revue := 19904]
dt_JEL[grepl("CANADIAN JOURNAL OF AGRICULTURAL ECONOMICS",Journal), Code_Revue := 2832]

dt_JEL$Code_Discipline <- NULL
dt_JEL <- merge(dt_JEL, dt_revue[,list(Code_Revue, Code_Discipline)], 
                by= "Code_Revue", all.x=TRUE, all.y=FALSE) # actual merge



# tools used to diagnose mismatches
# for(j in j_no_match$Journal){ 
#   if(!j %in% c("METHODUS", "OECONOMICA", "OECONOMIA", "EKONOMIA","INNOVATIONS")){ # skipping a few that generate many false positive
#      hit <-   dt_revue[agrepl(j,Revue)]
#      if(nrow(hit)>0){
#     print(paste(j, "is detected as similar enough to", hit$Revue))
#   }
#   }
# }
# 
# txt <- "JOURNAL OF BEHAVIOURAL ECONOMICS"
# dt_JEL[grepl(txt, Journal) & PubType == "Journal Article"  & is.na(Code_Revue)]
# dt_revue[grepl(txt,Revue)]

# to check if journals are on WoS: https://mjl.clarivate.com/home

dt_JEL_all <- dt_JEL
dt_JEL_all$match_field <- NULL
dt_JEL_all  <- dt_JEL_all %>% unique()
save(dt_JEL_all, file = "/projects/digital_history/philo_and_economics/data/dt_JEL_all.RData" )

# Writing a small file with the codes (Code_Revue) of the matched journals
write(paste( dt_JEL_all[!is.na(Code_Revue),unique(Code_Revue)],
             collapse = ", "), 
      "/projects/digital_history/philo_and_economics/data/journals_w_JEL_methodo.txt") 

# getting some basic info
n_docs <- nrow(dt_JEL_all)
n_art <- dt_JEL_all[PubType == "Journal Article" & Year >= first_y] %>% nrow()
n_art_w_j <- dt_JEL_all[PubType == "Journal Article" & !is.na(Code_Revue) & Year >= first_y] %>% nrow()
n_j <- dt_JEL_all[!is.na(Code_Revue)  & Year >= first_y,unique(Code_Revue)] %>% length()
prop_missing_j <- dt_JEL[PubType == "Journal Article" & is.na(Code_Revue)  & Year >= first_y,.N]/ dt_JEL[PubType == "Journal Article"  & Year >= first_y,.N]



rm(data,dt_JEL_all, dt_JEL,dt_revue,year,pages,vol,pubType,no,journal)

In the EconLit database, we retrieve all articles with the JEL code corresponding to 'Economic Methodology'. More specifically, the search string used is SU "Economic Methodology". After 1991, the corresponding JEL code is B4, but using B4 or (B4*) as a JEL code results in too few articles being retrieved for 1990 and 1991 because the JEL classification system was transitioning from older codes (where 'Economic Methodology' was code 00360). Our extraction from EconLit on September 21, 2020 gave us 9166 documents.
Among these documents, we take those that are journal articles and focus on documents published since 1990, which brings us down to 4081 documents.
We match the journal names to the list of journals in Web of Science. Some manual corrections were necessary at this step because the spelling of journal names sometimes differed between EconLit and Web of Science. Given that Web of Science includes only a subset of academic journals, only 2586 articles in our corpus can be matched to their corresponding journal in Web of Science.
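
The name-matching step described above can be sketched as follows. This is a simplified illustration in base R: the helper `normalise_journal()`, the toy tables and the journal codes are hypothetical, and the normalisation only approximates the rules applied in the chunk above.

```r
# Hypothetical helper: harmonise spelling before an exact merge
normalise_journal <- function(x) {
  x <- toupper(x)
  x <- gsub(" AND ", " & ", x, fixed = TRUE)
  x <- gsub("[[:punct:]] ", " ", x)  # drop punctuation that is followed by a space
  gsub(" +", " ", x)                 # collapse repeated spaces
}

# Toy EconLit-side and Web of Science-side tables (codes are made up)
econlit <- data.frame(
  Journal = c("Journal of Economic Behavior and Organization",
              "Economics: The Open-Access, Open-Assessment E-Journal",
              "Methodus"),
  stringsAsFactors = FALSE
)
wos <- data.frame(
  Revue = c("JOURNAL OF ECONOMIC BEHAVIOR & ORGANIZATION",
            "ECONOMICS-THE OPEN ACCESS OPEN-ASSESSMENT E-JOURNAL"),
  Code_Revue = c(1L, 2L),
  stringsAsFactors = FALSE
)

econlit$match_field <- normalise_journal(econlit$Journal)
wos$match_field     <- normalise_journal(wos$Revue)

# Left join: every EconLit journal is kept, matched or not
merged <- merge(econlit, wos, by = "match_field", all.x = TRUE)

# Journals left without a code are candidates for a manual correction
unmatched <- merged$Journal[is.na(merged$Code_Revue)]
print(unmatched)
```

The first journal matches after normalisation. The second one exists on both sides but is spelled too differently to match, which is the kind of case that we corrected by hand; the third has no counterpart at all.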

#This data was parsed above
load("/projects/digital_history/philo_and_economics/data/dt_JEL_all.RData" )

dt_JEL <- dt_JEL_all[PubType == "Journal Article" & !is.na(Code_Revue) &  Year >= first_y]
dt_JEL[,c("First_Page", "Last_Page") := tstrsplit(Pages, "-", fixed=TRUE)]

# Loading journal issues (retrieved from WoS after previous step)
dt_revueID <- fread("/projects/digital_history/philo_and_economics/data/JEL_meth_issueIDs.tsv")
#  data.table(read.csv2("/projects/digital_history/behavioral\ economics/data/revueID.csv"))

#We put NAs to NULL so we can try to match all entries with our database
dt_JEL[is.na(Vol), Vol := "NULL"]
dt_JEL[is.na(No), No := "NULL"]
dt_JEL$Year <- as.numeric(dt_JEL$Year)

#ECONOMICS-THE OPEN ACCESS OPEN-ASSESSMENT E-JOURNAL ECONOMICS-KIEL (Code Revue 19904) doesn't have volume and Numero
dt_Economics_Kiel <- dt_JEL[Code_Revue == 19904]
#Manually attributing id_art to the five articles found in WoS
dt_Economics_Kiel[order(Year,Vol,Title), ID_Art := c(NA, 47615209, 47615210, 47711327, 54947652, 56295156)]
# Removing the journal before processing to avoid issues.
dt_JEL <- dt_JEL[Code_Revue != 19904]


#Journal number 7495 has weird number nomenclature
dt_JEL[Code_Revue == 7495 & No == "4/5", No := "4-5"]
dt_JEL[Code_Revue == 7495 & No == "7/8", No := "7-8"]
dt_JEL[Code_Revue == 7495 & No == "3/4/5", No := "3-5"]
dt_JEL[Code_Revue == 7495 & No == "7/8/9", No := "7-9"]
dt_JEL[Code_Revue == 7495 & No == "9-10-11", No := "9-11"]
dt_JEL[Code_Revue == 7495 & No == "10-11-12", No := "10-12"]

# Journal 352 has issue filled instead of volume in WoS
dt_JEL[Code_Revue == 352 , `:=`(No = Vol, Vol = "NULL")]


#Merging journal unique identifier (IssueID)
dt_JEL <- merge(dt_JEL, dt_revueID[,list(Code_Revue, IssueID, Volume, Numero, Annee_Bibliographique)], 
                by.x = c("Code_Revue", "Vol", "No", "Year"), by.y = c("Code_Revue", "Volume", "Numero", "Annee_Bibliographique"), 
                all.x = TRUE, all.y = FALSE)

# Making sure that our specialized journals are excluded:
dt_JEL <- dt_JEL[Journal != "ECONOMICS AND PHILOSOPHY" & 
                   Journal != "JOURNAL OF ECONOMIC METHODOLOGY"]

# We have quite a few articles with matching journal but no matching issue
n_wo_issueID <- dt_JEL[is.na(IssueID), .N] 

# Here's some code to test what these articles without matches are
# We found out that it is mostly because WoS did not index these journals for the relevant issues
# (j <- dt_JEL[is.na(IssueID),list(first(Journal), .N),by= Code_Revue][order(-N)] %>% head(10))

# for(j_i in dt_JEL[is.na(IssueID),list(first(Journal), .N),by= Code_Revue][order(-N)]$Code_Revue){
# if(any(dt_JEL[is.na(IssueID) & Code_Revue %in% j_i,
#               unique(Year)] 
#        %in% dt_revueID[Code_Revue %in% j_i,
#                        unique(Annee_Bibliographique)]
# )){
#   print(paste("Possibly something to improve with journal", dt_JEL[Code_Revue == j_i, unique(Journal)], " which is Code_Revue = ", j_i))
# }
# }
# 
# j_i <-8601 # Checking missing Journal of Economic Literature
# (dt_temp <- dt_JEL[is.na(IssueID) & Code_Revue %in% j_i, 
#        list(Code_Revue,Journal, IssueID, Year, Vol,No, First_Page,Last_Page)][order(Year)]
#   )
# dt_revueID[Code_Revue %in% j_i & Annee_Bibliographique %in% unique(dt_temp$Year), 
#            list(Code_Revue,Revue, IssueID, Annee_Bibliographique, Volume,Numero)][
#   order(Annee_Bibliographique)]

# Shrinking the table to only lines with issueID
dt_JEL <- dt_JEL[!is.na(IssueID)]

# Saving a small file to be able to match in WoS at the article level
fwrite(dt_JEL[,list(IssueID,First_Page,Code_Revue)],
       file = "/projects/digital_history/philo_and_economics/data/articles_in_JEL_corpus_to_match_w_WoS.csv")

save(dt_JEL, file = "/projects/digital_history/philo_and_economics/data/dt_JEL_with_issueID.RData" )


rm(dt_JEL,dt_revueID, dt_JEL_all)

load("/projects/digital_history/philo_and_economics/data/dt_JEL_with_issueID.RData")

#Fetching Article table
dt_Articles <- fread("/projects/digital_history/philo_and_economics/data/2020-09-28_id_art_of_JEL_econ_methodo.tsv",
                     quote="")

dt_cleaned_JEL <- merge(dt_JEL, dt_Articles, by = c("IssueID", "First_Page", "Code_Revue"), all.x = TRUE)

n_art_no_idart <- dt_cleaned_JEL[is.na(ID_Art),.N]

dt_cleaned_JEL <- dt_cleaned_JEL[! is.na(ID_Art)]

# Bringing the KIEL articles back in the dataframe
dt_cleaned_JEL <- rbindlist(list(dt_cleaned_JEL,dt_Economics_Kiel[!is.na(ID_Art)]),fill = T)

# Loading the references (that were fetched on WoS with Sql  and then improved upon with New_id2 in the R script 'script_only_for_JEL-WoS-refs.R')
load("/projects/digital_history/philo_and_economics/data/dt_citing_cited_metho_FULL.RData")

# Keeping only refs with identifier
dt_ref <- dt_ref[!is.na(New_id2)]

# table with primary key being New_id2 (cited doc unique info)
dt_refs_of_JEL_metho <- dt_ref[,list(First_author = first(Nom), cited_year = first(Annee),
                                  Publication = first(Revue_Abbrege), Volume = first(Volume), Page = first(Page),
                                  last_name = first(last_name), times_cited = .N
                                  ),by=New_id2]
# Simple citing-cited table
dt_citing_cited <- unique(dt_ref[,list(ID_Art, New_id2)])

# Articles with refs:
dt_Articles <- dt_cleaned_JEL[ID_Art %in% unique(dt_ref$ID_Art)]

n_art_w_refs <- nrow(dt_Articles)
#Our code doesn't work if a ref is cited in only one article, or if an article has only one reference.
#dt_ref <- dt_ref[,N := .N, by = "ID_Art"][N > 4]
#dt_ref <- dt_ref[,N := .N, by = "New_id2"][N > 4]


# Cleaning journal names:
 j <- "JOURNAL OF ECONOMIC ISSUES"
dt_Articles[grepl(j,Journal), Journal := j]

save(dt_refs_of_JEL_metho, file = "/projects/digital_history/philo_and_economics/data/dt_refs_of_JEL_metho.RData")
save(dt_citing_cited, file = "/projects/digital_history/philo_and_economics/data/dt_citing_cited_metho.RData")
save(dt_Articles, file = "/projects/digital_history/philo_and_economics/data/dt_Articles_metho.RData")
rm(dt_ref); rm(dt_Articles); rm(dt_JEL)
rm(dt_refs_of_JEL_metho,dt_citing_cited, dt_cleaned_JEL)

We then match at the level of articles and keep only the articles with references. These steps imply that the corpus shrinks again, reaching 1465 documents (mostly because some journals are not indexed in Web of Science over their full lifetime). Here is the temporal distribution of the articles in this corpus:

load("/projects/digital_history/philo_and_economics/data/dt_Articles_metho.RData")
load("/projects/digital_history/philo_and_economics/data/dt_citing_cited_metho.RData")
ggplot(dt_Articles[Year>=first_y],aes(x=Year)) + geom_bar() + ggtitle("Number of articles in the JEL 'Methodology' corpus")

# dt_Articles[Year>=first_y,.N,by=Year][order(Year)] 
# dt_Articles[between(Year,first_y,2018),.N,by=Year][, mean(N)]

n_2019 <- dt_Articles[Year == 2019, .N]
average1990_2018 <- dt_Articles[between(Year, first_y,last_y_metho), .N,Year][,mean(N)]
n_final <- dt_Articles[between(Year, first_y,last_y_metho), .N]
n_refs_JEL <- nrow(dt_citing_cited[ID_Art %in% dt_Articles[between(Year,first_y,last_y_metho), ID_Art]])

n_journals_JEL <- dt_Articles[between(Year, first_y,last_y_metho),unique(Code_Revue)] %>% length()

rm(dt_Articles, dt_citing_cited)

From this distribution, we see that the number of articles in 2019 is extremely low (3 compared to an average of 50 from 1990 to 2018). This must be attributable to indexing delays. We thus drop 2019 for this corpus. We are now left with 1462 documents having a total of 63267 references. These documents come from 165 distinct journals.
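
The comparison behind this decision can be sketched on made-up yearly counts. The vector `years` and its values are purely illustrative and are not our data; `last_year` plays the role of last_y_metho.

```r
# Illustrative publication years (not the actual corpus)
years <- c(rep(2016, 48), rep(2017, 52), rep(2018, 50), rep(2019, 3))
last_year <- 2018

# Number of articles per year
counts <- table(years)
count_years <- as.integer(names(counts))

# Average yearly count up to the cut-off, and count for the suspicious year
avg_before <- mean(as.vector(counts[count_years <= last_year]))
n_2019     <- as.vector(counts[count_years == 2019])

# Dropping everything after the cut-off
kept <- years[years <= last_year]

cat("Average up to", last_year, ":", avg_before, "| 2019:", n_2019,
    "| articles kept:", length(kept), "\n")
```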

2.2.1 Further information on JEL corpus

Although EconLit "includes the most sought-after economics publications", some journals that are included do not fall neatly in economics as a discipline. Given that our WoS database has a disciplinary classification for each journal, we can compute the share of each discipline and of each journal in our corpus:

load("/projects/digital_history/philo_and_economics/data/dt_Articles_metho.RData")
 
# Name of disciplines with articles:
dt_Articles <-  merge(dt_Articles[between(Year, first_y, last_y_metho)], 
        discipline_info[,list(Code_Discipline,Discipline = str_replace(discipline," \n |\n"," "))], by= "Code_Discipline")

# breakdown by disciplines over the full period
top_citing_disc <-   dt_Articles[,list(nb_cit =.N),by=Discipline]
top_citing_disc[order(-nb_cit), list(Discipline, `Share of articles` = round(nb_cit / sum(nb_cit),3) )][1:10] %>% 
  kable(caption = "Share of articles by discipline for the JEL Methodology corpus")

Share of articles by discipline for the JEL Methodology corpus
Discipline	Share of articles
Economics	0.775
Other Social Sciences	0.078
Humanities	0.041
Political Science & Public Administration	0.027
Management	0.024
International Relations	0.023
Geography	0.019
Psychology	0.004
Law	0.004
Demography	0.002

prop_economics <-  top_citing_disc[Discipline == "Economics",
                                   nb_cit]/sum(top_citing_disc$nb_cit)

# breakdown by journals
top_citing_journals <-   dt_Articles[,list(Discipline = first(Discipline),
                                           Journal = first(Journal), nb_cit =.N),by=Code_Revue]
top_citing_journals <- top_citing_journals[order(-nb_cit)]
top_citing_journals[, list(Discipline, Journal, `Share of articles` = round(nb_cit / sum(nb_cit),3) )][1:12] %>% 
  kable(caption = "Share of articles by journal for the JEL Methodology corpus")

Share of articles by journal for the JEL Methodology corpus
Discipline	Journal	Share of articles
Economics	CAMBRIDGE JOURNAL OF ECONOMICS	0.131
Economics	JOURNAL OF ECONOMIC ISSUES	0.067
Economics	HISTORY OF POLITICAL ECONOMY	0.065
Economics	JOURNAL OF POST KEYNESIAN ECONOMICS	0.038
Economics	JOURNAL OF ECONOMIC BEHAVIOR AND ORGANIZATION	0.034
Economics	AMERICAN JOURNAL OF ECONOMICS & SOCIOLOGY	0.032
Economics	JOURNAL OF INSTITUTIONAL ECONOMICS	0.032
Other Social Sciences	CRITICAL REVIEW	0.030
Economics	JOURNAL OF INSTITUTIONAL AND THEORETICAL ECONOMICS	0.029
Economics	REVIEW OF SOCIAL ECONOMY	0.027
Other Social Sciences	SCIENCE AND SOCIETY	0.025
Economics	EUROPEAN JOURNAL OF THE HISTORY OF ECONOMIC THOUGHT	0.023

biggest_j_not_in_econ <- top_citing_journals[Discipline != "Economics", first(Journal)]
rank_biggest_j_not_in_econ <- which(top_citing_journals$Journal == biggest_j_not_in_econ)
 
rm(dt_Articles,top_citing_disc,top_citing_journals)

It is noteworthy that disciplines other than economics together account for 22.5% of the articles and that a journal such as Critical Review ranks 8th.

2.3 Comparing size of the two corpora through time

# Loading the specialized philo of econ corpus
load("/projects/digital_history/philo_and_economics/data/dt_ScopusCorpus_PhiEcon.RData")
art_spec_phi_econ <- dt_CorpusArticles[SO %in% c("ECONOMICS AND PHILOSOPHY", "JOURNAL OF ECONOMIC METHODOLOGY")]
rm(dt_CorpusArticles)
 
# Loading the JEL corpus
load("/projects/digital_history/philo_and_economics/data/dt_Articles_metho.RData")
 load("/projects/digital_history/philo_and_economics/data/dt_citing_cited_metho.RData")
setkey(dt_Articles,ID_Art)
art_JEL_corpus <- dt_Articles[J(unique(dt_citing_cited$ID_Art))][ between(Year, first_y,last_y_metho) ]
rm(dt_citing_cited, dt_Articles)

# Combining the two corpora
art_combo <- rbind(art_spec_phi_econ[,list(Corpus = "Specialized Philosophy\nof Economics", Year = PY)],
      art_JEL_corpus[,list(Corpus = "JEL Economic Methodology", Year)]
)

art_combo_n <- art_combo[Year >=1990,list(n= .N), by= .(Year,Corpus)]
 
ggplot(art_combo_n,aes(x=Year, y=n,color = Corpus)) + geom_line(lwd=1.5) + ylim(0,max(art_combo_n$n, na.rm = TRUE)) + ylab("Number of articles")

rm(art_spec_phi_econ,art_JEL_corpus,art_combo, art_combo_n)

2.4 Comparing the most cited documents in each corpus

One way to see how far apart the two corpora are is to simply compare the citation ranks:

# Philo of econ
load("/projects/digital_history/philo_and_economics/data/dt_refs.RData")
dt_refs <- dt_refs[SO %in% c("ECONOMICS AND PHILOSOPHY", "JOURNAL OF ECONOMIC METHODOLOGY")]
# aggregating it:
refs_phi <- dt_refs[,list(n_philo = .N, First_surname = first(Author), 
                          cited_year = first(Year), refs_philo = first(refs)), 
                    by = unique_ref_id][order(-n_philo)]  
setnames(refs_phi,"unique_ref_id","ID_phi")
rm(dt_refs)
# Correcting a few things:
refs_phi[grepl("KUHN",x = First_surname),cited_year := c(1970,cited_year[2:.N])]
refs_phi[grepl("ROBBINS",x = First_surname),cited_year := c(1935,cited_year[2:.N])]
refs_phi[grepl("MARSHALL",x = First_surname),cited_year := c(1920,cited_year[2:.N])]
refs_phi[grepl("MCCLOSKEY",x = First_surname),cited_year := c(1998,cited_year[2:.N])]
refs_phi[grepl("BHASKAR",x = First_surname),cited_year := c(1978,cited_year[2:.N])]
refs_phi[grepl("BLAUG",x = First_surname),cited_year := c(1992,cited_year[2:.N])]
refs_phi[grepl("HUME",x = First_surname),cited_year := c(1978,cited_year[2:.N])]
refs_phi[grepl("SCHUMPETER",x = First_surname) &
          grepl("THEORY OF ECONOMIC DEVELOPMENT",x = refs_philo) ,
         cited_year := c(1934,cited_year[2:.N])]
refs_phi[grepl("POPPER",x = First_surname) &
          grepl("LOGIC",x = refs_philo) ,
         `:=`(cited_year = c(rep(1968,3),cited_year[4:.N]),
              ID_phi = c(rep(ID_phi[1],3),ID_phi[4:.N] ))]
refs_phi[grepl("SCHUMPETER",x = First_surname) &
          grepl("CAPITALISM",x = refs_philo),
         cited_year := c(1950)]
refs_phi[grepl("VEBLEN",x = First_surname) &
          grepl("LEISURE",x = refs_philo),
         cited_year := c(1994)]
refs_phi[grepl("SAYER",x = First_surname),cited_year := c(1992,cited_year[2:.N])]
refs_phi[grepl("JEVONS",x = First_surname),cited_year := c(1957,cited_year[2:.N])]
refs_phi[grepl("ROSENBERG",x = First_surname),cited_year := c(1992,cited_year[2:.N])]
refs_phi[grepl("FRIEDMAN",x = First_surname) &
          grepl("POSITIVE EC",x = refs_philo) &
          cited_year == 1953,
              ID_phi := ID_phi[1]]
refs_phi[grepl("MARX",x = First_surname)  &
          grepl("CAPITAL",x = refs_philo),
         `:=`(cited_year = 1970,
              ID_phi = ID_phi[1])]#,cited_year := c(1992,cited_year[2:.N])]



refs_phi[,n_philo := sum(n_philo), by =ID_phi]
refs_phi <- refs_phi[order(-n_philo),head(.SD,1), by =ID_phi]


# JEL methodology
load( file = "/projects/digital_history/philo_and_economics/data/dt_refs_of_JEL_metho.RData")

setnames(dt_refs_of_JEL_metho, c("last_name", "times_cited"), c("First_surname","n_metho"))
dt_refs_of_JEL_metho <- dt_refs_of_JEL_metho[order(-n_metho), list(ID_metho = New_id2, 
                                           n_metho, First_surname, cited_year, 
                                           publication_metho = Publication, vol_metho = Volume,
                                           page_metho = Page)]

# a bit of cleaning
dt_refs_of_JEL_metho[grepl("FRIEDMAN",x = First_surname) & cited_year == 1953, ID_metho := ID_metho[1]]
dt_refs_of_JEL_metho[grepl("SMITH",x = First_surname) & cited_year == 1776, ID_metho := ID_metho[1]]
dt_refs_of_JEL_metho[, n_metho := sum(n_metho), by = ID_metho]
dt_refs_of_JEL_metho <- dt_refs_of_JEL_metho[order(-n_metho), head(.SD,1), by = ID_metho]

# merging the two:
refs_combo <- merge(refs_phi,dt_refs_of_JEL_metho, by = c("First_surname", "cited_year"),all=TRUE)
refs_combo <- refs_combo[order(-n_philo,-n_metho)]
refs_combo[, dist := adist(publication_metho,refs_philo), by= .(ID_metho, ID_phi)]
setcolorder(refs_combo,c("n_philo", "n_metho", "dist", "First_surname", "cited_year","publication_metho", "refs_philo"))
refs_combo <- refs_combo[order(-n_metho,-n_philo,dist)]
max_id <- max(refs_combo$ID_metho,na.rm = TRUE)
refs_combo[is.na(ID_metho), `:=`(ID_metho = (max_id + (1:.N)), n_metho = 0)]
refs_combo <- refs_combo[,head(.SD,1),by=ID_metho]

refs_combo <- refs_combo[order(-n_philo, -n_metho,dist)]
max_id <- max(refs_combo$ID_phi,na.rm = TRUE)
refs_combo[is.na(ID_phi), `:=`(ID_phi = (max_id + (1:.N)), n_philo = 0)]

refs_combo <- refs_combo[,head(.SD,1),by=ID_phi]

# Creating rank variable (giving the smallest rank to ties)
refs_combo[, `:=`(rank_phi = rank(-n_philo,ties.method="min"),
                  rank_metho = rank(-n_metho,ties.method="min"))]

# The articles in top 50 of two corpora:
refs_combo <- refs_combo[order(rank_phi+ rank_metho)] # ordering by the sum of the two ranks, so that each rank weighs equally
n_top = 50 
refs_combo[ rank_phi <= n_top & rank_metho <= n_top, `In both` := TRUE]
refs_combo[ is.na(`In both`), `In both` := FALSE]

# in_two_tops <- refs_combo[ rank_phi <= n_top & rank_metho <= n_top, list(`Rank Phi` = rank_phi, 
#                                                           `Rank Meth` = rank_metho,
#                                                           `First Author` = First_surname %>%  tolower %>% toTitleCase(),
#                                                           `Year` = cited_year,
#                                                           `Abbreviated Publication` = publication_metho %>%  tolower %>% toTitleCase()) ] 
in_tops <- refs_combo[ rank_phi <= n_top | rank_metho <= n_top,list(`In both`, `Rank Phi` = rank_phi, 
                                                          `Rank Meth` = rank_metho,
                                                          `First Author` = First_surname %>%  tolower %>% toTitleCase(),
                                                          `Year` = cited_year,
                                                          `Abbreviated Publication` = publication_metho %>%  tolower %>% toTitleCase()) ]
in_tops %>%
  datatable(options = list(ordering = TRUE),
      caption = "Documents among the 50 most cited in at least one of the two corpora. Those with TRUE in the first column are in both top 50.")

# code to identify what to manually correct above
# refs_combo <- refs_combo[order(-n_philo, -n_metho,dist)]
# refs_combo[rank_metho > (rank_phi +1000),list(n_metho,rank_phi,
#                                             rank_metho,First_surname,cited_year,
#                                             publication_metho,refs_philo)]
# refs_combo[rank_phi <= 50,list(n_philo, n_metho,rank_phi,
#                                  rank_metho,First_surname,cited_year,publication_metho,refs_philo)]
# refs_combo <- refs_combo[order(-n_metho,-n_philo,dist)]
# refs_combo[rank_metho <= 50,list(n_philo, n_metho,rank_phi,
#                                  rank_metho,First_surname,cited_year,publication_metho,refs_philo)]
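
The ranking logic of the chunk above can be illustrated on a toy table. The references and citation counts below are invented; only the use of rank() with ties.method = "min" and the "in both tops" rule mirror our code.

```r
# Invented citation counts for five references in the two corpora
cites <- data.frame(
  ref     = c("A", "B", "C", "D", "E"),
  n_philo = c(30, 30, 12, 0, 5),
  n_metho = c(4, 25, 0, 40, 25),
  stringsAsFactors = FALSE
)

# Rank 1 is the most cited; tied references all receive the smallest rank
cites$rank_phi   <- rank(-cites$n_philo, ties.method = "min")
cites$rank_metho <- rank(-cites$n_metho, ties.method = "min")

# A reference is "in both" when it is in the top n of each corpus
n_top <- 2
cites$in_both <- cites$rank_phi <= n_top & cites$rank_metho <= n_top

# Ordering by the sum of the ranks gives each corpus equal weight
cites <- cites[order(cites$rank_phi + cites$rank_metho), ]
print(cites)
```

With ties.method = "min", two references tied for first place both get rank 1 and the next one gets rank 3, so a top-n cut can contain more than n references.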

3 Results for Philosophy of Economics

3.1 Clustering

We test different community detection algorithms: Louvain, fast greedy, walktrap and infomap. We select the algorithm that produces the partition with the highest modularity score, which is Louvain in our case (as is typically the case).

#dt_refs contains all articles references. 
load("/projects/digital_history/philo_and_economics/data/dt_refs.RData")
dt_refs <- dt_refs[SO %in% c("ECONOMICS AND PHILOSOPHY", "JOURNAL OF ECONOMIC METHODOLOGY")]
#Giving an ID to every reference
dt_references <- unique(dt_refs[,list(refs)])
dt_references$refID <- c(1:nrow(dt_references))
#Merging the refID back on a single table
dt_edges <- merge(dt_refs[, list(ID, refs, PY)], dt_references, by = "refs")[,list(ID, refID, PY)]
#Getting rid of references cited by only one article:
#duplicate article-reference rows are dropped first, then each reference is counted
dt_edges <- unique(dt_edges[, list(ID, refID, PY)])
dt_edges <- dt_edges[,N := .N, by = "refID"][N > 1][, list(ID, refID, PY)]

bib_coup <- bibliographic_coupling(dt_edges, "ID", "refID")

#Getting louvain communities from the bibliographic coupling table
graph <- graph_from_data_frame(bib_coup, directed=FALSE, vertices=NULL)
louvain_result <- cluster_louvain(graph, weights = bib_coup$N)
fast_greedy <- cluster_fast_greedy(graph, weights = bib_coup$N)
walktrap <- cluster_walktrap(graph, weights = bib_coup$N)
infomap <- cluster_infomap(graph, e.weights = bib_coup$N)

#What algorithm has the best modularity?
modularity_value <- c(modularity(louvain_result),
                        modularity(fast_greedy),
                        modularity(walktrap),
                        modularity(infomap))

#Naming the modularity results
clustering_algorithm <- c("Louvain", "Fast Greedy", "Walktrap", "Infomap")
modularity_test_results <- data.table(clustering_algorithm, modularity_value)

kable(modularity_test_results, caption = "Modularity value for different clustering algorithms on our corpus")

Modularity value for different clustering algorithms on our corpus
clustering_algorithm	modularity_value
Louvain	0.3772045
Fast Greedy	0.3341597
Walktrap	0.3381432
Infomap	0.2532636
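
The selection could also be made programmatically rather than by reading the table. The following is a minimal sketch on the Zachary karate-club graph that ships with igraph, not on our corpus; the object names `partitions`, `scores` and `best` are illustrative only. Louvain and infomap are stochastic, so the scores may vary slightly from run to run.

```r
library(igraph)

# Toy graph shipped with igraph (not our bibliographic coupling network)
g <- make_graph("Zachary")

# Run the four algorithms compared above
partitions <- list(
  "Louvain"     = cluster_louvain(g),
  "Fast Greedy" = cluster_fast_greedy(g),
  "Walktrap"    = cluster_walktrap(g),
  "Infomap"     = cluster_infomap(g)
)

# Modularity of each partition, and the name of the best-scoring algorithm
scores <- sapply(partitions, modularity)
best <- names(scores)[which.max(scores)]

print(round(scores, 3))
print(best)
```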

#Formatting the results
article_and_com_id <- data.table()
for (i in (1:length(louvain_result)))
{
  com <- data.table(ID = louvain_result[[i]])
  com$com_ID <- i
  article_and_com_id <- rbind(article_and_com_id,com, fill = TRUE)
  
}

#Only keeping communities that have more than 10 articles
article_and_com_id <- article_and_com_id[,N:=.N, by="com_ID"][N > 10]
article_and_com_id$ID <- as.numeric(article_and_com_id$ID)
save(article_and_com_id, file = "/projects/digital_history/philo_and_economics/data/article_and_com_id.RData")

com_w_nb <- article_and_com_id[,.N,com_ID]
rm(article_and_com_id); rm(dt_edges); rm(dt_references)
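
The loop that flattens the Louvain result into a table can also be written without an explicit loop, using igraph's `membership()`. This is only a sketch on a hypothetical toy edge list (two triangles joined by one weak link); `toy_edges` and `membership_dt` are names introduced for illustration.

```r
library(igraph)
library(data.table)

# Hypothetical coupling table: two triangles (1-2-3 and 4-5-6) linked by a weak edge
toy_edges <- data.table(from = c(1, 1, 2, 4, 4, 5, 3),
                        to   = c(2, 3, 3, 5, 6, 6, 4),
                        N    = c(3, 3, 3, 3, 3, 3, 1))

# Extra columns of the edge list become edge attributes, so E(g)$N holds the weights
g <- graph_from_data_frame(toy_edges, directed = FALSE)
cl <- cluster_louvain(g, weights = E(g)$N)

# One row per article: vertex name and the community it was assigned to
membership_dt <- data.table(ID = V(g)$name,
                            com_ID = as.integer(membership(cl)))
membership_dt[, ID := as.numeric(ID)]   # vertex names are stored as character
membership_dt[, N := .N, by = com_ID]   # community size, used for the size filter

print(membership_dt)
```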

After excluding clusters with 10 or fewer articles, we are left with 5 clusters that have the following article distribution (labels are assigned manually based on the content of the clusters, see below):

doc_topic_map <- data.table(document = 1:5, Topic = c("Moral\nPhilosophy","Big M","Decision\nTheory","Small m","Behavioral\nEconomics"))

com_w_nb <- merge(com_w_nb,doc_topic_map ,by.x = "com_ID", by.y= "document")

com_w_nb[order(-N),list(Cluster = str_replace(Topic, "\n"," "), `Number of articles`=N)]

3.2 TF-IDF with unigrams and bigrams

Our first representation uses the most characteristic tokens in the titles of the articles of each cluster over the full period. We manually named the clusters based on these words and on perusing the documents they cite most often (see below).
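
As a rough illustration of what "most characteristic" means here, the sketch below computes a plain tf-idf score by hand, treating each cluster as one document. It is not the project's implementation: the functions sourced from FCT_util.R also handle cleaning, unstemming and bigrams, and may use a different weighting variant. The toy titles and the names `toy_titles`, `tokens` and `counts` are invented for the example.

```r
library(data.table)

# Invented titles in two hypothetical clusters
toy_titles <- data.table(
  cluster = c("A", "A", "B", "B"),
  titre   = c("preference reversals in experiments",
              "preference and welfare",
              "causal inference in econometrics",
              "econometrics and welfare")
)

# One row per token, pooled by cluster
tokens <- toy_titles[, .(term = unlist(strsplit(tolower(titre), "\\s+"))), by = cluster]
counts <- tokens[, .(n = .N), by = .(cluster, term)]

# Term frequency: share of the cluster's tokens accounted for by the term
counts[, tf := n / sum(n), by = cluster]

# Inverse document frequency: log(number of clusters / clusters containing the term)
n_clusters <- uniqueN(counts$cluster)
counts[, idf := log(n_clusters / .N), by = term]
counts[, tf_idf := tf * idf]

# Two highest-scoring tokens per cluster
print(counts[order(cluster, -tf_idf), head(.SD, 2), by = cluster])
```

Tokens that occur in every cluster (here "in", "and" and "welfare") get a score of zero, which is why generic words drop out of the keyword lists.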

load("/projects/digital_history/philo_and_economics/data/article_and_com_id.RData")
load("/projects/digital_history/philo_and_economics/data/dt_ScopusCorpus_PhiEcon.RData")
Communities <- merge(dt_CorpusArticles[,list(ID, TI,PY,AU)], article_and_com_id[,list(ID, com_ID)], by="ID")
setnames(Communities, c("TI", "com_ID"), c("titre", "modularity_class"))
 
 
corpus <- cleaning_corpus(Communities, "titre")

imp_bigrams <- find_most_significant_bigram(corpus)

#Generating the unigram table (all unigrams for each document that aren't part of the significant bigrams we found)
path_dic_philo <- "/projects/digital_history/philo_and_economics/data/dictionary_philo.RData"

dt_ngram_occurences <- make_unigram_and_bigram_occurence_table(corpus = corpus, imp_bigrams = imp_bigrams, 
                                                    unstem_dictionary_path = path_dic_philo)

dt_ngram_occurences[,conversion := conversion_table[n_gram]][!is.na(conversion), n_gram := conversion]
dt_ngram_occurences[,conversion := NULL]
#Naming topics

dt_ngram_occurences <- merge(dt_ngram_occurences,doc_topic_map ,by = "document", all.x = TRUE)

plot_tfidf(dt_ngram_occurences, colors = colors_cluster, order_disc = order_disc_philo, title_graph = "")

From our position as participants in the field, the sets of keywords are quite telling and our labelling of each cluster was straightforward. Only a few keywords might be surprising for an insider. We add a subsection investigating why these keywords are present.

3.2.1 Further investigation of surprising keywords

Our first surprise is to find 'reversals' as the second most distinctive token for the cluster that we label Big M based on all the other information. We look at titles with this token:

# Checking "reversal" in Big M
token <- "reversa"
com <-  "Big M"
revers_papers <- 
  Communities[
    modularity_class ==doc_topic_map [grepl(com, Topic), document] &
    grepl(pattern = token, x = toTitleCase(titre),ignore.case = T),
    list(
    Author = AU, Year = PY, Title = toTitleCase(tolower(titre))
  )
][order(Year)] 

kable(revers_papers)

Author	Year	Title
TAMMI T	1999	Incentives and Preference Reversals: Escape Moves and Community Decisions in Experimental Economics
GUALA F	2000	Artefacts in Experimental Economics: Preference Reversals and the Becker-DeGroot-Marschak Mechanism
ANGNER E	2002	Levi's Account of Preference Reversals

We have three titles with this keyword in the Big M cluster, and none in the other clusters. We accept that these papers might not be best described as Big M, but clustering algorithms necessarily make some debatable allocations by creating mutually exclusive sets among shades of grey. Two of these authors are well known in the field (Angner and Guala), so we can check whether their papers tend to be placed in the cluster that is closest to their main topic of research: Behavioral Economics (and, by extension, experimental economics):

# Our three relevant authors
aut_rev <- c("TAMMI T", "GUALA F", "ANGNER E")

aut_rev <-   merge( Communities[
 # modularity_class ==doc_topic_map[grepl(com, Topic), document] &
    grepl(pattern = paste0(aut_rev,collapse = "|"), x = AU,ignore.case = T),],
doc_topic_map[,list(modularity_class =document, Cluster = str_replace(Topic, "\n", " "))], 
 by = "modularity_class")
  
# correcting AU for cases where Guala coauthors with others:
one_auth <- "GUALA F"
aut_rev[grepl(one_auth,x = AU), AU := one_auth]
setnames(aut_rev,"AU", "Scholar")

aut_rev[,list(`Number of papers` = .N),by = .(Scholar,Cluster)][order(Scholar,-`Number of papers`)] #%>% kable()

We find indeed that the general allocation fits our prior understanding of these scholars' profiles.

Now, our cluster Decision Theory includes two potentially surprising surnames: Keynes and Soros. We look at the keyword "Keynes's" first:

# Checking "Keynes's" in Decision Theory
token <- "Keynes's"
com <-  "Decision"
the_papers <- 
  Communities[
    modularity_class ==doc_topic_map[grepl(com, Topic), document] &
    grepl(pattern = token, x = toTitleCase(titre),ignore.case = T),
    list(
    Author = AU, Year = PY, Title = toTitleCase(tolower(titre))
  )
][order(Year)] 

kable(the_papers)

Author	Year	Title
COTTRELL A	1993	Keynes's Theory of Probability and Its Relevance to His Economics: Three Theses
CHICK V	2003	Theory, Method and Mode of Thought in Keynes's General Theory
DOW SC;GHOSH D	2009	Fuzzy Logic and Keynes's Speculative Demand for Money

So three titles in Decision Theory contain this keyword, and no title in the other clusters does. We can intuitively understand why these articles are in Decision Theory: at least two of them have a formal dimension (either Keynes's theory of probability or fuzzy logic). Finally, "Soros" is included because of a 2013 symposium on his work on the importance of "reflexivity" in economic decisions:

# Checking "Soros" in Decision Theory
token <- "Soros"
com <-  "Decision"
the_papers <- 
  Communities[
    modularity_class ==doc_topic_map[grepl(com, Topic), document] &
    grepl(pattern = token, x = toTitleCase(titre),ignore.case = T),
    list(
    Author = AU, Year = PY, Title = toTitleCase(tolower(titre))
  )
][order(Year)] 

kable(the_papers)

Author	Year	Title
CALDWELL B	2013	George Soros: Hayekian?
NOTTURNO MA	2013	Soros and Popper: On Fallibility, Reflexivity, and the Unity of Method
HANDS DW	2013	Introduction to Symposium on 'Reflexivity and Economics: George Soros's Theory of Reflexivity and the Methodology of Economic Science'
DAVIS JB	2013	Soros's Reflexivity Concept in a Complex World: Cauchy Distributions, Rational Expectations, and Rational Addiction
CROSS R;HUTCHINSON H;LAMBA H;STRACHAN D	2013	Reflections on Soros: Mach, Quine, Arthur and Far-from-Equilibrium Dynamics

All in all, there are extremely few anomalies in what we have observed by looking more closely at some titles. Our labelling of clusters seems to rest on solid ground.
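
The three checks above follow the same pattern, so they could be factored into a small helper. The sketch below is one possible way to do it; `titles_with_token()` is a hypothetical function that does not exist in FCT_util.R, and the toy tables only mimic the columns of `Communities` and `doc_topic_map`.

```r
library(data.table)
library(tools)

# Hypothetical helper: titles in one cluster (matched by label) that contain a token
titles_with_token <- function(communities, topic_map, token, com) {
  class_id <- topic_map[grepl(com, Topic, fixed = TRUE), document]
  communities[
    modularity_class %in% class_id & grepl(token, titre, ignore.case = TRUE),
    .(Author = AU, Year = PY, Title = toTitleCase(tolower(titre)))
  ][order(Year)]
}

# Toy tables with the same column names as the real objects
toy_map <- data.table(document = 1:2, Topic = c("Big M", "Decision\nTheory"))
toy_com <- data.table(
  modularity_class = c(1, 1, 2),
  AU    = c("ANGNER E", "TAMMI T", "CALDWELL B"),
  PY    = c(2002, 1999, 2013),
  titre = c("LEVI'S ACCOUNT OF PREFERENCE REVERSALS",
            "INCENTIVES AND PREFERENCE REVERSALS",
            "GEORGE SOROS: HAYEKIAN?")
)

res <- titles_with_token(toy_com, toy_map, token = "reversa", com = "Big M")
print(res)
```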

3.2.2 Keywords by decades

We repeat the same tf-idf procedure separately for each decade to get a sense of how topics evolve over time.

load("/projects/digital_history/philo_and_economics/data/article_and_com_id.RData")
load("/projects/digital_history/philo_and_economics/data/dt_ScopusCorpus_PhiEcon.RData")
Communities <- merge(dt_CorpusArticles[,list(ID, TI)], article_and_com_id[,list(ID, com_ID)], by="ID")
setnames(Communities, c("TI", "com_ID"), c("titre", "modularity_class"))

#merging Publication Year and Author
Communities <- merge(Communities, dt_CorpusArticles[,list(ID,AU, PY)], by = "ID")


#Adding decade column
for (i in c(1990,2000,2010)){
  Communities[between(PY,i,i+9),decade:=i]
}

for(d in unique(Communities[!is.na(decade)]$decade)){
corpus <- cleaning_corpus(Communities[decade == d], "titre")

imp_bigrams <- find_most_significant_bigram(corpus)

#Generating the unigram table (all unigrams for each document that aren't part of the significant bigrams we found)
path_dic_philo <- "/projects/digital_history/philo_and_economics/data/dictionary_philo.RData"

dt_ngram_occurences <- make_unigram_and_bigram_occurence_table(corpus = corpus, imp_bigrams = imp_bigrams, 
                                                    unstem_dictionary_path = path_dic_philo)

dt_ngram_occurences[,conversion := conversion_table[n_gram]][!is.na(conversion), n_gram := conversion]
dt_ngram_occurences[,conversion := NULL]

#Naming topics
dt_ngram_occurences <- merge(dt_ngram_occurences,doc_topic_map,by = "document", all.x = TRUE)


print(plot_tfidf(dt_ngram_occurences,title_graph = paste0("TF-IDF for ",d, " to ", d+9)
                   ,colors = colors_cluster, order_disc = order_disc_philo))
}
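
The decade column built with a loop above can equivalently be derived with integer division. A minimal sketch on invented publication years (`toy` is an illustrative name), keeping the same 1990-2019 bounds so that years outside the window stay `NA`:

```r
library(data.table)

# Invented publication years, including two outside the studied window
toy <- data.table(PY = c(1989, 1990, 1999, 2004, 2019, 2020))

# Floor each year to the start of its decade, only within 1990-2019
toy[between(PY, 1990, 2019), decade := 10 * (PY %/% 10)]

print(toy)
```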

3.3 Top citations per cluster

We produce a set of tables about the citations for each cluster. The first table is the one we use in the chapter: we produce an html version and the LaTeX version for the paper. We then present the same results in a long format (full citation information). The last table is not what the clusters cite most, but rather which articles in each cluster have received the most academic citations overall.

load("/projects/digital_history/philo_and_economics/data/Communities_partition_30years.RData")
load("/projects/digital_history/philo_and_economics/data/dt_refs.RData")
Communities[, Topic := str_replace(Topic, "\n", " ")]

Communities <- merge(Communities, dt_refs[,list(refs, unique_ref_id)] %>% unique(), by = "refs", all.x = TRUE, all.y = FALSE)


#Top citations per community
top_ref_per_topic <- Communities[,list(nb_refs = .N), by = c("Topic", "unique_ref_id")]
top_ref_per_topic <- merge(top_ref_per_topic, dt_refs[,head(.SD, 1), 
                                                      by = unique_ref_id][,list(unique_ref_id, Author, Year)],
                           by = "unique_ref_id")
setorder(top_ref_per_topic, Topic, -nb_refs)
#top_ref_per_topic[,refs := paste0(Author, "-", Year)]
top_ref <- unique(top_ref_per_topic[,list(Author, Year, Topic)])[,head(.SD, 5), by="Topic"]
top_ref[,refs := paste0(Author %>% tolower() %>% toTitleCase(), "~", Year)]

#Formatting the table so each cell has the form: Author~Year Author~Year Author~Year ...
top_ref <- top_ref[,aggregate(refs, list(Topic), paste0, collapse = " ")]
setnames(top_ref, c("Group.1", "x"), c("Cluster", "Full period"))
top_ref_full_period <- copy(top_ref)
#kable(top_ref, caption = "Top 5 of most cited documents")



#Top citations per decades
top_ref_per_topic_decade <- Communities[,list(nb_refs = .N), by = c("Topic", "unique_ref_id", "decade")]
top_ref_per_topic_decade <- merge(top_ref_per_topic_decade, dt_refs[,head(.SD, 1), 
                                                      by = unique_ref_id][,list(unique_ref_id, Author, Year)],
                           by = "unique_ref_id")
setorder(top_ref_per_topic_decade, Topic, decade, -nb_refs)
#top_ref_per_topic_decade[,refs := paste0(Author, "-", Year)]
top_ref <- unique(top_ref_per_topic_decade[!is.na(decade),list(Author, Year, Topic, decade)])[,head(.SD, 5), by=.(Topic,decade)]
top_ref[,refs := paste0(Author %>% tolower() %>% toTitleCase(), "~", 
                        Year)]


#Formatting the table so each cell has the form: Author~Year Author~Year Author~Year ...
top_ref <- top_ref[,aggregate(refs, list(Topic), paste, collapse = " "), by = "decade"]
setnames(top_ref, c("Group.1", "x"), c("Cluster", "reference"))
# Making a column per decade
top_ref <- spread(top_ref,decade, reference)
setnames(top_ref,c("1990","2000","2010"), c("1990-1999","2000-2009","2010-2019"))
# Getting the ordering as in the tf-idf and time series graph
top_ref <- merge(top_ref, data.table(Order = 1:length(order_disc_philo), 
                                     Cluster = str_replace(order_disc_philo, "\n"," ")),      
                 by = "Cluster" )
# Bringing in full period data:
top_ref <- merge(top_ref, top_ref_full_period, by = "Cluster")

# ordering rows:
setorder(top_ref,Order)
top_ref$Order <- NULL
# ordering columns:
setcolorder(top_ref, c("Cluster", "Full period"))
# Printing
kable(top_ref, caption = "Top 5 of most cited documents per decade (compact format)")

Top 5 of most cited documents per decade (compact format)
Cluster	Full period	1990-1999	2000-2009	2010-2019
Moral Philosophy	Rawls~1971 Nozick~1974 Parfit~1984 Sen~1970 Broome~1991	Rawls~1971 Parfit~1984 Nozick~1974 Harsanyi~1955 Broome~1991	Rawls~1971 Broome~1991 Nozick~1974 Scanlon~1998 Arrow~1951	Rawls~1971 Sen~1999 Sen~1970 Broome~2004 Harsanyi~1955
Behavioral Economics	Kahneman~1979 Savage~1954 Camerer~2005 Gul~2008 Ross~2005	Kahneman~1979 Friedman~1953 Keynes~1971 Allais~1953 Becker~1993	Kahneman~1979 Savage~1954 Ellsberg~1961 Smith~1982 Allais~1953	Savage~1954 Kahneman~1979 Camerer~2005 Gul~2008 Kahneman~2011
Big M	Hausman~1992 Friedman~1953 Mccloskey~1985 Blaug~1962 Robbins~1932	Hausman~1992 Mccloskey~1985 Blaug~1962 Friedman~1953 Rosenberg~1993	Hausman~1992 Friedman~1953 Hands~2001 Hutchison~1938 Blaug~1962	Hausman~1992 Friedman~1953 Reiss~2012 Robbins~1932 Hands~2001
Small m	Haavelmo~1944 Hoover~2001 Mccloskey~1985 Pearl~2001 Spirtes~2000	Mccloskey~1985 Mirowski~1989 Cooley~1985 Engle~1987 Gilbert~1986	Haavelmo~1944 Hoover~2000 Hendry~1995 Hoover~2001 Kuhn~1962	Deaton~2010 Haavelmo~1944 Pearl~2001 Spirtes~2000 Hoover~2001
Decision Theory	Keynes~1936 Luce~1957 Pearce~1984 Aumann~1976 Lewis~1969	Keynes~1936 Binmore~1987 Selten~1975 Aumann~1976 Bernheim~1984	Hollis~1998 Keynes~1921 Keynes~1936 Lewis~1969 Bernheim~1984	Soros~2013 Bacharach~2006 Keynes~1936 Mackenzie~2008 Schelling~1960

top_ref_compact <- top_ref

top_ref_compact %>% xtable(align= c("r|","L{0.14\\textwidth}", rep("L{0.17\\textwidth}",4)),
                               caption = "Most cited documents per cluster in the corpus of specialized philosophy of economics",
                               label = "tab:most_ref_phi") %>% 
  print(include.rownames=FALSE, sanitize.text.function = identity,
        hline.after=-1:nrow(top_ref_compact), size = "small"
        )

## % latex table generated in R 3.6.3 by xtable 1.8-4 package
## % Fri Oct  2 02:32:40 2020
## \begin{table}[ht]
## \centering
## \begingroup\small
## \begin{tabular}{L{0.14\textwidth}L{0.17\textwidth}L{0.17\textwidth}L{0.17\textwidth}L{0.17\textwidth}}
##   \hline
## Cluster & Full period & 1990-1999 & 2000-2009 & 2010-2019 \\ 
##   \hline
## Moral Philosophy & Rawls~1971 Nozick~1974 Parfit~1984 Sen~1970 Broome~1991 & Rawls~1971 Parfit~1984 Nozick~1974 Harsanyi~1955 Broome~1991 & Rawls~1971 Broome~1991 Nozick~1974 Scanlon~1998 Arrow~1951 & Rawls~1971 Sen~1999 Sen~1970 Broome~2004 Harsanyi~1955 \\ 
##    \hline
## Behavioral Economics & Kahneman~1979 Savage~1954 Camerer~2005 Gul~2008 Ross~2005 & Kahneman~1979 Friedman~1953 Keynes~1971 Allais~1953 Becker~1993 & Kahneman~1979 Savage~1954 Ellsberg~1961 Smith~1982 Allais~1953 & Savage~1954 Kahneman~1979 Camerer~2005 Gul~2008 Kahneman~2011 \\ 
##    \hline
## Big M & Hausman~1992 Friedman~1953 Mccloskey~1985 Blaug~1962 Robbins~1932 & Hausman~1992 Mccloskey~1985 Blaug~1962 Friedman~1953 Rosenberg~1993 & Hausman~1992 Friedman~1953 Hands~2001 Hutchison~1938 Blaug~1962 & Hausman~1992 Friedman~1953 Reiss~2012 Robbins~1932 Hands~2001 \\ 
##    \hline
## Small m & Haavelmo~1944 Hoover~2001 Mccloskey~1985 Pearl~2001 Spirtes~2000 & Mccloskey~1985 Mirowski~1989 Cooley~1985 Engle~1987 Gilbert~1986 & Haavelmo~1944 Hoover~2000 Hendry~1995 Hoover~2001 Kuhn~1962 & Deaton~2010 Haavelmo~1944 Pearl~2001 Spirtes~2000 Hoover~2001 \\ 
##    \hline
## Decision Theory & Keynes~1936 Luce~1957 Pearce~1984 Aumann~1976 Lewis~1969 & Keynes~1936 Binmore~1987 Selten~1975 Aumann~1976 Bernheim~1984 & Hollis~1998 Keynes~1921 Keynes~1936 Lewis~1969 Bernheim~1984 & Soros~2013 Bacharach~2006 Keynes~1936 Mackenzie~2008 Schelling~1960 \\ 
##    \hline
## \end{tabular}
## \endgroup
## \caption{Most cited documents per cluster in the corpus of specialized philosophy of economics} 
## \label{tab:most_ref_phi}
## \end{table}

# Same thing, but in "long format"
top_ref_per_topic_decade <- Communities[,list(
                  nb_refs = .N
                  ), by = c("Topic", "unique_ref_id", "decade")]
setorder(top_ref_per_topic_decade, Topic, decade, -nb_refs)

top_ref_per_topic_decade <- merge(
  top_ref_per_topic_decade[,head(.SD,5), by = c("Topic", "decade")], 
  dt_refs[,list(reference = first(refs)), by = unique_ref_id],
                           by = "unique_ref_id")
setorder(top_ref_per_topic_decade, Topic, decade,- nb_refs)

top_ref_per_topic_decade %>% datatable(caption = "Top 5 of most cited documents per decade (long format)")

#Top cited article in community
setorder(Communities, Topic, -TC)
datatable(unique(Communities[,list(AU, titre, TC, Cluster = Topic)])[,head(.SD,5), by= "Cluster"], caption = "Articles in each community that are the most cited in general")

3.3.1 Looking more closely at some references

One surprising result is the centrality of Keynes's General Theory to the cluster we labelled Decision Theory. Here are the 21 references to this book in the Decision Theory cluster:

load("/projects/digital_history/philo_and_economics/data/Communities_partition_30years.RData")
load("/projects/digital_history/philo_and_economics/data/dt_refs.RData")
Communities[, Topic := str_replace(Topic, "\n", " ")]


Communities <- merge(Communities, dt_refs[,list(refs, unique_ref_id, Author,Year)] %>% unique(), by = "refs", all.x = TRUE, all.y = FALSE)


Communities[grepl("Decision", Topic) & Year == 1936 & Author == "KEYNES", 
            list(`First Author` = unlist(str_split(AU, ";"))[1],
                 Title = titre %>% tolower %>% toTitleCase(), Year = PY),by = ID][order(Year), 
                                                    list(`First Author`,Year,Title)] %>% 
  kable(row.names = TRUE)

	First Author	Year	Title
1	COTTRELL A	1993	Keynes's Theory of Probability and Its Relevance to His Economics: Three Theses
2	COTTRELL A	1995	Intentionality and Economics
3	MORRIS S	1995	The Common Prior Assumption in Economic Theory
4	MAYER T	1997	The Rhetoric of Friedman's Quantity Theory Manifesto
5	SNOWDON B	1998	Transforming Macroeconomics: An Interview with Robert E. Lucas Jr.
6	VERCELLI A	1999	The Evolution of is-Lm Models: Empirical Evidence and Theoretical Presuppositions
7	FIORETTI G	2001	von Kries and the Other German Logicians': Non-Numerical Probabilities Before Keynes
8	CHICK V	2003	Theory, Method and Mode of Thought in Keynes's General Theory
9	CHICK V	2005	The Meaning of Open Systems
10	BACKHOUSE RE	2009	An Unfinished Manuscript by Terence Hutchison
11	WITT U	2009	Novelty and the Bounds of Unknowledge in Economics
12	WILSON MC	2009	Creativity, Probability and Uncertainty
13	DOW SC	2009	Fuzzy Logic and Keynes's Speculative Demand for Money
14	FRYDMAN R	2013	Fallibility in Formal Macroeconomics and Finance Theory
15	DAVIS JB	2013	Soros's Reflexivity Concept in a Complex World: Cauchy Distributions, Rational Expectations, and Rational Addiction
16	ROSENBERG A	2013	Reflexivity, Uncertainty and the Unity of Science
17	BRONK R	2013	Reflexivity Unpacked: Performativity, Uncertainty and Analytical Monocultures
18	GASPARD M	2014	Logic, Rationality and Knowledge in Ramsey's Thought: Reassessing 'Human Logic'
19	KOUMAKHOV R	2014	Conventionalism, Coordination, and Mental Models: From Poincaré to Simon
20	BALLANDONNE M	2019	The Historical Roots (1880–1950) of Recent Contributions (2000–2017) to Ecological Economics: Insights from Reference Publication Year Spectroscopy
21	IVAROLA L	2019	Alternative Consequences and Asymmetry of Results: their Importance for Policy Decision Making

rm(dt_refs, Communities)

3.4 Topics through time

load("/projects/digital_history/philo_and_economics/data/article_and_com_id.RData")
load("/projects/digital_history/philo_and_economics/data/dt_ScopusCorpus_PhiEcon.RData")
load("/projects/digital_history/philo_and_economics/data/dt_refs.RData")
Communities <- merge(dt_CorpusArticles[,list(ID, TI)], article_and_com_id[,list(ID, com_ID)], by="ID")
setnames(Communities, c("TI", "com_ID"), c("titre", "modularity_class"))
  
#merging Publication Year, Author and Citation count (TC)
Communities <- merge(Communities, unique(dt_refs[,list(ID, PY, refs )]), by = "ID")
Communities <- merge(Communities, dt_CorpusArticles[,list(ID,AU,TC)], by = "ID")

#Adding decade column
for (i in c(1990,2000,2010))
{
  Communities[between(PY,i,i+9),decade:=i]
}

#naming communities
Communities <- merge(Communities,doc_topic_map[,list(modularity_class = document, Topic)],by = "modularity_class", all.x = TRUE)


#We save that table so we can use it to find the top citations per cluster
save(Communities, file = "/projects/digital_history/philo_and_economics/data/Communities_partition_30years.RData")

#Plotting topics through time
plot_topic_thru_time(Communities, 1990, colors = colors_cluster)

## `geom_smooth()` using formula 'y ~ x'

3.4.1 What happened to decision theory?

The Decision Theory cluster shrank markedly over the studied period. To test the hypothesis that work on the topic has simply shifted elsewhere, we look at the citation pattern of three core texts in game theory that are highly cited by the cluster: Luce and Raiffa (1957), Aumann (1976) and Pearce (1984). We use Web of Science data.

# Loading data fetched from WoS
cit_to_gt_classics <-   fread("/projects/digital_history/philo_and_economics/data/2020-09-27_papers_citing_at_least_one_game_th_classic.tsv",
                              quote = "")

## Warning in fread("/projects/digital_history/philo_and_economics/data/
## 2020-09-27_papers_citing_at_least_one_game_th_classic.tsv", : Discarded single-
## line footer: <<Completion time: 2020-09-27T13:23:08.1770850-04:00>>

setnames(cit_to_gt_classics, "Annee_Bibliographique", "Year")

# journal code for Econ & Philo is 4721 and for JEM is 21784

# Coding refs from philo of econ and from the rest
name_phi_econ <- "E&P or JEM"; name_other <- "Other journals"
cit_to_gt_classics[Code_Revue %in% c(4721,21784), Source := name_phi_econ]
cit_to_gt_classics[is.na(Source), Source := name_other]

# year-source journal aggregation
last_year = 2018
agg_cit_to_gt <- cit_to_gt_classics[between(Year, first_y,last_year),list(nb_cit =.N),by=.(Source,Year)]
# filling up years with 0 citations
dt1 <- merge(data.table(Year = first_y:last_year), 
             agg_cit_to_gt[Source == name_phi_econ,], by= "Year", all = TRUE)
dt1[is.na(Source), `:=`(Source = name_phi_econ, nb_cit =0)]
dt2 <- merge(data.table(Year = first_y:last_year), 
             agg_cit_to_gt[Source == name_other,], by= "Year", all = TRUE)
dt2[is.na(Source), `:=`(Source = name_other, nb_cit =0)]
agg_cit_to_gt <- rbindlist(list(dt1,dt2))
rm(dt1, dt2)

ggplot( agg_cit_to_gt, aes(x = Year, y = nb_cit, color = Source)) + geom_smooth() + 
  labs(y = "Number of citations", title = "Citations to Luce and Raiffa (1957), Aumann (1976) or Pearce (1984)")

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

So we see that citations to these documents have not dropped significantly. Let's see in which journals or fields the documents tend to be cited:

# Name of discipline in cit info:
cit_to_gt_classics <-  merge(cit_to_gt_classics, 
        discipline_info[,list(Code_Discipline,Discipline = str_replace(discipline," \n |\n"," "))], by= "Code_Discipline")

# breakdown by disciplines over the full period
top_citing_disc <-   cit_to_gt_classics[,list(nb_cit =.N),by=Discipline]
top_citing_disc[order(-nb_cit), list(Discipline, `Share of citations` = round(nb_cit / sum(nb_cit),3) )][1:10] %>% 
  kable(caption = "Share of citations to Luce and Raiffa (1957), Aumann (1976) or Pearce (1984)")

Share of citations to Luce and Raiffa (1957), Aumann (1976) or Pearce (1984)
Discipline	Share of citations
Economics	0.354
Management	0.122
Psychology	0.087
Computers & Operations Research	0.082
Philosophy and Science Studies	0.059
Political Science & Public Administration	0.037
Other Engineering and Technology	0.034
Mathematics	0.033
Law	0.031
International Relations	0.029

# Looking at decades:
cit_to_gt_classics[between(Year,1990,1999),Decade := 1990]
cit_to_gt_classics[between(Year,2000,2009),Decade := 2000]
cit_to_gt_classics[between(Year,2010,2019),Decade := 2010]

# Aggregating by decade 
top_citing_disc_decade <-   cit_to_gt_classics[!is.na(Decade),list(nb_cit =.N),by=.(Decade,Discipline)][order(Decade,-nb_cit)]
top_citing_disc_decade[, `Share of citations` := round(nb_cit / sum(nb_cit),3), by= Decade] 
top_citing_disc_decade[,.SD[1:5,list(Discipline, `Share of citations`)], by = Decade] %>% 
  kable(caption = "Share of citations to Luce and Raiffa (1957), Aumann (1976) or Pearce (1984)")

Share of citations to Luce and Raiffa (1957), Aumann (1976) or Pearce (1984)
Decade	Discipline	Share of citations
1990	Economics	0.430
1990	Management	0.112
1990	Computers & Operations Research	0.091
1990	Psychology	0.067
1990	Philosophy and Science Studies	0.042
2000	Economics	0.361
2000	Computers & Operations Research	0.127
2000	Management	0.100
2000	Psychology	0.078
2000	Philosophy and Science Studies	0.069
2010	Economics	0.413
2010	Management	0.106
2010	Philosophy and Science Studies	0.102
2010	Computers & Operations Research	0.081
2010	Psychology	0.079

# And then also looking at journals most citing the papers

# Over the full period
top_citing_j <-   cit_to_gt_classics[,list(nb_cit =.N),by=Revue]
top_citing_j[order(-nb_cit), list(Journal = Revue, `Share of citations` = round(nb_cit / sum(nb_cit),3) )][1:10] %>% 
  kable(caption = "Share of citations to Luce and Raiffa (1957), Aumann (1976) or Pearce (1984)")

Share of citations to Luce and Raiffa (1957), Aumann (1976) or Pearce (1984)
Journal	Share of citations
JOURNAL OF ECONOMIC THEORY	0.037
GAMES AND ECONOMIC BEHAVIOR	0.033
THEORY AND DECISION	0.032
ECONOMETRICA	0.024
INTERNATIONAL JOURNAL OF GAME THEORY	0.015
SYNTHESE	0.015
JOURNAL OF CONFLICT RESOLUTION	0.013
MATHEMATICAL SOCIAL SCIENCES	0.013
MANAGEMENT SCIENCE	0.012
ECONOMICS AND PHILOSOPHY	0.010

# by decade 
top_citing_j_decade <-   cit_to_gt_classics[!is.na(Decade),list(nb_cit =.N),by=.(Decade,Revue)][order(Decade,-nb_cit)]
top_citing_j_decade[, `Share of citations` := round(nb_cit / sum(nb_cit),3), by= Decade] 
top_citing_j_decade[,.SD[1:5,list(Journal = Revue, `Share of citations`)], by = Decade] %>% 
  kable(caption = "Share of citations to Luce and Raiffa (1957), Aumann (1976) or Pearce (1984)")

Share of citations to Luce and Raiffa (1957), Aumann (1976) or Pearce (1984)
Decade	Journal	Share of citations
1990	JOURNAL OF ECONOMIC THEORY	0.062
1990	GAMES AND ECONOMIC BEHAVIOR	0.058
1990	THEORY AND DECISION	0.041
1990	INTERNATIONAL JOURNAL OF GAME THEORY	0.033
1990	ECONOMETRICA	0.028
2000	GAMES AND ECONOMIC BEHAVIOR	0.050
2000	JOURNAL OF ECONOMIC THEORY	0.038
2000	THEORY AND DECISION	0.025
2000	ECONOMIC THEORY	0.021
2000	SYNTHESE	0.020
2010	GAMES AND ECONOMIC BEHAVIOR	0.056
2010	JOURNAL OF ECONOMIC THEORY	0.034
2010	SYNTHESE	0.029
2010	THEORY AND DECISION	0.026
2010	INTERNATIONAL JOURNAL OF GAME THEORY	0.020
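
Returning to the zero-filling step used before plotting in this subsection: the two merges (one per source) can be replaced by a single merge against the full source-by-year grid built with data.table's `CJ()`. This is only a sketch on invented counts; `toy`, `grid` and `filled` are illustrative names.

```r
library(data.table)

# Invented yearly counts with gaps (no "E&P or JEM" rows for 1991 and 1992)
toy <- data.table(Source = c("E&P or JEM", "Other journals", "Other journals"),
                  Year   = c(1990L, 1990L, 1992L),
                  nb_cit = c(1L, 5L, 7L))

# Every combination of source and year in the window
grid <- CJ(Source = unique(toy$Source), Year = 1990:1992)

# Left join on the grid, then turn missing counts into zeros
filled <- merge(grid, toy, by = c("Source", "Year"), all.x = TRUE)
filled[is.na(nb_cit), nb_cit := 0L]

print(filled)
```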

3.5 Other information

3.5.1 Most prolific authors per cluster

load("/projects/digital_history/philo_and_economics/data/article_and_com_id.RData")
load("/projects/digital_history/philo_and_economics/data/dt_ScopusCorpus_PhiEcon.RData")
Communities <- merge(dt_CorpusArticles[,list(ID, TI, AU, PY,TC)], article_and_com_id[,list(ID, com_ID)], by="ID")
setnames(Communities, c("TI", "com_ID"), c("titre", "modularity_class"))

aut_per_cluster <- Communities[,list(n =.N),by = .(modularity_class,AU)]


#naming communities
aut_per_cluster <- merge(aut_per_cluster,doc_topic_map[,list(modularity_class = document, Topic = str_replace(Topic, "\n", " "))],by = "modularity_class", all.x = TRUE)

setorder(aut_per_cluster,modularity_class, -n)

kable(aut_per_cluster[,list(Topic,Author = AU,n)][,head(.SD,5),by=Topic], caption = "Author in each cluster who published the most articles")

Author in each cluster who published the most articles
Topic	Author	n
Moral Philosophy	QIZILBASH M	7
Moral Philosophy	SEN A	5
Moral Philosophy	BROWN C	4
Moral Philosophy	BROOME J	4
Moral Philosophy	STERN N	4
Big M	HAUSMAN DM	10
Big M	MÄKI U	9
Big M	HART J	6
Big M	REISS J	5
Big M	BACKHOUSE RE	4
Decision Theory	SUGDEN R	3
Decision Theory	WITZTUM A	3
Decision Theory	DE VROEY M	3
Decision Theory	BROWN V	3
Decision Theory	HÉDOIN C	2
Small m	COOK S	4
Small m	REISS J	3
Small m	SPANOS A	3
Small m	KHOSROWI D	2
Small m	LEROY SF	2
Behavioral Economics	HERRMANN-PILLATH C	5
Behavioral Economics	ROSS D	5
Behavioral Economics	GUALA F	3
Behavioral Economics	JONES MK	3
Behavioral Economics	HARRISON GW	3

3.5.2 Split of clusters across the two journals

load("/projects/digital_history/philo_and_economics/data/article_and_com_id.RData")
load("/projects/digital_history/philo_and_economics/data/dt_ScopusCorpus_PhiEcon.RData")
Communities <- merge(dt_CorpusArticles[,list(ID, TI, AU, SO, PY,TC)], article_and_com_id[,list(ID, com_ID)], by="ID")
setnames(Communities, c("TI", "com_ID", "SO"), c("titre", "modularity_class", "Journal"))

journal_share_per_cluster <- Communities[,list(n =.N),by = .(modularity_class,Journal)]
journal_share_per_cluster[,share_cluster := n/sum(n),by=modularity_class]

#naming communities
journal_share_per_cluster <- merge(journal_share_per_cluster,doc_topic_map[,list(modularity_class = document, Topic = str_replace(Topic, "\n", " "))],by = "modularity_class", all.x = TRUE)

setorder(journal_share_per_cluster,modularity_class, Journal)

journal_share_per_cluster[,list(Topic,Journal, share_cluster)] %>%  
  kable()

Topic	Journal	share_cluster
Moral Philosophy	ECONOMICS AND PHILOSOPHY	0.8592058
Moral Philosophy	JOURNAL OF ECONOMIC METHODOLOGY	0.1407942
Big M	ECONOMICS AND PHILOSOPHY	0.1581395
Big M	JOURNAL OF ECONOMIC METHODOLOGY	0.8418605
Decision Theory	ECONOMICS AND PHILOSOPHY	0.4715026
Decision Theory	JOURNAL OF ECONOMIC METHODOLOGY	0.5284974
Small m	ECONOMICS AND PHILOSOPHY	0.1318681
Small m	JOURNAL OF ECONOMIC METHODOLOGY	0.8681319
Behavioral Economics	ECONOMICS AND PHILOSOPHY	0.3468208
Behavioral Economics	JOURNAL OF ECONOMIC METHODOLOGY	0.6531792

4 Results for Methodology of Economics

4.1 Clustering

We use the Louvain algorithm once again, since it produced the best results on the other corpus.

load("/projects/digital_history/philo_and_economics/data/dt_citing_cited_metho.RData")
load("/projects/digital_history/philo_and_economics/data/dt_Articles_metho.RData")

# Constraining to documents in corpus and in temporal bounds
dt_citing_cited <- dt_citing_cited[ID_Art %in% dt_Articles[between(Year,first_y,last_y_metho) #& Code_Discipline == 119
                                                           ,ID_Art]]

# Removing documents cited only once (because they can't participate in coupling)
i <- dt_citing_cited[,.N,by = New_id2][N>1,New_id2]
dt_ref <- dt_citing_cited[New_id2 %in% i]
bib_coup <- bibliographic_coupling(dt_ref, "ID_Art", "New_id2")

graph <- graph_from_data_frame(bib_coup, directed=FALSE, vertices=NULL)
louvain_result <- cluster_louvain(graph, weights = bib_coup$N)

#Formatting the results
article_and_com_id <- data.table()
for (i in (1:length(louvain_result)))
{
  com <- data.table(ID = louvain_result[[i]])
  com$com_ID <- i
  article_and_com_id <- rbind(article_and_com_id,com, fill = TRUE)
  
}

#Only keeping communities that have more than 10 articles
article_and_com_id <- article_and_com_id[,N:=.N, by="com_ID"][N > 10]
article_and_com_id$ID <- as.numeric(article_and_com_id$ID)
save(article_and_com_id, file = "/projects/digital_history/philo_and_economics/data/article_and_com_id_metho.RData")
com_w_nb <- article_and_com_id[,.N,com_ID]
rm(article_and_com_id); rm(louvain_result); rm(graph)

com_w_nb <- merge(com_w_nb,JEL_doc_topic_map ,by.x = "com_ID", by.y= "document")

com_w_nb[order(-N),list(Cluster = str_replace(Topic, "\n"," "), `Number of articles`=N)]
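The coupling and clustering steps above can be illustrated on toy data. This is only a sketch: `bibliographic_coupling()` lives in `FCT_util.R` and is not shown here, so the self-join below is a hypothetical stand-in that counts shared references per pair of articles, and all `toy_*` objects are invented for illustration.

```r
library(data.table)
library(igraph)

# Toy citation table: each row says "article ID_Art cites reference New_id2".
toy_citing_cited <- data.table(
  ID_Art  = c("A", "A", "B", "B", "C", "C", "D", "D", "E"),
  New_id2 = c("r1", "r2", "r1", "r2", "r3", "r4", "r3", "r4", "r5")
)

# Drop references cited only once: they cannot link two articles.
kept_refs <- toy_citing_cited[, .N, by = New_id2][N > 1, New_id2]
toy_ref <- toy_citing_cited[New_id2 %in% kept_refs]

# Hypothetical stand-in for bibliographic_coupling(): self-join on the
# reference, keep each unordered pair once, count the shared references.
pairs <- merge(toy_ref, toy_ref, by = "New_id2", allow.cartesian = TRUE)
toy_coup <- pairs[ID_Art.x < ID_Art.y, .N, by = .(from = ID_Art.x, to = ID_Art.y)]

# Undirected graph weighted by the number of shared references, then Louvain.
toy_graph <- graph_from_data_frame(toy_coup, directed = FALSE)
toy_louvain <- cluster_louvain(toy_graph, weights = toy_coup$N)
toy_membership <- setNames(as.integer(membership(toy_louvain)), V(toy_graph)$name)
```

In this toy example, articles A and B share two references, as do C and D, so Louvain should place each pair in its own community. Article E drops out because its only reference is cited once.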

4.2 TF-IDF with unigrams and bigrams

We again produce the keywords over the full period. We named the clusters manually, based on these keywords and on a perusal of the documents each cluster cites most often (see below).

load("/projects/digital_history/philo_and_economics/data/article_and_com_id_metho.RData")
load("/projects/digital_history/philo_and_economics/data/dt_Articles_metho.RData")
Communities <- merge(dt_Articles[,list(ID_Art, Titre, Year)], article_and_com_id[,list(ID, com_ID)], by.x="ID_Art", by.y = "ID")
setnames(Communities, c("ID_Art", "Titre", "com_ID", "Year"), c("ID", "titre", "modularity_class", "PY"))
 
setkey(Communities, modularity_class)
 
#Making a dictionary in order to unstem our tf-idf later on
path_dic_metho <- "/projects/digital_history/philo_and_economics/data/dictionary_metho.RData"
make_unstem_dictionary(Communities, path_to_save_to = path_dic_metho, col_name = "titre")

corpus <- cleaning_corpus(table = Communities, col_name = "titre")

imp_bigrams <- find_most_significant_bigram(corpus)

#Generating the unigram table (for each document, all unigrams that aren't part of the significant bigrams we found)
dt_ngram_occurences <- make_unigram_and_bigram_occurence_table(corpus = corpus, imp_bigrams = imp_bigrams,
                                                    unstem_dictionary_path = path_dic_metho)
dt_ngram_occurences[,conversion := conversion_table[n_gram]][!is.na(conversion), n_gram := conversion]
dt_ngram_occurences[,conversion := NULL]
#Naming topics

dt_ngram_occurences <- merge(dt_ngram_occurences,JEL_doc_topic_map ,by = "document", all.x = TRUE)

#plotting the tf-idf
plot_tfidf(dt_ngram_occurences, colors = colors_cluster, order_disc = order_disc_metho, title_graph = "")

 rm(dt_ngram_occurences)
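As a reminder of what the TF-IDF scores measure, here is a minimal sketch on toy counts. It assumes the textbook definition (term frequency within a cluster times the log inverse document frequency), with each cluster treated as one document. The exact weighting used inside `plot_tfidf()` is defined in `FCT_util.R` and may differ. The `toy_counts` table is invented for illustration.

```r
library(data.table)

# Toy table: one row per (cluster, term) with the number of occurrences.
toy_counts <- data.table(
  document = c(1, 1, 1, 2, 2, 2),
  n_gram   = c("realism", "ontology", "economics",
               "econometrics", "data-mining", "economics"),
  n        = c(4, 3, 5, 6, 2, 5)
)

n_docs <- uniqueN(toy_counts$document)

# Term frequency: share of the cluster's tokens taken by each term.
toy_counts[, tf := n / sum(n), by = document]
# Inverse document frequency: terms present in every cluster get a weight of 0.
toy_counts[, idf := log(n_docs / uniqueN(document)), by = n_gram]
toy_counts[, tf_idf := tf * idf]

setorder(toy_counts, document, -tf_idf)
```

Here "economics" occurs in both clusters, so its score is zero even though it is the second most frequent term in each. This is why the plotted keywords highlight what is distinctive about a cluster rather than what is merely common.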

4.2.1 Further investigation of surprising keywords

We could check various keywords here, but the patterns are rather clear without digging further. See especially the main references and the main sources below.

Checking the use of "conning", "Shiller", "Sen's" and "Sraffa" in Small m:

load("/projects/digital_history/philo_and_economics/data/Communities_metho.RData")
 

# Checking the keywords listed above in the Small m cluster
token <- "conning|shiller|sen's|sraffa"
com <-  "Small"
revers_papers <-
  Communities[
    modularity_class ==JEL_doc_topic_map[grepl(com, Topic), document] &
    grepl(pattern = token, x = toTitleCase(titre),ignore.case = T),
    list(
    #Author = AU, 
      Year = PY, Title = toTitleCase(tolower(titre))
  )
][order(Year)]

kable(revers_papers)

Year	Title
2006	a Comment on Sen's 'Sraffa, Wittgenstein, and Gramsci'
2007	Variations on the Theme of Conning in Mathematical Economics
2012	Piero Sraffa and 'The True Object of Economics': The Role of the Unpublished Manuscripts
2012	Piero Sraffa and the Future of Economics
2012	The Change in Sraffa's Philosophical Thinking
2013	Rational Expectations: Retrospect and Prospect a Panel Discussion with Michael Lovell, Robert Lucas, Dale Mortensen, Robert Shiller, and Neil Wallace
2017	Market Sociality: Mirowski, Shiller and the Tension Between Mimetic and Anti-Mimetic Market Features

4.2.2 Keywords by decades

We now produce the keywords for each decade.

load("/projects/digital_history/philo_and_economics/data/article_and_com_id_metho.RData")
load("/projects/digital_history/philo_and_economics/data/dt_Articles_metho.RData")
Communities <- merge(dt_Articles[,list(ID_Art, Titre, Year)], article_and_com_id[,list(ID, com_ID)], by.x="ID_Art", by.y = "ID")
setnames(Communities, c("ID_Art", "Titre", "com_ID", "Year"), c("ID", "titre", "modularity_class", "PY"))
 
setkey(Communities, modularity_class)


#Adding decade column
for (i in c(1990,2000,2010)){
  Communities[between(PY,i,i+9),decade:=i]
}

path_dic_metho <- "/projects/digital_history/philo_and_economics/data/dictionary_metho.RData"


for(d in unique(Communities[!is.na(decade)]$decade)){
com_decade <-   Communities[decade == d]
make_unstem_dictionary(com_decade, path_to_save_to = path_dic_metho, col_name = "titre")

corpus <- cleaning_corpus(Communities[decade == d], "titre")

imp_bigrams <- find_most_significant_bigram(corpus)

#Generating the unigram table (for each document, all unigrams that aren't part of the significant bigrams we found)
dt_ngram_occurences <- make_unigram_and_bigram_occurence_table(corpus = corpus, imp_bigrams = imp_bigrams, 
                                                    unstem_dictionary_path = path_dic_metho)

dt_ngram_occurences[,conversion := conversion_table[n_gram]][!is.na(conversion), n_gram := conversion]
dt_ngram_occurences[,conversion := NULL]

#Naming topics
dt_ngram_occurences <- merge(dt_ngram_occurences,JEL_doc_topic_map ,by = "document", all.x = TRUE)


print(plot_tfidf(dt_ngram_occurences,title_graph = paste0("TF-IDF for ",d, " to ", ifelse(d==2010,d+8,d+9))
                   ,colors = colors_cluster, order_disc = order_disc_metho))
}

4.3 Top citations per cluster

We produce the set of tables about the citations for each cluster in a similar manner to what we did for the corpus on specialized philosophy of economics.

load("/projects/digital_history/philo_and_economics/data/article_and_com_id_metho.RData")
load("/projects/digital_history/philo_and_economics/data/dt_Articles_metho.RData")
Communities <- merge(dt_Articles[,list(ID = ID_Art, titre= Titre, PY = Year)], 
                     article_and_com_id[,list(ID, modularity_class = com_ID)], by="ID")
  
#Adding decade column
for (i in c(1990,2000,2010))
{
  Communities[between(PY,i,i+9),decade:=i]
}

#naming communities
Communities <- merge(Communities,JEL_doc_topic_map[,list(modularity_class = document, Topic)],by = "modularity_class", all.x = TRUE)
save(Communities, file = "/projects/digital_history/philo_and_economics/data/Communities_metho.RData")


# Loading reference data
load("/projects/digital_history/philo_and_economics/data/dt_citing_cited_metho.RData")
load("/projects/digital_history/philo_and_economics/data/dt_refs_of_JEL_metho.RData")

Communities[, Topic := str_replace(Topic, "\n", " ")]

Communities <- merge(Communities[,list(ID, modularity_class, PY, decade, Topic)], dt_citing_cited %>%  unique(), by.x = "ID", by.y = "ID_Art", all.x = TRUE, all.y = FALSE)

Communities <- merge(Communities,dt_refs_of_JEL_metho, by = "New_id2", all.x = TRUE)


#Top citations per community
top_ref_per_topic <- Communities[,list(nb_refs = .N, Author = first(First_author), Year = first(cited_year)), by = c("Topic", "New_id2")]

setorder(top_ref_per_topic, Topic, -nb_refs)
#top_ref_per_topic[,refs := paste0(Author, "-", Year)]
top_ref <- unique(top_ref_per_topic[,list(Author, Year, Topic)])[,head(.SD, 5), by="Topic"]
top_ref[,c("last","init") := tstrsplit(Author,"-")]
top_ref[,refs := paste0(toTitleCase(tolower(last)),"-", substr(init,0,1), "~", Year)]

#Formatting the table so that each cell has the form: Author-I~Year Author-I~Year Author-I~Year ...
top_ref <- top_ref[,aggregate(refs, list(Topic), paste0, collapse = " ")]
setnames(top_ref, c("Group.1", "x"), c("Cluster", "Full period"))
top_ref_full_period <- copy(top_ref)
#kable(top_ref, caption = "Top 5 of most cited documents")



#Top citations per decade
top_ref_per_topic_decade <- Communities[,list(nb_refs = .N,
                                              Author = first(First_author), Year = first(cited_year)
                                              ), by = c("Topic", "New_id2", "decade")]

setorder(top_ref_per_topic_decade, Topic, decade, -nb_refs)
#top_ref_per_topic_decade[,refs := paste0(Author, "-", Year)]
top_ref <- unique(top_ref_per_topic_decade[!is.na(decade),list(Author, Year, Topic, decade)])[,head(.SD, 5), by=.(Topic,decade)]
top_ref[,c("last","init") := tstrsplit(Author,"-")]
top_ref[,refs := paste0(toTitleCase(tolower(last)),"-", substr(init,0,1), "~", Year)]



#Formatting the table so that each cell has the form: Author-I~Year Author-I~Year Author-I~Year ...
top_ref <- top_ref[,aggregate(refs, list(Topic), paste, collapse = " "), by = "decade"]
setnames(top_ref, c("Group.1", "x"), c("Cluster", "reference"))
# Making a column per decade
top_ref <- spread(top_ref,decade, reference)
setnames(top_ref,c("1990","2000","2010"), c("1990-1999","2000-2009","2010-2018"))
# Getting the ordering as in the tf-idf and time series graph
top_ref <- merge(top_ref, data.table(Order = 1:length(order_disc_metho), 
                                     Cluster = str_replace(order_disc_metho, "\n"," ")),      
                 by = "Cluster" )
# Bringing in full period data:
top_ref <- merge(top_ref, top_ref_full_period, by = "Cluster")

# ordering rows:
setorder(top_ref,Order)
top_ref$Order <- NULL
# ordering columns:
setcolorder(top_ref, c("Cluster", "Full period"))
# Printing
kable(top_ref, caption = "Top 5 of most cited documents per decade (compact format)")

Top 5 of most cited documents per decade (compact format)
Cluster	Full period	1990-1999	2000-2009	2010-2018
Institutional Economics	Nelson-R~1982 North-D~1990 Robbins-L~1935 Marshall-A~1920 Smith-A~1776	Nelson-R~1982 Williamson-O~1985 Veblen-T~1919 Marshall-A~1920 Williamson-O~1975	Nelson-R~1982 North-D~1990 Hayek-F~1948 Robbins-L~1935 Marshall-A~1920	Robbins-L~1935 North-D~1990 Smith-A~1776 Marshall-A~1920 Nelson-R~1982
Critical Realism	Lawson-T~1997 Lawson-T~2003 Bhaskar-R~1978 Bhaskar-R~1989 Fleetwood-S~1999	Lawson-T~1997 Bhaskar-R~1978 Bhaskar-R~1989 Lawson-T~1994 Bhaskar-R~1986	Lawson-T~1997 Lawson-T~2003 Bhaskar-R~1978 Bhaskar-R~1989 Fleetwood-S~1999	Lawson-T~1997 Lawson-T~2003 Lawson-T~2012 Lawson-T~2006 Bhaskar-R~1978
Political Economy	Searle-J~1995 Marx-K~1970 Marx-K~1973 Wendt-A~1999 George-A~2005	Marx-K~1970 Marx-K~1973 Ollman-B~1993 Hegel-G~1969 Cohen-G~1978	Searle-J~1995 Searle-J~1983 Searle-J~1969 Searle-J~1990 Tuomela-R~1995	Searle-J~1995 George-A~2005 Searle-J~2010 Wendt-A~1999 King-G~1994
Big M	Friedman-M~1953 Kuhn-T~1970 Mccloskey-D~1998 Blaug-M~1992 Popper-K~1968	Friedman-M~1953 Kuhn-T~1970 Mccloskey-D~1998 Blaug-M~1992 Popper-K~1968	Friedman-M~1953 Kuhn-T~1970 Popper-K~1968 Caldwell-B~1982 Mccloskey-D~1998	Friedman-M~1953 Kuhn-T~1970 Mccloskey-D~1998 Popper-K~1968 Keynes-J~1936
Small m	Leamer-E~1983 Keynes-J~1936 Lucas-R~1981 Sraffa-P~1960 Leamer-E~1978	Stokey-N~1989 Davidson-P~1982 Arrow-K~1971 Keynes-J~1936 Leamer-E~1978	Keynes-J~1936 Leamer-E~1983 Sraffa-P~1960 Arrow-K~1971 Schwartz-J~1986	Leamer-E~1983 Keynes-J~1936 Lucas-R~1976 Sims-C~1980 Lucas-R~1981
History of Economics	Schumpeter-J~1954 Marshall-A~1920 Smith-A~1776 Schumpeter-J~1934 Schumpeter-J~1950	Schumpeter-J~1954 Hayek-F~1948 Marshall-A~1920 Smith-A~1776 Becker-G~1976	Schumpeter-J~1954 Blaug-M~1985 Blaug-M~1980 Mill-J~1848 Hayek-F~1967	Schumpeter-J~1954 Keynes-J~1936 Smith-A~1776 Schumpeter-J~1934 Marshall-A~1920

top_ref_compact <- top_ref

top_ref_compact %>% xtable(align= c("r|","L{0.14\\textwidth}", rep("L{0.20\\textwidth}",4)),
                               caption = "Most cited documents per cluster in the corpus of JEL code 'Economic Methodology'",
                               label = "tab:most_ref_metho") %>% 
  print(include.rownames=FALSE, sanitize.text.function = identity,
        hline.after=-1:nrow(top_ref_compact), size = "small"
        )

## % latex table generated in R 3.6.3 by xtable 1.8-4 package
## % Fri Oct  2 02:26:05 2020
## \begin{table}[ht]
## \centering
## \begingroup\small
## \begin{tabular}{L{0.14\textwidth}L{0.20\textwidth}L{0.20\textwidth}L{0.20\textwidth}L{0.20\textwidth}}
##   \hline
## Cluster & Full period & 1990-1999 & 2000-2009 & 2010-2018 \\ 
##   \hline
## Institutional Economics & Nelson-R~1982 North-D~1990 Robbins-L~1935 Marshall-A~1920 Smith-A~1776 & Nelson-R~1982 Williamson-O~1985 Veblen-T~1919 Marshall-A~1920 Williamson-O~1975 & Nelson-R~1982 North-D~1990 Hayek-F~1948 Robbins-L~1935 Marshall-A~1920 & Robbins-L~1935 North-D~1990 Smith-A~1776 Marshall-A~1920 Nelson-R~1982 \\ 
##    \hline
## Critical Realism & Lawson-T~1997 Lawson-T~2003 Bhaskar-R~1978 Bhaskar-R~1989 Fleetwood-S~1999 & Lawson-T~1997 Bhaskar-R~1978 Bhaskar-R~1989 Lawson-T~1994 Bhaskar-R~1986 & Lawson-T~1997 Lawson-T~2003 Bhaskar-R~1978 Bhaskar-R~1989 Fleetwood-S~1999 & Lawson-T~1997 Lawson-T~2003 Lawson-T~2012 Lawson-T~2006 Bhaskar-R~1978 \\ 
##    \hline
## Political Economy & Searle-J~1995 Marx-K~1970 Marx-K~1973 Wendt-A~1999 George-A~2005 & Marx-K~1970 Marx-K~1973 Ollman-B~1993 Hegel-G~1969 Cohen-G~1978 & Searle-J~1995 Searle-J~1983 Searle-J~1969 Searle-J~1990 Tuomela-R~1995 & Searle-J~1995 George-A~2005 Searle-J~2010 Wendt-A~1999 King-G~1994 \\ 
##    \hline
## Big M & Friedman-M~1953 Kuhn-T~1970 Mccloskey-D~1998 Blaug-M~1992 Popper-K~1968 & Friedman-M~1953 Kuhn-T~1970 Mccloskey-D~1998 Blaug-M~1992 Popper-K~1968 & Friedman-M~1953 Kuhn-T~1970 Popper-K~1968 Caldwell-B~1982 Mccloskey-D~1998 & Friedman-M~1953 Kuhn-T~1970 Mccloskey-D~1998 Popper-K~1968 Keynes-J~1936 \\ 
##    \hline
## Small m & Leamer-E~1983 Keynes-J~1936 Lucas-R~1981 Sraffa-P~1960 Leamer-E~1978 & Stokey-N~1989 Davidson-P~1982 Arrow-K~1971 Keynes-J~1936 Leamer-E~1978 & Keynes-J~1936 Leamer-E~1983 Sraffa-P~1960 Arrow-K~1971 Schwartz-J~1986 & Leamer-E~1983 Keynes-J~1936 Lucas-R~1976 Sims-C~1980 Lucas-R~1981 \\ 
##    \hline
## History of Economics & Schumpeter-J~1954 Marshall-A~1920 Smith-A~1776 Schumpeter-J~1934 Schumpeter-J~1950 & Schumpeter-J~1954 Hayek-F~1948 Marshall-A~1920 Smith-A~1776 Becker-G~1976 & Schumpeter-J~1954 Blaug-M~1985 Blaug-M~1980 Mill-J~1848 Hayek-F~1967 & Schumpeter-J~1954 Keynes-J~1936 Smith-A~1776 Schumpeter-J~1934 Marshall-A~1920 \\ 
##    \hline
## \end{tabular}
## \endgroup
## \caption{Most cited documents per cluster in the corpus of JEL code 'Economic Methodology'} 
## \label{tab:most_ref_metho}
## \end{table}

# Same thing, but in "long format"
top_ref_per_topic_decade <- Communities[,list(
                  nb_refs = .N
                  ), by = c("Topic", "New_id2", "decade")]
setorder(top_ref_per_topic_decade, Topic, decade, -nb_refs)

top_ref_per_topic_decade <- merge(
  top_ref_per_topic_decade[,head(.SD,5), by = c("Topic", "decade")], 
  dt_refs_of_JEL_metho[,list(reference = paste0(first(First_author)," (",first(cited_year),"), ", first(Publication))), by = New_id2],
                           by = "New_id2")
setorder(top_ref_per_topic_decade, Topic, decade,- nb_refs)

datatable(top_ref_per_topic_decade
  , caption = "Top 5 of most cited documents per decade (long format)")
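The pattern used throughout this section (count citations per cluster, sort, keep the top rows of each group, then collapse them into one cell) can be sketched on toy data. The reference labels below are placeholders, not real documents, and the cut-off is set to 2 instead of 5 to keep the example short.

```r
library(data.table)

# Toy table: one row per citation made by an article of a given cluster.
toy_cites <- data.table(
  Topic   = c(rep("Big M", 6), rep("Small m", 5)),
  New_id2 = c("ref_a", "ref_a", "ref_a", "ref_b", "ref_b", "ref_c",
              "ref_d", "ref_d", "ref_e", "ref_e", "ref_e")
)

# Number of citations received by each reference within each cluster.
toy_top <- toy_cites[, .(nb_refs = .N), by = .(Topic, New_id2)]

# Sort by decreasing count within cluster, then keep the first rows per group.
setorder(toy_top, Topic, -nb_refs)
toy_top <- toy_top[, head(.SD, 2), by = Topic]

# Compact format: one row per cluster, references separated by spaces.
toy_compact <- toy_top[, .(refs = paste(New_id2, collapse = " ")), by = Topic]
```

Because `head(.SD, n)` simply takes the first rows after sorting, ties at the cut-off are broken by row order rather than reported explicitly. This is worth keeping in mind when reading the fifth entry of each cell in the tables above.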

4.4 Top sources per cluster

We now identify the journals that publish the most articles of each cluster, over the full period and per decade. We then do the same by discipline.

load("/projects/digital_history/philo_and_economics/data/Communities_metho.RData")
load("/projects/digital_history/philo_and_economics/data/dt_Articles_metho.RData")
 
# Name of disciplines with articles:
dt_Articles <-  merge(dt_Articles[between(Year, first_y, last_y_metho)], 
        discipline_info[,list(Code_Discipline,Discipline = str_replace(discipline," \n |\n"," "))], by= "Code_Discipline", all.x =T)
 

 Communities <- merge(Communities[,list(ID_Art = ID, decade, Topic)], 
       dt_Articles[,list(ID_Art, Year, Code_Discipline, Code_Revue,
                         Journal, Discipline)], by = "ID_Art")
 

 
 # counting over the whole period
 top_j <- Communities[, list(nb_art = .N ) , by = .(Topic, Journal) ][order(Topic,-nb_art)]
 top_j[, perc_j := nb_art/sum(nb_art), by = Topic]
 top_j <- top_j[, head(.SD,3), by = Topic]
 
  # counting per decade
 top_decade_j <- Communities[, list(nb_art = .N ) , by = .(Topic, decade, Journal) ][order(Topic,decade, -nb_art)]
 top_decade_j[, perc_j := nb_art/sum(nb_art), by = .(Topic,decade)]
 top_decade_j <- top_decade_j[, head(.SD,3), by = .(Topic,decade)]
 
 # The two together
 top_j[, decade := "Full period"]
 top_j <- rbindlist(list(top_j,top_decade_j), use.names = TRUE)
 rm(top_decade_j)

# Formatting
top_j[ ,Journal := Journal %>% tolower() %>% toTitleCase()] 
top_j[, Source := paste0(Journal, "~(", round(perc_j*100),"%)")] 

top_j <- top_j[,aggregate(Source, list(Topic), paste, collapse = "  "), by = "decade"]
setnames(top_j, c("Group.1", "x"), c("Cluster", "reference"))
top_j <- spread(top_j,decade, reference)
setnames(top_j,c("1990","2000","2010"), c("1990-1999","2000-2009","2010-2018"))
setcolorder(top_j,c("Cluster","Full period"))
top_j[, Cluster := str_replace(Cluster, "\n", " ")]
# manip to order rows as in tf-idf
top_j <- merge(top_j, data.table(Order = 1:length(order_disc_metho), 
                                     Cluster = str_replace(order_disc_metho, "\n"," ")),      
                 by = "Cluster" )
setorder(top_j,Order)
top_j$Order <- NULL


# Printing
kable(
  data.table(apply(top_j,MARGIN = 2, function(x) { str_replace_all(x, "~", " ") }))
  , caption = "Top 3 sources of articles per decade (compact format)")

Top 3 sources of articles per decade (compact format)
Cluster	Full period	1990-1999	2000-2009	2010-2018
Institutional Economics	Journal of Economic Issues (9%) Cambridge Journal of Economics (8%) Journal of Institutional and Theoretical Economics (7%)	Journal of Institutional and Theoretical Economics (19%) Journal of Economic Issues (12%) History of Political Economy (8%)	Cambridge Journal of Economics (16%) Journal of Economic Issues (11%) Journal of Economic Behavior and Organization (11%)	Journal of Institutional Economics (16%) Journal of Economic Behavior and Organization (9%) Cambridge Journal of Economics (7%)
Critical Realism	Cambridge Journal of Economics (42%) Journal of Post Keynesian Economics (8%) Review of Social Economy (7%)	Cambridge Journal of Economics (22%) Journal of Post Keynesian Economics (22%) Review of Social Economy (22%)	Cambridge Journal of Economics (51%) Journal of Post Keynesian Economics (10%) Journal of Economic Issues (5%)	Cambridge Journal of Economics (41%) Journal of Economic Issues (6%) Review of Radical Political Economics (4%)
Political Economy	Science and Society (22%) Review of International Political Economy (8%) American Journal of Economics & Sociology (8%)	Science and Society (66%) Review of International Political Economy (10%) Journal of Economic Issues (7%)	American Journal of Economics & Sociology (31%) Review of International Political Economy (16%) Science and Society (9%)	New Political Economy (13%) European Journal of International Relations (12%) Science and Society (10%)
Big M	History of Political Economy (12%) Journal of Economic Issues (8%) Cambridge Journal of Economics (5%)	History of Political Economy (20%) Journal of Economic Issues (10%) Revue Economique (6%)	Journal of Post Keynesian Economics (9%) Journal of Economic Issues (6%) Cambridge Journal of Economics (6%)	Journal of Economic Behavior and Organization (7%) Journal of Economic Issues (6%) Cambridge Journal of Economics (6%)
Small m	Journal of Economic Perspectives (11%) Cambridge Journal of Economics (11%) Journal of Post Keynesian Economics (8%)	Journal of Post Keynesian Economics (15%) Economic Journal (12%) History of Political Economy (12%)	Journal of Post Keynesian Economics (14%) Cambridge Journal of Economics (10%) World Development (10%)	Cambridge Journal of Economics (18%) Journal of Economic Perspectives (14%) Oxford Review of Economic Policy (7%)
History of Economics	History of Political Economy (20%) European Journal of the History of Economic Thought (14%) Cambridge Journal of Economics (12%)	History of Political Economy (38%) Journal of Economic Issues (12%) Scottish Journal of Political Economy (6%)	Politicka Ekonomie (16%) History of Political Economy (16%) European Journal of the History of Economic Thought (16%)	European Journal of the History of Economic Thought (22%) Cambridge Journal of Economics (20%) History of Economic Ideas (12%)

# Latex is commented out
# top_j %>% xtable(align= c("r|","L{0.14\\textwidth}", rep("L{0.17\\textwidth}",4)),
#                                caption = "Top 3 sources of articles per cluster in the corpus of JEL code 'Economic Methodology'",
#                                label = "tab:top_source_metho") %>% 
#   print(include.rownames=FALSE, sanitize.text.function = identity,
#         hline.after=-1:nrow(top_j), size = "small"
#         )

### Turning now to the aggregation by disciplines ###


 
 # counting over the whole period
 top_disc <- Communities[, list(nb_art = .N ) , by = .(Topic, Discipline) ][order(Topic,-nb_art)]
 top_disc[, perc_j := nb_art/sum(nb_art), by = Topic]
 top_disc <- top_disc[, head(.SD,3), by = Topic]
 
  # counting per decade
 top_decade_disc <- Communities[, list(nb_art = .N ) , by = .(Topic, decade, Discipline) ][order(Topic,decade, -nb_art)]
 top_decade_disc[, perc_j := nb_art/sum(nb_art), by = .(Topic,decade)]
 top_decade_disc <- top_decade_disc[, head(.SD,3), by = .(Topic,decade)]
 
 # The two together
 top_disc[, decade := "Full period"]
 top_disc <- rbindlist(list(top_disc,top_decade_disc), use.names = TRUE)
 rm(top_decade_disc)

# Formatting
top_disc[ ,Discipline := Discipline %>% tolower() %>% toTitleCase()] 
top_disc[, Source := paste0(Discipline, "~(", round(perc_j*100),"%)")] 

top_disc <- top_disc[,aggregate(Source, list(Topic), paste, collapse = "  "), by = "decade"]
setnames(top_disc, c("Group.1", "x"), c("Cluster", "reference"))
top_disc <- spread(top_disc,decade, reference)
setnames(top_disc,c("1990","2000","2010"), c("1990-1999","2000-2009","2010-2018"))
setcolorder(top_disc,c("Cluster","Full period"))
top_disc[, Cluster := str_replace(Cluster, "\n", " ")]
# manip to order rows as in tf-idf
top_disc <- merge(top_disc, data.table(Order = 1:length(order_disc_metho), 
                                     Cluster = str_replace(order_disc_metho, "\n"," ")),      
                 by = "Cluster" )
setorder(top_disc,Order)
top_disc$Order <- NULL


# Printing
kable(
  data.table(apply(top_disc,MARGIN = 2, function(x) { str_replace_all(x, "~", " ") })), 
  caption = "Top 3 disciplinary sources of articles per decade (compact format)")

Top 3 disciplinary sources of articles per decade (compact format)
Cluster	Full period	1990-1999	2000-2009	2010-2018
Institutional Economics	Economics (81%) Other Social Sciences (7%) Humanities (6%)	Economics (89%) Other Social Sciences (7%) Management (1%)	Economics (79%) Other Social Sciences (12%) Geography (3%)	Economics (74%) Humanities (11%) Other Social Sciences (4%)
Critical Realism	Economics (87%) Geography (4%) Political Science & Public Administration (2%)	Economics (97%) Geography (3%)	Economics (89%) Geography (6%) Management (2%)	Economics (82%) Political Science & Public Administration (5%) Humanities (4%)
Political Economy	Economics (40%) Other Social Sciences (32%) International Relations (13%)	Other Social Sciences (69%) Economics (28%) International Relations (3%)	Economics (59%) International Relations (16%) Other Social Sciences (16%)	Economics (36%) Other Social Sciences (23%) International Relations (16%)
Big M	Economics (76%) Other Social Sciences (6%) Management (5%)	Economics (85%) Management (6%) Other Social Sciences (4%)	Economics (62%) Other Social Sciences (11%) Management (8%)	Economics (68%) Humanities (8%) Other Social Sciences (7%)
Small m	Economics (95%) Other Social Sciences (2%) Political Science & Public Administration (1%)	Economics (97%) Political Science & Public Administration (3%)	Economics (100%)	Economics (93%) Other Social Sciences (4%) Humanities (2%)
History of Economics	Economics (82%) Humanities (12%) Other Social Sciences (4%)	Economics (94%) Other Social Sciences (6%)	Economics (84%) Other Social Sciences (8%) Humanities (8%)	Economics (74%) Humanities (22%) Management (2%)

# Latex is commented out
# top_disc %>% xtable(align= c("r|","L{0.14\\textwidth}", rep("L{0.17\\textwidth}",4)),
#                                caption = "Top 3 disciplinary sources of articles per cluster in the corpus of JEL code 'Economic Methodology'",
#                                label = "tab:top_source_metho") %>% 
#   print(include.rownames=FALSE, sanitize.text.function = identity,
#         hline.after=-1:nrow(top_disc), size = "small"
#         )


print("Below: specific focus on the possible turn to mainstream of Small m. Note that we coded the journals as mainstream or not ourselves.")

## [1] "Below: specific focus on the possible turn to mainstream of Small m. Note that we coded the journals as mainstream or not ourselves."

# focus on small M
small_m <- Communities[Topic == "Small m", list(decade, Year, Code_Discipline,Code_Revue,Journal)]
 
# Writing a file to manually code which journals are mainstream or not
fwrite(small_m[, list(.N, Journal = unique(Journal)), by= Code_Revue],
       "/projects/digital_history/philo_and_economics/data/2020-10-01_small_m_j.csv"
)

# Reloading the coded journals
mainstream_or_not <- fread("/projects/digital_history/philo_and_economics/data/2020-10-01_classifying_mainstream_for_small_m.csv")

# merging with initial data to have the mainstream/not for all journals
small_m <- merge(small_m, mainstream_or_not, by = "Journal", all.x= TRUE)
if(nrow(small_m[is.na(Coder1)])){
  stop("Some journals in which a Small m article is published have not been classified as mainstream or not.")
}

small_m[Coder1 == 1 & Coder2== 1, Journal_status := "All Mainstream"]
small_m[Coder2 == 0 & Coder1== 0, Journal_status := "All not mainstream"]
small_m[is.na(Journal_status), Journal_status := "Unclear"]

# ggplot(small_m,aes(x=decade,fill=Journal_status)) + geom_bar()

kable(
  small_m[, list(`Proportion` = sum(Coder1)/.N), by= decade][order(decade)]
  , caption = "Proportion of articles in Small m that are published in mainstream economics journals according to Coder 1")

Proportion of articles in Small m that are published in mainstream economics journals according to Coder 1
decade	Proportion
1990	0.5454545
2000	0.4285714
2010	0.3859649

kable(
  small_m[, list(`Proportion` = sum(Coder2)/.N), by= decade][order(decade)]
  , caption = "Proportion of articles in Small m that are published in mainstream economics journals according to Coder 2")

Proportion of articles in Small m that are published in mainstream economics journals according to Coder 2
decade	Proportion
1990	0.5757576
2000	0.4761905
2010	0.6315789

kable(
  small_m[, list(`Proportion` = sum(Journal_status == "All Mainstream")/sum(Journal_status != "Unclear")), by= decade][order(decade)]
  , caption = "Proportion of articles in Small m that are published in mainstream economics journals according to both coders divided by the sum of unambiguous journals.")

Proportion of articles in Small m that are published in mainstream economics journals according to both coders divided by the sum of unambiguous journals.
decade	Proportion
1990	0.5625000
2000	0.4444444
2010	0.5116279

rm(Communities,dt_Articles)
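The three proportions reported above differ only in their numerator and denominator. The sketch below reproduces the logic on invented codings (the `toy_small_m` values are not taken from our data). The third measure drops articles on whose journal the two coders disagree.

```r
library(data.table)

# Toy codings: 1 = journal judged mainstream by the coder, 0 = not mainstream.
toy_small_m <- data.table(
  decade = c(1990, 1990, 1990, 2000, 2000, 2000, 2000),
  Coder1 = c(1, 1, 0, 1, 0, 0, 0),
  Coder2 = c(1, 0, 0, 1, 1, 0, 0)
)

# Same three-way status as above: agreement on 1, agreement on 0, or unclear.
toy_small_m[Coder1 == 1 & Coder2 == 1, Journal_status := "All Mainstream"]
toy_small_m[Coder1 == 0 & Coder2 == 0, Journal_status := "All not mainstream"]
toy_small_m[is.na(Journal_status), Journal_status := "Unclear"]

toy_prop <- toy_small_m[, .(
  coder1 = mean(Coder1),   # share mainstream according to Coder 1
  coder2 = mean(Coder2),   # share mainstream according to Coder 2
  # share mainstream among articles on which both coders agree
  agreed = sum(Journal_status == "All Mainstream") / sum(Journal_status != "Unclear")
), by = decade][order(decade)]
```

Since the `agreed` column uses a smaller denominator, it need not lie between the two single-coder proportions. It is best read as the share of mainstream journals among the unambiguous cases only.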

4.5 Topics through time

load("/projects/digital_history/philo_and_economics/data/Communities_metho.RData")

plot_topic_thru_time(Communities, 1990, colors = colors_cluster)

## `geom_smooth()` using formula 'y ~ x'

rm(Communities)

Philosophy of Economics? Three Decades of Bibliometric History

Technical Appendix

Olivier Santerre, François Claveau and Alexandre Truc

October 2020