# Semantic Network of Organized Non-Religious Discourses
The interactive tool is available [here](https://balazka.github.io/Semantic-Network-of-Organized-Nonreligious-Discourses/#). Please note that the tool does not support mobile devices.

## 1. Theoretical background
Traditionally, sociologists neglected non-religion, conceiving of the so-called religious nones as a residual minority of individuals with relatively similar worldviews (Vernon, 1968). Once a neglected minority, religious nones have grown considerably in number over the past few decades, becoming the new majority in several Western countries (Balazka, 2020). The steady growth of non-religion eventually led to the rediscovery of this now salient phenomenon and to the institutionalization of non-religion studies (Bullivant, 2020; Lee, 2015). Nevertheless, for over a century the sociology of religion remained theoretically informed by a religion-centric understanding of the secularization thesis. This influenced not only theoretical developments in the discipline but also data collection itself. As a result, existing survey data are often deemed inadequate for addressing newly emerging theoretical interests (Field, 2014; Houtman et al., 2012; Pasquale, 2007). However, recent developments in computational methods allow researchers to look for relevant information elsewhere (see Balazka et al., 2021).

## 2. Data and methods
This interactive visualization is based on a collection of 7,308 issues of British and American non-religious magazines published between 1881 and 2019. For each country, two long-running magazines with ties to a militant non-religious organization were selected: one more radical (i.e., self-identifying with atheism) and one more moderate (i.e., self-identifying with humanism). The corpus consists of The Freethinker (UK), New Humanist (UK), American Atheist (US), and The Humanist (US). The data are presented here in aggregate form.

The magazines were split into sentences using spaCy to obtain smaller and more homogeneous units of analysis; the final corpus consists of 7,046,679 sentences. The documents were then processed in stages. First, four multi-word tokens (i.e., "the_humanist", "new_humanist", "the_freethinker", and "american_atheist") were defined using capitalization rules, to distinguish the names of the magazines from other occurrences of the same words. Next, the text was lowercased and stripped, and numbers and punctuation were removed. An exception was made for hyphens appearing between two words, to preserve meaningful tokens such as "non-belief" or "anti-abortion". Additional multi-word tokens, such as "united_kingdom" or "age_of_reason", were then defined by analyzing recurrent bigrams. The resulting sentences were then tokenized. All tokens were converted from British to American English and lemmatized, and stopwords and terms of two characters or fewer were filtered out. The final list of tokens was checked against a dictionary to identify typos and other recurrent errors in the text. All unrecognized terms with a frequency of 10 or higher were manually reviewed and corrected when necessary, while terms with a frequency lower than 10 were removed from the analysis.
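The preprocessing stages above can be sketched in plain Python. This is a minimal, hypothetical illustration rather than the project's code: spaCy sentence splitting, lemmatization, and the dictionary-based typo check are omitted, and `UK_TO_US`, `STOPWORDS`, and `PHRASES` are tiny placeholders for the full resources a real run would need. The sketch also assumes that the frequency cutoff of 10 applies to every term.

```python
import re
from collections import Counter

# Magazine names from the README, matched case-sensitively so that the
# capitalized title is merged but "the humanist tradition" is not
# (an assumed reading of the "capitalization rules").
MAGAZINES = {
    "The Freethinker": "the_freethinker",
    "New Humanist": "new_humanist",
    "American Atheist": "american_atheist",
    "The Humanist": "the_humanist",
}
# Hypothetical, tiny stand-ins for the full resources a real pipeline needs.
UK_TO_US = {"colour": "color", "humour": "humor", "organise": "organize"}
STOPWORDS = {"a", "and", "for", "in", "is", "of", "that", "the", "to", "was"}
PHRASES = {("united", "kingdom"): "united_kingdom"}  # e.g. chosen from frequent bigrams

# Runs of letters, optionally joined by "-" or "_": digits and all other
# punctuation never match, so they are dropped, while "non-belief" survives.
TOKEN_RE = re.compile(r"[a-z]+(?:[-_][a-z]+)*")


def tokenize(sentence: str) -> list[str]:
    # 1. Protect magazine names before lowercasing erases the capital letters.
    for name, token in MAGAZINES.items():
        sentence = re.sub(rf"\b{name}\b", token, sentence)
    # 2. Lowercase, strip, and keep only letter/hyphen/underscore tokens.
    tokens = TOKEN_RE.findall(sentence.lower().strip())
    # 3. Merge listed two-word phrases into single tokens.
    merged, i = [], 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in PHRASES:
            merged.append(PHRASES[pair])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    # 4. British -> American spelling, then drop stopwords and short terms.
    #    (Lemmatization is left out of this sketch.)
    normalized = (UK_TO_US.get(t, t) for t in merged)
    return [t for t in normalized if t not in STOPWORDS and len(t) > 2]


def preprocess(sentences: list[str], min_freq: int = 10) -> list[list[str]]:
    tokenized = [tokenize(s) for s in sentences]
    freq = Counter(t for toks in tokenized for t in toks)
    # 5. Drop rare terms (the manual review of frequent unknown terms is not shown).
    return [[t for t in toks if freq[t] >= min_freq] for toks in tokenized]
```

For example, `tokenize("In 1881, The Freethinker praised non-belief and the colour of the United Kingdom!")` returns `['the_freethinker', 'praised', 'non-belief', 'color', 'united_kingdom']`: the magazine name and the listed bigram become single tokens, the hyphenated term is kept, and the number, punctuation, and stopwords are gone.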

Each sentence was conceived as a semantic network of interconnected verbal units (Drieger, 2013). The Louvain method was used to partition the nodes into modularity classes, i.e., communities of densely connected terms (see Blondel et al., 2008), while Latent Dirichlet Allocation (LDA) was used to identify the latent topics in the corpus (see Vayansky & Kumar, 2020). The layout was generated using a circle-packing algorithm (Bostock, 2018). The visualization is hierarchical: nodes are first clustered by their primary topic, as identified by LDA, then by their modularity class, and finally grouped by their eigenvector centrality. Node size reflects eigenvector centrality, and node color reflects modularity class. Color therefore identifies densely connected communities of terms, both within a specific topic and across different topics, while nearby tokens tend to be used more consistently to discuss the same latent topic. To keep the visualization readable, only tokens with a frequency of 100 or higher are represented in the interactive tool. Every represented node lists its top 10 connections in the information pane, but a connection is drawn in the visualization only if the two terms appeared in the same sentence at least 400 times.
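The network construction described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's code: `sentences` are assumed to be the token lists produced by preprocessing, node frequency is taken to be the total token count, and an edge's weight is taken to be the number of sentences in which both terms occur. Louvain and LDA are not reimplemented here; in practice they would come from a library (for example, NetworkX's `louvain_communities`), so `topic_of` and `community_of` are hypothetical inputs mapping each term to its primary topic and modularity class. The helpers `build_network`, `eigenvector_centrality`, and `circle_pack_tree` are likewise illustrative names.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_network(sentences, min_node_freq=100, min_edge_weight=400, top_k=10):
    """Sentence-level co-occurrence network using the README's thresholds."""
    node_freq = Counter(t for toks in sentences for t in toks)
    nodes = {t for t, f in node_freq.items() if f >= min_node_freq}

    # Edge weight = number of sentences in which both (kept) terms occur.
    weights = Counter()
    for toks in sentences:
        for pair in combinations(sorted(set(toks) & nodes), 2):
            weights[pair] += 1

    neighbors = defaultdict(Counter)
    for (a, b), w in weights.items():
        neighbors[a][b] = w
        neighbors[b][a] = w

    # Information pane: the top_k strongest connections of every kept node.
    top = {t: [n for n, _ in neighbors[t].most_common(top_k)] for t in nodes}
    # Drawn links: only pairs that co-occur at least min_edge_weight times.
    drawn = {pair: w for pair, w in weights.items() if w >= min_edge_weight}
    return nodes, weights, top, drawn


def eigenvector_centrality(nodes, weights, max_iter=1000, tol=1e-9):
    """Weighted eigenvector centrality by power iteration (L2-normalized)."""
    x = {n: 1.0 for n in nodes}
    for _ in range(max_iter):
        # Multiply by (A + I): the identity shift keeps the same leading
        # eigenvector but prevents the sign-flip oscillation that plain
        # power iteration can show on bipartite graphs.
        new = dict(x)
        for (a, b), w in weights.items():
            new[a] += w * x[b]
            new[b] += w * x[a]
        norm = sum(v * v for v in new.values()) ** 0.5 or 1.0
        new = {n: v / norm for n, v in new.items()}
        if sum(abs(new[n] - x[n]) for n in nodes) < tol:
            return new
        x = new
    return x


def circle_pack_tree(topic_of, community_of, centrality):
    """Nest terms as topic -> community -> term, most central terms first."""
    groups = defaultdict(lambda: defaultdict(list))
    for term, c in centrality.items():
        groups[topic_of[term]][community_of[term]].append({"name": term, "value": c})
    return {"name": "root", "children": [
        {"name": f"topic {t}", "children": [
            {"name": f"community {m}",
             "children": sorted(leaves, key=lambda d: d["value"], reverse=True)}
            for m, leaves in communities.items()]}
        for t, communities in groups.items()]}
```

With the defaults (`min_node_freq=100`, `min_edge_weight=400`, `top_k=10`), `top` plays the role of the information pane and `drawn` holds the links shown in the visualization. The nested `name`/`children`/`value` dictionary is an assumed input format for a circle-packing layout such as D3's, whose `d3.hierarchy` reads `children` by default.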

## 3. Changelog
This section will cover the differences between the versions uploaded to Zenodo.

## 4. References
* Balazka, D. (2020). _Mapping Religious Nones in 112 Countries: An Overview of European Values Study and World Values Survey Data (1981-2020)_. Technical report, Prot. 8 / 07-2020, Fondazione Bruno Kessler. https://isr.fbk.eu/wp-content/uploads/2022/03/Mapping-Religious-Nones-in-112-Countries-Report.pdf
* Balazka, D., Houtman, D., & Lepri, B. (2021). How Can Big Data Shape the Field of Non-Religion Studies? And Why Does It Matter? _Patterns_ 2(6): 1-12. DOI: [10.1016/j.patter.2021.100263](https://doi.org/10.1016/j.patter.2021.100263)
* Blondel, V. D., Guillaume, J. L., Lambiotte, R., & Lefebvre, E. (2008). Fast Unfolding of Communities in Large Networks. _Journal of Statistical Mechanics: Theory and Experiment_ 2008(10): 1-12. DOI: [10.1088/1742-5468/2008/10/P10008](https://iopscience.iop.org/article/10.1088/1742-5468/2008/10/P10008)
* Bostock, M. (2018). _Zoomable Circle Packing_. https://bl.ocks.org/mbostock/7607535
* Bullivant, S. (2020). Explaining the Rise of 'Nonreligion Studies': Subfield Formation and Institutionalization Within the Sociology of Religion. _Social Compass_ 67(1): 86-102. DOI: [10.1177/0037768619894815](https://doi.org/10.1177/0037768619894815)
* Drieger, P. (2013). Semantic Network Analysis as a Method for Visual Text Analytics. _Procedia - Social and Behavioral Sciences_ 79: 4-17. DOI: [10.1016/j.sbspro.2013.05.053](https://doi.org/10.1016/j.sbspro.2013.05.053)
* Field, C. D. (2014). Measuring Religious Affiliation in Great Britain: The 2011 Census in Historical and Methodological Context. _Religion_ 44(3): 357–382. DOI: [10.1080/0048721X.2014.903643](https://doi.org/10.1080/0048721X.2014.903643)
* Houtman, D., Heelas, P., & Achterberg, P. (2012). Counting Spirituality? Survey Methodology after the Spiritual Turn. _Annual Review of the Sociology of Religion_ 3: 25-44. DOI: [10.1163/9789047429470_003](https://doi.org/10.1163/9789047429470_003)
* Lee, L. (2015). _Recognizing the Non-Religious: Reimagining the Secular_. Oxford University Press.
* Pasquale, F. L. (2007). Empirical Study and Neglect of Unbelief and Irreligion. In T. Flynn (Ed.), _The New Encyclopedia of Unbelief_ (pp. 760-766). Prometheus Books.
* Vayansky, I., & Kumar, S. A. P. (2020). A Review of Topic Modeling Methods. _Information Systems_ 94: 101582. DOI: [10.1016/j.is.2020.101582](https://doi.org/10.1016/j.is.2020.101582)
* Vernon, G. M. (1968). The Religious "Nones": A Neglected Category. _Journal for the Scientific Study of Religion_ 7(2): 219-229. DOI: [10.2307/1384629](https://doi.org/10.2307/1384629)
