### Lesson 2: WHY: Benefits and Challenges of Responsible Open Science: Why does it matter?

#### Introduction

In the previous lesson, we learned about foundational concepts that define Open Science. In this lesson, we address some benefits and challenges of working in the open.

Here we aim to present a view of the development of science that focuses not only on scientific results but also on the process of creating them and on the stakeholders who make up the community.

Stakeholders can be individuals producing scientific knowledge (i.e., researchers themselves), individuals consuming, applying and regulating scientific research (e.g., practitioners, the general public, policy-makers, organizations, communities), and the larger scientific ecosystem (e.g., scientific journals, repositories, archives). We discuss the people who perform and benefit from open science, and how to support them, in more detail in [Lesson 3](lesson3-open-stakeholders.md).

In this lesson, we highlight the various benefits of open science across multiple stakeholders, providing some examples that can be explored further. We then explore the challenges of adopting open science practices.

#### Benefits of Open Science

##### Quality of research

For researchers, a primary benefit of increased transparency and verifiability is that it allows readers and stakeholders to judge whether the results presented are accurate (Chambers & Tzavella, 2022) and, importantly, whether they were produced by questionable research practices that lead to misleading or unreliable findings (John et al., 2012). Open science practices help ensure that a study's statistical estimates (e.g., p-values, effect sizes) can be meaningfully interpreted (Mayo, 2017; Cummings et al., 2016). They also allow others to scrutinize the researchers' analytic decisions, such as whether the analysis was planned before or after observing the data (Nosek et al., 2018). Others can then check whether they arrive at the same conclusions as the original research team, which fosters stronger public trust and support (UNESCO, 2021).

##### Real world implications of non-transparent science

The Free Software Foundation Europe (FSFE) provides a compelling position paper explaining why transparency is important for science. When computers are used to produce scientific research, the code is considered a "method", just as, in a lab setting, a set of instructions for working with cells or agar plates is a method. Peer review of methods is an essential step in the scientific process. When methods are not shared, no one else can reproduce the work or build upon it in future scientific endeavors. Sharing methods also allows people to judge whether or not they are trustworthy.

In this case study, the FSFE reminds us of a time when closed methods were not trustworthy. Volkswagen admitted that it had intentionally programmed its diesel engines to cheat during laboratory emissions testing. As a result, people drove these cars believing they were cleaner for the environment than they actually were: in real-world driving, the engines' emissions were up to 40 times the legal limit in the USA. Had the code for the diesel engines (the "scientific methods") been open, this untrustworthy behavior might have been detected much earlier [(Gkotsopoulou et al., 2017)](https://download.fsfe.org/policy/letters/20170105-horizon2020-position-paper.pdf).

##### Quality and diversity of scholarly communications

Furthermore, open science improves the state of the scientific literature. Scientific journals have traditionally faced the severe problem of publication bias, in which published articles overwhelmingly feature novel and positive results (Devito & Goldacre, 2018). As a result, the published literature in some disciplines may contain exaggerated effects or even "false positives" (wrongly claiming that an effect exists), making it difficult to evaluate the trustworthiness of published results (Simmons et al., 2011; Nissen et al., 2016). Open science practices such as registered reports help mitigate publication bias and improve the trustworthiness of the scientific literature. Registered reports are a journal publication format in which articles are peer-reviewed and accepted before data collection begins, removing the pressure to distort results (Chambers & Tzavella, 2022). Other open science practices, such as 📖pre-registration📖, also offer a partial view of projects that, for various reasons (such as lack of funding, logistical issues, or shifts in organizational priorities), have not been completed or disseminated (Evans et al., 2021). This gives these projects a publicly available output that can help inform the current state of the field.

##### Not everything should be pre-registered

Pre-registration is the practice of registering your study or experiment plans before you start the study. This helps ensure that the experiment isn't changed partway through when the results don't support the conclusion the researchers had hoped for, and it can help ensure the publication of "null results" that might otherwise go unpublished.

Pre-registration is a good tool for hypothesis-driven science, in which a researcher starts with a hypothesis and then defines steps (methods) to support or refute it. Not all science is hypothesis-driven, though. Discovery-driven science is more exploratory and doesn't usually start with a hypothesis. Instead, it may involve examining existing data, or collecting more data, and forming conclusions based on the available evidence. Many domains perform discovery science, and these experiments and studies generally aren't suitable for pre-registration, since the exact direction of the research may not be clear at the start.

Open Science is also a valuable tool for the public sector. Movements such as Public Money, Public Code, a campaign calling for publicly funded software to be released as free and open source software, were started by people who believe in the value of making publicly funded work freely available to the population. Open science is also enabling remarkable advances in how we practice democracy: software such as Polis, which builds on the concept of computational democracy, lets scientists apply statistics and machine learning to the opinions of millions of citizens. In this way, open science facilitates 📖citizen science📖.

##### Response to societal challenges

As science tackles consequential topics (climate change, pandemics and global health, democracy and misinformation), the transparency and verifiability of science are more important than ever. This was highlighted during the COVID-19 pandemic, when the development of life-saving vaccines was accelerated because the genomic sequence of SARS-CoV-2 was deposited in GenBank, an open-access database (Zastrow, 2020). Open science allows rapid, global access and action, especially for shared problems too difficult for any one team to solve alone.

Responsible Open Science is not only beneficial; it can also be characterized as an ethical imperative, especially for publicly funded projects. UNESCO (2021), for example, writes: "so as to ensure the human right to share in scientific advancement and its benefits, member states should establish and facilitate mechanisms for collaborative open science and facilitate sharing of scientific knowledge while ensuring other rights are respected".

Recent years have shown the great momentum of open science, with a number of funders, regulatory organizations, and governing bodies around the world mandating open science practices across various disciplines (e.g., European Commission; UNESCO, 2021; National Academies of Sciences, Engineering, and Medicine, 2018); we discuss this in more detail in Lesson 4. The practicing scientist of today, and especially of the future, needs to learn about open science and start applying it in everyday practice.

##### Less unnecessary repetition is better for study participants

Open science, in a way, also gives back to the communities that scientists hope to serve. Through open science practices, research waste can be avoided, such as the unintentional and costly repetition of previous studies (Lusoli & Glenos, 2020). In the human sciences, this also reduces participant fatigue in the long term: by learning as much as possible from publicly available data, researchers do not need to test the same people repeatedly, which matters especially for communities that are already vulnerable. When science is "given away", individuals, communities, and organizations can more easily adopt research results to inform interventions for their own needs, without the knowledge being gatekept by the original researchers and organizations. In this way, open science can strengthen the social and economic impact of scientific results.

##### Personal/career benefits 

Aside from accuracy, adhering to open science practices potentially offers career benefits to researchers themselves. Openly published research can reach larger audiences across the internet, which can lead to greater visibility and impact, more citations, more like-minded collaborators, and more career and funding opportunities (McKiernan et al., 2016).

Open science practices can also enable stronger collaborations, both within and between disciplines (Hormia-Poutanen & Forsström, 2016). Easy access to open data brings newcomers into the field, allowing broader and more diverse participation. Practices such as pre-registration also allow for a stronger research design, because feedback from collaborators and stakeholders can be solicited before data collection begins. Similarly, preprints allow for speedier feedback on the conclusions drawn from the data once it is collected.

**Case study of a successful collaboration:**

_Mozilla, the organization famous for the Firefox web browser, also runs a community-driven project called Common Voice. Common Voice is an open, crowd-sourced dataset of voices covering many different languages, accents, countries, and speech patterns. By making this data open and facilitating contributions from volunteers worldwide, the project helps democratize speech recognition and text-to-speech technology and helps these technologies represent the population more equitably._

Practicing open science with transparency, collegiality, and research integrity requires developing a whole set of technical and transferable "soft" skills, which are extremely useful in researchers' careers in both the academic and non-academic sectors. Examples include digital content creation; information, publication, and data literacy; and communication and collaboration skills. We will come back to these in the bonus section of Lesson 5. It is therefore important that training and mentoring be widely offered to researchers.

_Short on time? Make sure to read the top-ten reasons to do open science at the end of this lesson for a quick TL;DR summary._

#### Challenges in Open Science

However, open science also comes with its challenges. Doing open science requires some extra effort from researchers to start and maintain, but its long-term benefits include a substantial overall increase in research efficiency, integrity, and public trust. For example, putting your code in the open will probably require some adjustments, and sharing it with a community will require you to choose how others may use your contributions. Sharing data can also mean extra work and planning; however, well-organized and widely discoverable data can greatly improve science and confidence in it. We cover code sharing and licensing in more detail in Lesson 5, the "How" lesson.

In this lesson, we focus on the challenges of sharing your work and on the consequences of sharing, and in some cases, of oversharing.

##### Not everything should be open - don't overshare without consent! 

To practice responsible Open Science, careful attention should be given to how data is anonymized and how sensitive information is removed from it, in order to safeguard people's identities and prevent the various harms that stem from breaches of privacy. Recent history offers many examples of the harm caused by misusing data or by collecting it through illicit means. Scandals such as Facebook–Cambridge Analytica, and outrageous services that sell very personal details of users' lives without their knowledge and full consent, are far too common.
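
One common first step is to replace participant identifiers with pseudonyms and drop direct identifiers before data leaves the research team. The Python code below is a minimal sketch of that idea, assuming tabular records stored as dictionaries; the column names (`participant_id`, `name`, `email`, `phone`) and the `pseudonymize` helper are hypothetical, chosen only for illustration. It uses a keyed hash (HMAC-SHA256), so someone without the secret key cannot recompute a pseudonym from a guessed ID. This is pseudonymization, not full anonymization: remaining fields such as age or postcode can still allow re-identification, so a real project needs a broader review of everything it shares.

```python
import hashlib
import hmac
import secrets

# Hypothetical direct identifiers for this sketch; a real study defines its own list.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}


def pseudonymize(records, id_field, key):
    """Replace id_field with a keyed-hash pseudonym and drop direct identifiers.

    This is pseudonymization only: the remaining fields may still
    allow re-identification and must be reviewed separately.
    """
    cleaned = []
    for record in records:
        # HMAC-SHA256 with a secret key: the same ID and key always give the
        # same pseudonym, but without the key it cannot be recomputed.
        digest = hmac.new(key, record[id_field].encode("utf-8"), hashlib.sha256)
        # Keep every field except the original ID and the direct identifiers.
        out = {k: v for k, v in record.items()
               if k not in DIRECT_IDENTIFIERS and k != id_field}
        # Truncated to 16 hex characters for readability in this example.
        out["participant"] = digest.hexdigest()[:16]
        cleaned.append(out)
    return cleaned


if __name__ == "__main__":
    key = secrets.token_bytes(32)  # keep this key private; never publish it with the data
    raw = [
        {"participant_id": "P001", "name": "Ada", "email": "ada@example.org", "score": 42},
        {"participant_id": "P002", "name": "Bo", "email": "bo@example.org", "score": 37},
    ]
    for row in pseudonymize(raw, "participant_id", key):
        print(row)
```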

##### Preparing documentation, using standards, and creating metadata takes time and effort

In addition to treating users' data ethically, further work is often required to make research outputs not only publicly available but also understandable and accessible to various stakeholders. This means, for example, that code to be shared should be understandable and properly documented. It might also mean having a testing system in place and using distributed version control software and a CI/CD (continuous integration and continuous delivery) pipeline. If you're unfamiliar with any of these terms, don't worry! They will be covered in the "Open Software" module.

Beyond code, if the project uses data that is being shared openly, it might also need documentation that adequately describes the dataset's contents, nature, and layout. This type of "data about data" is known as "📖metadata📖". It might also be necessary to adjust the dataset's formatting to fit a specific pattern agreed on by the broader community; this is known as using community-agreed data standards.
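To make the idea of metadata more concrete, here is a minimal Python sketch, assuming a small CSV table and a hand-written metadata dictionary. The field names, the `validate` helper, and the metadata layout are hypothetical; they loosely echo community conventions such as Frictionless Data's Table Schema but do not implement any particular standard. The check compares the dataset against its own documentation, the kind of consistency that community data standards aim to make routine.

```python
import csv
import io
import json

# Hypothetical metadata ("data about data") for a small survey table.
# The layout loosely echoes community conventions such as Table Schema,
# but this dictionary is illustrative, not a real standard.
METADATA = {
    "title": "Example survey responses",
    "license": "CC-BY-4.0",
    "fields": [
        {"name": "participant", "type": "string", "description": "Pseudonymous ID"},
        {"name": "age", "type": "integer", "description": "Age in years"},
        {"name": "score", "type": "number", "description": "Test score, 0-100"},
    ],
}

# Map documented types to Python conversions used for checking values.
CASTS = {"string": str, "integer": int, "number": float}


def validate(csv_text, metadata):
    """Return a list of problems found when checking a CSV against its metadata."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    expected = [f["name"] for f in metadata["fields"]]
    # The columns must match the documented fields, in order.
    if reader.fieldnames != expected:
        problems.append(f"columns {reader.fieldnames} != documented {expected}")
        return problems
    # Every value must convert to its documented type.
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for field in metadata["fields"]:
            value = row[field["name"]]
            try:
                CASTS[field["type"]](value)
            except (TypeError, ValueError):
                problems.append(f"line {line_no}: {field['name']}={value!r} is not {field['type']}")
    return problems


if __name__ == "__main__":
    data = "participant,age,score\na1b2,34,88.5\nc3d4,thirty,71\n"
    print(json.dumps(METADATA, indent=2))
    for problem in validate(data, METADATA):
        print(problem)
```

Running the example prints the metadata as JSON and reports that line 3 has a non-integer age. Keeping the metadata in a machine-readable file next to the data lets both people and tools rely on the same description.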

##### Open community members don't always agree with each other

Beyond the technical aspects of producing Open Science, it's also important to keep in mind the social aspects of the project. While interacting with the community can be one of the most fulfilling things about Open Science, it can also be a source of disagreement about the direction of the project or how it should be used. That's where licenses and codes of conduct come into play. By explicitly setting out rules for community interactions and the use of resources, licenses and codes of conduct help protect both the maintainers and their vision of what the original project and any 📖forked📖 projects should comply with.

##### Case scenarios in open communities

As you saw in the previous lesson, the story of open software (which laid the foundation for Open Science) is vast, and at times different open values can conflict deeply. Two particularly relevant movements that helped shape our ideas and actions in Open Science today are the Open Source movement and the Free Software movement.

The Open Source Initiative, an organization that advocates for Open Source, argues that Open Source licenses can't "discriminate against persons, groups, fields or endeavors", while the Free Software movement affirms that "everyone should have the freedom to run the program as they wish, for any purpose". Even though these maxims sound very inclusive and welcoming, critics have pointed to the carelessness with which these movements have treated both maintainers and users of Open Source, as well as their naive neglect of how powerful a tool code is and how it can be used to do evil.

Speaking of doing evil, the Open Source Initiative addresses this problem in these exact words: "Giving everyone freedom means giving evil people freedom, too". Recent movements such as Ethical Source and First Do No Harm have questioned how broadly open resources should be allowed to be used, imposing ethical restrictions on the use of software through licenses. In some cases, project maintainers have taken the lead and written their own licenses, such as the license for the original JSON reference code, which adds the clause "The Software shall be used for Good, not Evil".

Examples of open science and open source projects that have been used for unintended purposes include:

- Chef Sugar, an open source project, being used by US Immigration and Customs Enforcement (ICE) [1]
- Alleged illegal use of the Elasticsearch brand by Amazon [2][3]
- The many "what's bad" essays on Richard Stallman's website
- 📖Data Sovereignty📖, Indigenous rights, and parachute/helicopter research: when marginalized people share their data, privileged researchers sometimes reuse it without fair credit or funding reaching the original data creators.

Further, science that is merely "open" is not necessarily of high quality. However, the transparency and verifiability that open science affords enable readers and various stakeholders to independently judge the trustworthiness of research products.

##### Cultural barriers: not everyone wants to change, and institutions often move slowly

A further challenge to adopting open science practices is the institutional barriers facing researchers and practitioners. While individuals might be interested in adopting open science practices, they might lack support from their department or project supervisors, and open science practices might not be given the budget, resources, or time in a project cycle. Institutions might also fail to recognize open science practices in recruiting, training, or promotion. This lack of incentives within organizations presents a difficult barrier to the adoption of open science.

While there are many challenges to the adoption of open science, we believe that its various benefits, and its ethical imperative toward researchers themselves, scientific communities, citizens, and policy-makers, outweigh the costs of these barriers. In addition, recognizing the barriers and the places where caution is needed is a first step toward resolving them.
 
#### Summary

Open Science provides benefits not only to society but also to the individuals who practice it. Walking the line between responsible, appropriate sharing and irresponsible oversharing requires diligence, but both the process and the results of science done in the open are very rewarding to all its stakeholders.

#### 10 Reasons to practice open science responsibly: 

##### Responsible Open Science…
- … (including availability of data, code, materials, and early results) accelerates research broadly and greatly.
- … generates transparency, public trust, and support.
- … fosters working across and engaging multiple disciplines, or "convergent" science.
- … brings innovation through the use of big and aggregated data and information.
- … supports public and community uses of science, also known as community science, participatory science, or citizen science.
- … helps fight misinformation and disinformation.
- … is an intentionally and thoughtfully inclusive practice.
- … supports the key role of science in addressing major societal challenges of the 21st century (including climate change and sustainability).
- … makes your research more efficient and impactful and provides credit broadly.
- … is the new normal: regulatory and governing bodies are reaching a consensus in favor of it.

#### Questions/Reflection:

- Why are responsible Open Science practices important to a researcher's profile? 
- How can a researcher benefit from responsible Open Science practices? 
- How does society benefit from responsible Open Science? 
- In this lesson, we learned that responsible Open Science often takes time and requires diligence and dedication of researchers. Can you explain how and why?


