Interviewer: Yeah, thank you so much for taking part in this interview, PERSON. It's great to see you. And we've already just spoken about this, but just as a reminder, this interview is being recorded and it will be used, the recording will be used by myself and my student research assistants to write a transcript and then to anonymize it. And we will be using a local large language model to do that. And is that okay with you?

Interviewee 16: I understand and I consent.

Interviewer: Brilliant. So the aim of this research is to find out about open science practices in linguistics. And so my first question is: linguistics being such a broad field, where do you situate yourself and your research within linguistics?

Interviewee 16: Well, mostly I would position myself in the realm of corpus linguistics. That's the methodology I'm most familiar with. I also dabbled in PROJECT, but corpus linguistics is really where I would see my expertise mostly.

Interviewer: Okay, great, and we'll start with your personal beliefs and associations. Um, what do you personally associate with open science and open science practices? Like, what springs to mind?

Interviewee 16: Well, I guess the most important keyword here for me would be replicability. Sorry, replicability, difficult word. Um, the principle that whatever research is being conducted can be replicated by others and would lead to the same result.

Interviewer: Yeah, very interesting. And I've just spoken about open science. And open science is how I've introduced this study. But linguistics being in the humanities, some people prefer to use the term open research rather than open science, or others prefer the term open scholarship, which is often thought to be broader, to encompass open science slash research and open education. And I was wondering whether you had any thoughts on this terminology. I'm going to put the words in the chat so you can see them. Whether you think open science is suitable for linguistics or whether maybe another term is better suited to linguistics.

Interviewee 16: I have to be honest, I haven't really thought about these semantic differences. I would probably draw a line between the first three and the last one, open education, which probably goes a bit further than the scientific community. Open education to me sounds more like educational materials being available to the common public as well, which I guess is also true for open science. But open education, perhaps the information is structured or presented in a slightly different way. As regards the first three, I have no thoughts about any of the subtle differences here. To me, at first glance, they probably mean the same thing.

Interviewer: And you feel they all apply to linguistics?

Interviewee 16: Well, parts of linguistics are open. Certain journals, certain publishers and certain people and groups advocate open science in linguistics, but I would say that still a large part of linguistics is not fully open as I understand it.

Interviewer: Sorry, I meant could apply to linguistics.

Interviewee 16: It could, yes.

Interviewer: We'll speak about how widespread these practices are in linguistics later. Yeah, that's so true. I mean, I'm going to continue talking about open science, but I mean it as an umbrella term to encompass all of these things and feel free to use whichever term you think is more suitable. And now I'd like to think about your own experiences as a researcher. Which open science practices have you been involved in? Or have you been involved in any to start off with?

Interviewee 16: Well, I have tried to implement certain open science practices in my research. Mostly it's happened in an indirect way in the sense that I use corpora that are openly available to the corpus linguistic community. In some publications, we have also worked on scripts, programming scripts that were also submitted with the publication, but not necessarily with the intent of also making it available publicly. That would be the ideal scenario, I think, that in addition to the data, you also provide any scripts that you've written, any software that you have written, so that other researchers can fully implement this in their replication. Yeah, so part of my research has attempted to adhere to the open science principles, but I would say I'm not quite there yet with my publications so far.

Interviewer: And where did you learn about these practices? Because I mean, many linguists are not necessarily aware of them. Clearly, you are. And so where did you learn about these things? Or did anyone encourage you to get started?

Interviewee 16: Well, I wasn't really encouraged by any colleague directly, but I saw good examples of open science. I saw publications where, in addition to the text being open access, the data and the scripts and any additional material were hosted in a public repository. And so I was very intrigued by that and impressed by the diligence of making all of this available. It has mostly stuck with me because it's so uncommon still in linguistics, I would say. Sometimes it's even difficult to get the text of the article without paying money. So seeing not only the text available, but also any additional data, that is something that, yeah, I remember very positively about certain publications. And so that convinced me that this is the way to go. But I also understand that there are many hurdles along the way that make it difficult to change the existing structure, publication structure.

Interviewer: And we will now move towards the more general aspects. So thinking not only about your own experiences, but beyond, as far as you can tell, how widespread are open science practices in linguistics? Or you're welcome to think more specifically in your subdisciplines in corpus linguistics, for instance.

Interviewee 16: Well, I think that in corpus linguistics, the concept may be a bit more widespread than in other domains of linguistics, simply for the fact that corpora are meant to be published and be usable for the entire corpus linguistics community. Not all of them are, as we famously know. And usually when people compile a corpus, years of work are involved. And so, clearly, the people who compile such a corpus will try to get a certain number of publications out of the data before they make it available to everybody else in the community. And this is understandable from a career perspective. But from a research perspective, yeah, it kind of doesn't quite follow the open science principle to its fullest extent. I also am aware of certain publishers in linguistics that try to advocate for open access and open science principles. One particularly positive example would be the Language Science Press, which publishes all of their materials in diamond open access. And I think this is a great shining example of what it could be, what open science can be. And so these come to mind when I think of good open science projects in linguistics. I think there's still a lot of work that can be done, though. In many domains, apart from publishing the text, it's not common that the data is published alongside it. And so I think there's a lot of things that can still be done to make linguistics more open and reproducible.

Interviewer: Yeah, and given that a lot needs to be done, what do you think could be done? And how could we increase this uptake of open science practices in linguistics?

Interviewee 16: Well, I think one thing that could be done is that maybe journals encourage their authors to publish the data on which the papers are based in a certain structured format so that others who are working on the same subject can download this data and cross-check whether the results that are published in the paper can be reproduced. I think that would be a huge step forward. This is, of course, problematic for various reasons, including copyright potentially or protecting the identity of informants. So this is very problematic, and it's not going to be easily done. But I think there are ways to do that without compromising anyone, any of our informants. So I think that could be done. Not just publishing the paper, but also the data alongside it. And then ideally, any software or code that was written to analyze it. I think that would be a huge step forward already.

Interviewer: Yeah. So you see the responsibility or the lever of action with the publishers, the journals?

Interviewee 16: I'm not quite sure who is responsible here. I mean, the journals clearly can implement this in their publication process. They can ask for it. They can, I'm not sure whether they should actually require it. That would be a very strict way to handle this. But maybe they can encourage and make it easier for authors to do this and to publish data alongside their paper. But the responsibility also, of course, lies with the authors, who can ask for the data to be published. They can ask the publishers to do that. Many publishing websites already offer some kind of tab with additional resources where it's very easy for the authors to store any additional material they want to add to the paper. So technically, I don't think that's an issue. It's really a matter of: do the publishers and the authors want to do this? And how can it be done?

Interviewer: Yeah. You've mentioned open access publishing and diamond open access in particular. And that's obviously one way to go. How widespread are pre-prints and post-print publishing in linguistics, as far as you can tell?

Interviewee 16: It's certainly not as common and widespread as I perceive it to be in other disciplines like medicine or psychology. I think it's much more common there. I'm not aware of any preprint servers that specifically cater to linguistics and linguistic papers. I'm aware that many colleagues publish their preprint manuscripts on common academic social media platforms like ResearchGate, for instance, or Academia.edu. And so my students, for instance, they find papers and manuscripts there very easily. But not everybody's doing that. And it also wasn't originally designed for that, if I'm not mistaken. These platforms were not originally designed as preprint servers. So they have other functions, these platforms. So I think it's not as common in linguistics to have a structured way of accessing preprint manuscripts.

Interviewer: Yeah, interesting. And you've mentioned some of the ways that you've been involved in open science practices. And my question would be, what would you need personally to be able to do more open science?

Interviewee 16: I think perhaps, I'm maybe not speaking from my personal experience, but I think what might hinder many people from engaging in more open access is maybe that they don't see the benefit for themselves, really. There's clearly a benefit that everybody sees for the community. That's clear. But from a personal, maybe career standpoint, there might be reservations to engage in open science. A very simple example would be that you collect data for many years. And then you are basically encouraged or asked to publish that data, which you have carefully collected over such a long time, to publish it with your first one or two publications. And then others can benefit from the years of work that you have invested in this data. The benefit for the people receiving the data is clear, but what is the incentive for the person publishing this data? So why would they do that? I think we need to work on better incentives and to encourage this to be a practice that everybody or many people at least engage in so that we can all benefit from each other.

Interviewer: Yeah, that's really interesting. And yeah, I think my last question is a bit of a personal one, because sometimes I personally have the feeling that open science advocates in linguistics are often preaching to the choir. So we organize workshops and then people come who are already interested or convinced or already practicing open science in some way. And my first question would be, do you have that same impression or is it just me? And what could be done to reach out to more linguists?

Interviewee 16: Well, in response to your first question, I can't really speak from personal experience. I haven't attended many open access, open science workshops in linguistics specifically. Of course, I'm interested and I'm open to it, but I haven't been active in this particular community. So I can't say who is attending these workshops, but I'm pretty sure it attracts people who are already very open to the idea. So that doesn't really surprise me. What could be done to attract others as well? Basically, this relates to my answer before, creating incentives for researchers so that open science practices also benefit their own careers, not just other people who are benefiting from your data. So I guess creating incentives would be the most important goal. And to get many more people on board. Maybe this can be done by showing how open access actually works, what can be gained from it. Perhaps by collecting best practice examples, showing how it's done and how you can benefit from it. And if people see the benefit for the community, maybe that will make them more open to producing these publications themselves. And, yeah, I think this would be maybe a doable first step in this process.

Interviewer: Yeah, that was my last official question. The last one is simply if you have anything else you wanted to add on open science, open science practices in linguistics or in the humanities more broadly.

Interviewee 16: I don't really have anything to add besides applauding your project here and collecting these interviews. And I'm very excited to see what the others have responded. And I'm looking forward to the results.

Interviewer: Same here. Thank you so much. I'll stop the recording.
