Interviewer: There we go. The recording has been started. Thank you so much for agreeing to this interview, PERSON. We've just discussed the form of consent, but as a reminder, this interview is being recorded and it will be transcribed by myself and my student research assistants using the help of a local large language model. Is that okay with you to get started?

Interviewee 13: Yes. Thanks for inviting me.

Interviewer: And this project is about open science practices in linguistics. And of course, linguistics is a very broad field. So my first question would be, where do you situate yourself within this broad field of linguistics?

Interviewee 13: Yeah. So, well, I would probably define myself as a corpus linguist, first of all. Although the research and the teaching that I do kind of involve corpus methods applied to language learning, language teaching, second language acquisition and PROJECT. So these are the main areas. I'm also interested in methodological aspects of corpus linguistics and anything related to PROJECT.

Interviewer: Great. And so we'll start with you as a researcher and your personal associations with open science. So what do you associate with open science? Like what springs to mind when we speak of open science practices in linguistics?

Interviewee 13: Well, I guess there's a number of concepts, a number of ideas that come to mind. So open science kind of makes me think of collaborative research within a community of practice. So being able to collaborate with other colleagues who are willing to share what they're doing and the tools that they are using or the data sets that they are using. So that basically within this community of practice, well, knowledge can be built in a sort of a collaborative way, building on what has been done previously. And so learning from previous, perhaps mistakes or difficulties and kind of helping each other to improve practice and in the end, knowledge.

Interviewer: Oh, and so I've been talking of open science, but traditionally linguistics is considered a humanity and some people in the humanities prefer to speak of open research rather than open science. And then there are others who prefer the term open scholarship, which is often thought to be sort of broader to encompass also not just open science slash research, but also open education. And I was wondering whether you had any thoughts on this. Do you think that we can speak of open science in linguistics or is one of these other terms more appropriate? What do you think?

Interviewee 13: Well, this is really interesting. I've never thought about the word science as being not the most appropriate one to be used when we talk about research practice in linguistics. Probably because, well, linguistics, applied linguistics, corpus linguistics, to me, can be characterized as science or as part of social sciences, for example. So even if it can be linked to humanities, still we use research practice that links to scientific research in terms of methodologies, and also in terms of, yeah, in terms of methodologies, data. I'm thinking of, for example, some research practice in linguistics that kind of involves interdisciplinary collaborations, like psycholinguistics research and so on. So I would consider it a scientific subject. And so I've always thought open science as a concept applies well to linguistics. As for the other terms that you've mentioned, well, open research, I guess, open research, open scholarship, open education. Well, I guess open research and open scholarship perhaps might be broader terms that could apply to all disciplines of research in general, and perhaps might be preferable just to be more inclusive. Open education, I see it, well, I'll just say that I don't know if there's a clear definition of these terms. I'm just thinking about my own definition. And in this case, if I think about open education, there's something more attached to this concept. So I feel like open research, open science is more to do with research practice, while open education also has to do with the teaching that we do informed by our own research practice. Although then the teaching that we do could be considered part of the dissemination activity that is the endpoint of research in any case. So they're very closely related.

Interviewer: Yeah. I mean, there are no, I think there's no consensus on these definitions. So I'm really asking because I'm curious to see what people associate, different people associate with these terms. And I will continue to use the term open science in this interview, but I mean it very broadly to include all of these things and feel free to use whatever term, you know, you prefer. Yeah, but we'll now think about your own experiences of open science and open science practices. Do you take part in any or have you in the past? And if so, which ones?

Interviewee 13: Yeah, so I have taken part in research projects that kind of embraced open science principles, more or less, because I think it's not a yes or no thing. It's more of a continuum. So I'm thinking, for example, of research that involved the analysis of corpus data that were accessible, but up to a certain point, for example, because of perhaps copyright restrictions. So in this case, there's obviously a limitation in terms of the possibility of sharing the data. For example, I'm thinking of a project where we could share the data, the corpus, through an application that was internally developed at my university. So other researchers could freely investigate and analyze the corpus data that we prepared, we developed, but they weren't allowed to actually download the full texts because of copyright issues. In other projects that I worked on, we were able, well, with colleagues, we were able to actually make the full data set available or sometimes part of the data set. So I think it's important to obviously kind of engage in open science principles as much as possible, but sometimes they have, well, there are limitations, as I said, for example, due to copyright, and my own practice has also been characterized by this situation. So what else? So yeah, sharing data or data sets, sharing scripts for the analysis of data. I'm thinking, for example, of R scripts. And I've never pre-registered a study so far. And I know it's getting quite popular in our field, but I'm still trying to think about where my attitude, where my position is about this. Because of course this is not just sharing the idea of a project in advance, but it's also a matter of committing to a research project without limiting the possibility of modifying it slightly on the way. Again, I've never pre-registered a study and my knowledge of pre-registration is quite limited, so I might be wrong. And yeah, I think these are, so this is my experience more or less, mostly: sharing as much as possible within some limitations. For example, for my PhD, I could not share the dataset because it was a corpus based on PROJECT and, well, they didn't give consent to share the full dataset.

Interviewer: And have you shared sort of secondary data? I mean, you mentioned sharing the encrypted version of the corpus via a platform that's obviously quite involved. You need a lot of technical knowledge to be able to do that. Some corpus linguists sometimes share sort of secondary data, like word frequencies or whatever is analyzed in the next phase of the project. Is that something that you've experienced before?

Interviewee 13: Yeah, that is something that I tend to do most of the time, if not all the time. I think that's something that is also helpful because, well, sometimes if you share the data that you're using for the analysis, the secondary data that you're using for the analysis, such as quantitative measures that you've obtained from a corpus and so on, it's a good opportunity to get a bit of a sanity check from colleagues or other people who want to perhaps replicate your study. So it's actually quite helpful in that respect.

Interviewer: Yeah, and my next question would be, like, what's your motivation for engaging in these practices? And how did you learn about them?

Interviewee 13: So the motivation is, as I said, to kind of contribute to some sort of collaborative knowledge building and sharing. And also the motivation is linked to my own experience when I was a PhD student of asking for data from previous studies to kind of replicate an analysis and try to understand what was happening behind the scenes. And it was quite frustrating when authors perhaps replied with a negative answer, while it was quite motivating when they actually agreed on sharing the secondary data. So that was exciting. And so I thought this is something I also want to keep in mind in the future. So when people share data or methods in a way that makes their research replicable, it is actually very useful, and not just for collaborative knowledge creation and building, but also for kind of, well, to give other people an opportunity to learn as well. And so the first question was, what was the rationale?

Interviewer: Yeah. And the other is like, how did you find out about these practices?

Interviewee 13: Oh, right. Yeah. Well, during the PhD, I was joining some research groups in my university with other PhD students and so on. And every session was on a different topic. And there were lots of sessions on open science principles where we discussed pros and cons, our experience, and also shared some tips. And so little by little, I started knowing about this and then collaborating with people with more experience. So learning from their example. And doing some readings, of course. And, yeah, these were the main things.

Interviewer: Yeah, we'll now move away from your personal experiences and try and think of linguistics more broadly. Or you're welcome to think about corpus linguistics or applied linguistics, whichever subdisciplines you feel you're most familiar with. And my question would be, as far as you can tell, how widespread are open science practices in linguistics or in these subfields?

Interviewee 13: Perhaps not enough. I mean, I know it's a trend that is starting to become more and more widespread, but it's often the case that you read a paper and then you want to kind of look at what happened behind the scenes and you look for the details of how they did the analysis, because perhaps in the paper something is not very well specified because of word count and so on. And then, so you think, okay, there must be some additional materials or they must have uploaded their additional materials on a repository or somewhere. And then you find out that, no, that's not the case. And so it's, I guess it's starting to become something a bit more common, but probably it depends on the subdiscipline within linguistics. So, for instance, I know colleagues in my department that do research linked to kind of medicine or like psychology, where open science, principles of open science are more widespread. And so they're very used to following them. But other colleagues perhaps work more closely to areas of linguistics that perhaps could be considered part of humanities and a bit less scientific. And in that case, they're not so used to sharing all their data and methods, especially if they are using qualitative methods. Well, but maybe, no, maybe this is not really true because it applies to both qualitative and quantitative methods, I would say, yeah.

Interviewer: But you still see some differences between subdisciplines.

Interviewee 13: Yeah, absolutely. And again, perhaps it's also because some subdisciplines kind of take some input from other areas of research that are stronger in using open science, such as, as I said, medicine. Or I'm thinking about, for example, colleagues in my department that do research in phonetics, phonology, and so that's sometimes linked to, well, it can be perceived as more of a scientific area of research in linguistics, while in other areas of research, for example, what is connected to language learning, language teaching, well, yeah, language teaching, language testing, sometimes there's copyright issues there that, again, don't allow you to share your data. While, for instance, if you work in the area of second language acquisition, language acquisition, that's something that, well, the way I perceive it, and this might be totally biased, is that they are more used to following principles of open science and sharing all the data and research techniques and practice.

Interviewer: Yeah, that's really interesting. And in the areas where there is less uptake, or even where there is some uptake but, as you were saying, it's still very common to have papers that don't share the detailed methods or the data, what do you think could be done to improve this uptake?

Interviewee 13: Well, I guess there are a number of things that could be done. So first of all, definitely kind of organizing or having more opportunities to discuss these topics and this practice or practices. Perhaps organizing some kind of special collections or publications that do follow principles of open science and could set an example. And perhaps also kind of involve journals and editors within the discussion itself, because sometimes that's an important starting point. So, well, I guess in areas of linguistics which already adopt principles of open science, it's often the case that the journals where they publish their research actually ask for supplementary materials to be uploaded to a repository. And so that's why it's probably more common. And perhaps it might be helpful also to kind of organize some kind of support for researchers in terms of training. And I know there's resources online that could be accessed. So yeah, making use of what is already there and perhaps maximizing it.

Interviewer: Yeah. Yeah, it's interesting. The first thing that you mentioned was talking, sort of building communities and exchanging about open science practices. And a very personal observation is that sometimes I have the feeling that open science advocates, if we can put it that way, often preach to the choir, you know? And I'm wondering how can we reach out to linguists who maybe are not convinced or maybe just not aware of open science practices?

Interviewee 13: I'm not sure. That can be tricky. Well, I guess even through conferences, for instance. And I guess also kind of trying to, I wouldn't say praise, but I can't find a better word. Yeah, let's say praise. It's not the right word, but I can't think of anything better now. So trying to praise colleagues that actually follow these principles and kind of use them as examples. And perhaps because, I guess, when colleagues don't actually agree on sharing their data or methods and so on, sometimes it's a matter of actually sensitivity of the data and that's a completely different thing. But sometimes it's also a matter of being a bit worried about perhaps sharing something which could contain some errors or might be a bit inaccurate. So basically being worried about criticism. And I guess just trying to reinforce the idea that, like, the community of research is actually a supportive environment. Or at least that's my experience. Perhaps I've been lucky. But yeah, I guess it's something that we should probably, well, everybody should work towards, doing a little bit. And yeah, and the way to reach out to linguists might probably be through the usual channels of communication that we know are kind of used as a point of reference. So I'm thinking of popular mailing lists or conferences, or as I said, journals that might also offer this kind of support. I wonder whether they could also offer some support in terms of helping with sharing data and methods in the best way possible. So, yeah.

Interviewer: Yeah, that sounds really interesting. Yeah, I think we're slowly reaching the end of my sort of set of questions. Maybe the last one would be, what would you need personally to do more open science? Like what would help you?

Interviewee 13: Well, a number of things. There's not just one thing. So different ideas come to mind. First, perhaps it might be interesting to have a bit of access to some kind of mentorship program. So I'm an early career researcher, so I feel like I have a lot to learn. And I do have access to a mentorship program through my university, but it seems like there's never enough time to talk about certain topics such as open science, because since the mentorship program is organized within my institution, it's more to do with getting familiar with procedures and documents and things that are basically just for my institution. So I wonder whether there could be an opportunity to organize something similar between colleagues from different institutions perhaps. So that might be one thing. Another thing is, yeah, I mentioned earlier on that I think what kind of makes people a bit reluctant in sharing their practice, research practice and instruments is that they might be a bit worried about being criticized or perhaps that other colleagues might use their data to do research that perhaps they're planning to do next. So I guess I've lost my, I've lost the idea now, my idea. Yeah, if people are worried in this way, I guess, I wonder whether there could be some sort of an additional support that could be given to researchers, perhaps early career researchers, or people that decide to engage with open science principles for the first time, to do it in the best way possible. So some kind of help in making sure that what is shared is accurate so that researchers don't need to worry about it anymore. Or not so much. What else? And perhaps getting access to some sort of testimonials or getting the chance to talk to people that regularly follow this practice and can kind of share tips and advantages. And so these are the things that might help me the most.

Interviewer: Yeah, that's really, really nice ideas in there. Thank you so much for sharing. Yeah, that was the end of my official questionnaire. But is there anything else that you wanted to add on open science practices in linguistics or in the humanities for that matter?

Interviewee 13: I don't think so. Or maybe just the fact that I think every time I've discussed open science principles or read about it, it's always been linked mainly to data and methods, and perhaps this is just down to my own literacy regarding the topic, which might be limited, but I wonder whether we could kind of discuss a bit more practice linked to tools and the development of tools. Well, I know that when we talk about analysis and methods, we do tend to share also scripts and code. But what I'm thinking of is linguistic tools and software that sometimes is available through either an online platform or a desktop application, which doesn't allow you to actually understand what happens behind the scenes because there's no clear explanation of what happens, what processing the data goes through, and the actual code that is used by the software is not shared. So, that is something that perhaps might also be included more in discussions about open science.

Interviewer: Yeah, really interesting. I found out recently, I mean, I was aware of the problem, but I wasn't aware that it was that bad, that with englishcorpora.org and also the Spanish and the Portuguese equivalents that I'm sure you're familiar with, which are used a lot, right, these platforms, the results change over time, even when the corpora are supposed to be fixed. If you, like, rerun the analysis with exactly the same query and the same, like, setup, you sometimes just get different results, and also there are, like, different functions that you can use to obtain supposedly the same result but that turn out to give different numbers. Like, I would assume that it's actually the same query that's happening, but you can get different results on the same day.

Interviewee 13: Yeah.

Interviewer: It's really not reproducible and clearly something's gone wrong in the script at some point, but I have no way of knowing what's going wrong and where it's gone wrong. And I don't know which searches were correct, you know?

Interviewee 13: Yeah. No, absolutely. That's a good example.

Interviewer: I mean, it's a scary one. But yeah, we don't know which version is being used, what's been updated.

Interviewee 13: Yeah, that's something else like the version number of even a corpus or a tool sometimes is not clearly reported, or sometimes it's not even specified by the developers. Yeah, exactly. And in corpus linguistics, if you think about even tokenization. Yeah. So a simple procedure like tokenizing your corpus can be done in so many different ways that then result in different results. Yeah. So, yeah.
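
[Editorial illustration, not part of the recorded conversation] The point about tokenization can be made concrete with a minimal Python sketch. The sample sentence and the three tokenization schemes below are assumptions chosen purely for illustration; they do not reproduce the behaviour of any tool or corpus mentioned in the interview.

```python
import re
from collections import Counter

# Hypothetical sample sentence, invented for this illustration.
text = "Don't over-think it: the state-of-the-art isn't fixed."

# Scheme A: split on whitespace only, so punctuation stays attached ("it:", "fixed.").
whitespace_tokens = text.split()

# Scheme B: runs of letters only, so apostrophes and hyphens break words apart.
letter_tokens = re.findall(r"[A-Za-z]+", text)

# Scheme C: letters, keeping word-internal apostrophes and hyphens together.
word_tokens = re.findall(r"[A-Za-z]+(?:['-][A-Za-z]+)*", text)

for name, tokens in [
    ("whitespace", whitespace_tokens),
    ("letters only", letter_tokens),
    ("words with internal ' and -", word_tokens),
]:
    # Corpus size and the frequency of "the" both depend on the scheme.
    print(f"{name}: {len(tokens)} tokens, 'the' x {Counter(tokens)['the']}")
    print("   ", tokens)
```

Even on one sentence, the token count and the frequency of "the" differ between schemes, which is why reporting the tokenizer and its version alongside the corpus version matters for reproducibility.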

Interviewer: Yeah. Crazy. Well, thank you so much. I'll stop the recording there.
