Interviewer: There we go. Thanks again for agreeing to this interview, PERSON. Just as a reminder, you've signed the form of consent and I have it, but the interview is being recorded and the recording will be used by myself and my student research assistants to generate a transcript. Which will then be anonymized. And it's only the anonymized transcripts that will be shared. So the recordings will stay with us. We will be using a local large language model to help us with the transcription process. Is that okay with you?

Interviewee 18: Yeah, that's okay with me.

Interviewer: Great. So this project is about open science practices in linguistics. And so my first question is, linguistics being such a broad field, where do you situate yourself and your research within linguistics?

Interviewee 18: Um, so I work on PROJECT mainly. So that, that puts me somewhere definitely within phonology, phonetics. Um, I also work on, on PROJECT issues, on pragmatics, some semantics, so some theoretical linguistics, but mostly, like on a day to day basis, let's say speech science. So we handle recorded speech.

Interviewer: Great. And we'll begin with your own personal experiences. What do you personally associate with open science and open science practices? What springs to mind?

Interviewee 18: Like free association?

Interviewer: Yeah, your own associations.

Interviewee 18: Okay, it's definitely something that you see more and more that's becoming maybe not even like, in some cases you can't even opt out anymore, which I think is good. Like if it's about, let's say you want to publish an article that you're pretty much encouraged, at least encouraged to also publish the experimental materials. If not, like the actual data, if you can publish that. And it's useful when other people do it because I can check their scripts and see, okay, that's how they did it. Or I can check the experimental materials because maybe there's like, there was an oversight or maybe they had a great idea. That's, you know, something that you can take inspiration from. So, yeah so on that level like let's say supplementary materials for for articles. It's it's convenient if everyone does it let's say it can be like from the perspective of someone who has to prepare. It can be annoying if you didn't like immediately sort of do the entire workflow with the expectation okay I'm going to need like a document that's going to go on the OSF at the end. If you do sort of keep it in mind all along, then it isn't that much. But if you do remember at the end of the pipeline, and I have to do that. It can be mildly annoying, but not that annoying. Yeah. And yeah, so that's really my primary association when it comes to open science. It's about sort of making at least your experimental methods, if not the data, publicly accessible.

Interviewer: Yeah. And we've spoken about open science, that's how I introduced the study and the project. But linguistics being a humanity, some people in the humanities prefer to speak of open research rather than open science, or others prefer to speak of open scholarship, which is often understood to be a broader term that also includes open education. And I was wondering, I'm going to put these terms in the chat so you can see them. And I was wondering whether you had any thoughts on this. Do you feel that open science is suitable for linguistics or are there more suitable terms as far as you're concerned?

Interviewee 18: So when you introduced or started introducing that, I immediately thought, why wouldn't open science be applicable to linguistics? But when you say, okay, open education, when it comes to you know, thinking about teaching, I can maybe see that some people might be happier with something other than open science. Although I think of the four options that I see on the screen right now, so science, research, scholarship, and education, I think... I think really science is like the most general, maybe research, at least like scholarship and education are, in my opinion, more narrowly constrained. So yeah, I would say like in linguistics, I think open science fits everything. And, yeah, I guess that that's my answer. It might be that like some corners of, let's say, applied linguistics might be happier with something, one of the other options, in which case, as long as it's open, you know, that's the important bit, I think.

Interviewer: Yeah, I'm going to continue using the term open science, but I mean it to encompass all of these things and feel free to use whatever you think is most suitable, of course. And now I'd like to speak about your personal experiences of open science, as in which practices have you been involved in so far?

Interviewee 18: So I'm going to start with the rarest, most rare. I don't know if that's the comparative, but the one that I've only done once, which was a pre-registration. So we've pre-registered one paper, which we haven't written yet, but we have run the analysis. To give a quick impression of my thoughts, it's helpful to already be thinking about what you're going to do with the data before you even have it. It turns out that you miss edge cases. But in general, I think it's helpful. So why have I only done it once? I don't know. I have regularly put experimental materials, like statistical supplementary analyses up on OSF, so the Open Science Foundation. I've done that. I haven't actually published any of, like the primary data itself because it's speech recordings and like as a general rule of thumb, most of our participants don't opt into, you can do whatever you want with it, with my recording of my voice. So I err on the side of not making like the actual audio recordings accessible. In terms of open science, I guess I have a GitHub account, but all of my repositories are set to private, mostly because of the data sharing issue that I've mentioned. So that's for if I know that I'm going to be working on a data set for a while and I might like change annotations or something like that, then I make sure that it's... That I use Git as a versioning tool so that I know, okay, at this point I made these changes for that reason. Which is... At least potentially open science. If at some point I say, OK, here's the version history of this data set, then people could see, OK, these were the changes that we made. And they could also undo them if they think, OK, that was, yeah. OK, I guess off the top of my head, that's about all I can think of.

Interviewer: What about open access publishing?

Interviewee 18: Oh, OK, yeah. That's actually something that I'm mildly proud of, although I think I just lucked out. My most recent journal publications, they're all open access. I think conference proceedings, I have to go back almost a decade to get one that's not open access. It's, yeah, I personally love it if I can access an article, first of all, at all. And secondly, if, let's say, I'm home, I don't have to log into the VPN to get it. That's just about the convenience, but also, obviously, about the accessibility and also accountability towards the people who pay for our jobs. So in that respect, I'm very happy that most of my publications, especially the recent ones, are open access. I don't make it like a condition, you know, I don't say okay, I don't publish if it's not open access. But recently, it just has luckily panned out that they were all open access.

Interviewer: And for the few articles or book chapters maybe that you have that are not open access, have you published post-prints? Or what about preprints? Is that something that you do?

Interviewee 18: I recently published a preprint, yes. That was mostly because we needed... So for the PROJECT application process, the minimum requirement was that it's officially a published preprint. So we did that. I forget the, I remember the journal, but not which publishing house, but they have their own preprint publishing platform. And so we did that, which I think then also allows to connect the finished paper to the preprint if at some point it's actually published. So I've done that and it was surprisingly easy and kind of worthwhile. In terms of like the articles that are not open access, I'm just going to say without looking up that they're on my website and you can download like the author version as a PDF. If it turns out that they're not on there, then I have something to do for after the interview. But yeah, I do want them to be accessible in some way.

Interviewer: Yeah, and I was going to ask where, and the answer is your personal website.

Interviewee 18: Yes, and also my personal website at the institute that I work for. I don't actually have a university external personal website, which I guess is also something that I should work on soon-ish.

Interviewer: Yeah, that's really, really interesting. You've been involved in all kinds of practices already. And my next question would be like, where did you find out about these practices or who or what encouraged you to pre-register, to publish your data, your methods?

Interviewee 18: Um, I honestly can't tell you because I don't remember who first taught me about pre-registrations. I think it might probably have been during my master's days. Uh, and I think it was probably a psycholinguist, because people in that field have done it for a while. Um, open access, I don't know, I think it's a kind of recent development that it's really, like, let's say, that easy, or possible at all, to actually have open access publishing. So I don't know if I was aware of, like, the distinction already during my student days, or whether that came later, and I'm not sure if... Okay, actually, I just remembered a couple of persons. So one is here in CITY, PERSON. She's very passionate about open access publishing. But I've obviously only met her when I came to CITY, but someone that I knew before when I was still in CITY, PERSON of, among other things, INSTITUTION. And he's very, very, very, very, very passionate about open access publishing. And so it's a safe bet that I heard about it from him or one of his PhDs. So really specific persons who are very passionate about the topic.

Interviewer: Yeah. And thinking for instance about this pre-registration that you've recently done, what encouraged you to pre-register? Was it like your own idea? Was it a requirement from the journal or a grant?

Interviewee 18: No I'm, so, that's the publication that... You are going to anonymize the transcripts, right? 

Interviewer: Yes. 

Interviewee 18: Yeah, okay. So I can just say a name that PERSON is involved in and I think it was almost certainly her idea because the other people on the author list, including myself, don't generally pre-register. I think it was also because it is ultimately going to be a study that's primarily of interest to psycholinguists. It involves prosody, but it does have a psycholinguistic research question and so maybe we just thought, okay, if we want to publish this in one of the relevant journals, then it was our impression, or at least my impression as someone who's external to psycholinguistics usually, that it might just be a requirement, that if you come to a journal and say, hey, we didn't pre-register this, they say, sorry, we can't do that. I don't know if that's true, but I think that was the thinking, to just be on the safe side, and, like I said, it doesn't literally cost anything. It's still possible that, I don't know, deviating from the pre-registration might be something that gives people a lot of alarm, which we might have to do because we didn't really think through the statistical analysis. Um, so in the details, it's not like, okay, we are going to throw a coin and see if it's significant or not, but like in the details. So it ultimately was like a thing, okay, we know we can start the study now, let's quickly pre-register it. Oh, and I should also say, in the lab where we run our experiments, so this was an PROJECT study and it was run in the INSTITUTION, we do have to apply for running an experiment, which is mostly so that the person who runs the lab knows what's being run in which lab. But the form that we use heavily encourages that you just put in a copy of the pre-registration that you already did. Because otherwise, like for the application, you have to describe all of that anyway. You would then do it in LANGUAGE or whatever and do it maybe sloppily or something, or you just put a copy of the PDF that you put on aspredicted.org or something, and then you already have that part of the application for the experiment covered. Which is an encouragement that I ignored for many years. I did it in LANGUAGE and sloppily. But it was nice for once to be able to apply for an experiment and say, here, look at the copy of the pre-registration. So yeah, our lab does actually encourage pre-registrations.

Interviewer: Yeah, that's really interesting. I hadn't heard of that before. So thanks for sharing. I will now try and move away from your personal beliefs, experiences and so on. And let's try and think about linguistics as a field. But you're welcome to think of the sub disciplines within linguistics that you're most familiar with. And my question would be, as far as you can tell, how widespread are open science practices in linguistics at the moment?

Interviewee 18: I mean, it's my impression is that it's sort of becoming more and more common, which obviously presupposes that it's not like universal or wasn't universal 10, 15 years ago. But I think at this point, if I see a recent journal publication, and let's say it involved actual data, as opposed to just somebody's intuition, then I would be very, very surprised, regardless of the journal, if there isn't some sort of link to, let's say, OSF or even on the journal's website itself to some sort of supplementary material. I think at least in the subfields that I move in, so PROJECT, it's, I think, near universal, at least to share the scripts for data analysis, because like I said, in some, or in many cases, audio recordings are difficult to share. And in some cases, like when people work on so-called clinical populations, they are literally impossible to share. For good reason. So yeah, I would say in, like, my subfield, it's very common, if I compare like to adjacent fields. I remember during my PhD thesis, I did work on PROJECT. I shouldn't have thought for that long about that. But anyway, I knew that there was or is a corpus that someone built in CITY. There are publications about it. And I know that some people have access to it, but I didn't. It was really annoying. It was a corpus that was definitely hard to share because it's multimodal. So it has speech recordings, video recordings, I think like gesture, you know, like motion tracking and stuff like that. And it's 15 years old at this point. So they were really ahead of their time. Um, so I can see that they can't just put that on the internet or something like that. But yeah, maybe like the transcript or something like that, so that I could have at least gone and searched for, you know, does the thing that I'm working on occur in that corpus? And then maybe I write an email about getting local access to it or whatever.
But as far as I know it's still like sort of hidden away and this sounds as if I'm, if this sounds as if I'm bad-mouthing like specific people like I can see why they couldn't just make it readily accessible and it was also like funded by a project and then you know how it is. At some point the project funds run out and yeah. So, yeah, there are some areas where I actually thought, oh, I wish this data was more openly accessible or accessible at all. But I think with recent, also newly built corpora, it's my impression that especially corpus linguists are very, very sort of aware that that's something that's very useful, making their corpora accessible to the extent that it's legally possible.

Interviewer: Yeah, sure. And you've partly touched upon some of these points already, but my next question would be, what are some of the, or are there any, if so, which ones, specificities about linguistics that need to be taken into consideration when we try and apply open science practices that come from other fields?

Interviewee 18: Yeah, I think like the inherent well, let's say, identifiability of the data and sort of the being able to sort of connect it to specific persons. That's not, like, unique to linguistics, but unique to, let's say, human sciences, like the humanities. That's something that probably, like, someone who's doing physics or something like that doesn't have to worry about because, you know, it's numbers about atoms or whatever they do. Yeah. In our case, like, and even if it's not like speech recordings or video recordings, even if it's like just okay I think this grammatical structure is good or it's terrible. It's still something that tells you something, even if it's not like maybe not like critically important about the person. I think that's something that let's say, yeah, people in STEM don't have to worry about that much. They probably also have to worry about some kind of human factor, but with us it's like our primary data. Um and yeah I I think that's the big one. Um I would say

Interviewer: Yeah, and I sometimes have the feeling, this might be just my feeling, that open science advocates in linguistics have a tendency to preach to the choir. Um, so that workshops that we organize tend to be attended by people who are already convinced and are already practicing open science. And I was wondering whether you had the same feeling and if so, what could be done to reach more linguists, the ones that are either not aware or even maybe reluctant to be involved in any open science practices?

Interviewee 18: And to answer question A, yes, I also have the same feeling. But whenever I thought about it, hey, it's kind of weird. People are preaching to the choir. And then I thought about, okay, I know people like PERSON who are very passionate about it. Do I actually know anyone who's like, no, I want my data to be locked away? I want my journal articles to be locked behind a license that very few private people can afford. And that maybe my institute can't afford. And obviously, no, I don't actually know anyone. Those people might exist. I don't know. But that's sort of the flip side of preaching to the choir that sort of at least at the level of where we're at the point, I think, where everyone thinks that open access, open science is a good idea. And so the only real, I don't know, it's not an argument, but sort of. But let's call it a counterargument. It can be annoying and it is extra work. You know what I kind of led with, but it's part of the process. That's how you think about it. It's not optional. It's something that you have to do and that you should want to do. And yeah, so, I don't know, who else you could be preaching to. That's kind of, my answer to that. Um, although I, I haven't really been on like the end where I try to educate people about it. Maybe if you do go to sort of take on that world, maybe, maybe people will say, Hmm, I'd be happy. I'm not sharing this data because I, okay, that's maybe sort of something that I've seen and heard about, that people can be reluctant to share their scripts that they've written, let's say, in R, because they're terrible, almost uniquely, almost without, like, exception. And mine are included, and people might feel ashamed of them. And I think that's, I mean, it's probably not a good move to say, hey, it's OK, because everyone's code is terrible. But at the end of the day, that's kind of the truth. And we are all working on getting better. 
But that's maybe something where people could be told, it doesn't matter if your code is ugly or if there might be bugs in it or something like that. It's still better to share it than not to share it, because I always feel great when I see someone else's code that they put up on OSF. And my first instinct is to go, ugh, because it means that I'm not uniquely bad. We're all kind of in the same boat. But yeah, like I said, I'm not sure if that's a winning message to say, I know your code sucks, but that's OK. But that's maybe like the one area where I think people might be a little bit unwilling to share because they think, oh, I'm uniquely bad in terms of scripting or whatever when that's not really true.

Interviewer: Yeah, that's really interesting. Yeah, we're reaching the end of my questions. My last one would be going back to you and your current work, or maybe even thinking in the past when you were still maybe a PhD student, for instance, what would you need or what would you have needed to do more open science?

Interviewee 18: I mean, at the end of the day, it's always more time, you know. 

Interviewer: Yeah. 

Interviewee 18: Maybe, I don't know, less publishing pressure. So sort of, I recently noticed when I got some old data out of the archives, like in the PROJECT, we are not just supposed to, we are required to archive our data. And I'm the person who does that in our project. And when I got the data, my to-do list got longer and longer and I thought, you know, I do have to write an article right now. I'm not going to do it right now, but, so, so it's, it's, it's fixable, but like, you know, just, okay, this experiment is missing like one of the subjects because they were recorded like half a year after everyone else. And I never archived them. Um, so these sorts of things accumulate over time and to really fix that would require sitting down for a day, maybe a week, really just doing data archiving. And that's a very, very tough sell. You know, it's kind of like that's sort of data janitor stuff. But I'm the only one, I think, who sort of has the overview to be able to do it, and now I need to figure out, okay, yeah, I need to find the time for that. And yeah, so this is really, like, the obligatory open access part. It's not like this would be nice. It's actually required. For the stuff that would be nice, it's also time. If you're about to publish a journal article and you think, hey, wouldn't it be nice if we didn't have just our stimuli as text so that people can check the items, but also upload all of the actual audio and images that we presented? And at that point, it's always like, yeah, sure, it would be nice, but it would also be nice to publish the thing with, I mean, it's more than the bare minimum, but I do catch myself thinking, okay, am I just doing like the bare minimum that's required or can I also go the extra mile? And usually going the extra mile would add a couple of days or weeks and not weeks, plural, but maybe a week. And yeah, that has sometimes sort of tipped the scales in favor of doing less. So yeah, my answer is more time.

Interviewer: Makes sense. Yeah. Yeah. That was the last official question, but is there anything else you wanted to maybe add on open science practices in linguistics or in the humanities for that matter?

Interviewee 18: I can't think of anything right now. I think just that like maybe to sort of round off some of the edges that I've pointed out. Like I started with, yeah, OK, it can be annoying having to do it. And oh, now I need to fix our archives. That would take a while. I like when it's made a requirement, like, for example, in the PROJECT, because you can't then weasel out of it. I like when people say, hey, we want to incentivize open access publications. If you have any choice at all, do open access. Or that we have funds for it to cover the cost of open access publications, stuff like that. I think it's really good. So that we also have the support to actually increase the amount of open science that we can do. And here in CITY, I think we're in a good position with that, at least in linguistics, but it would surprise me if, like, in the humanities at large it was any different. So yeah. So to end on a positive note, I like when it's required and I like when there's support for it.

Interviewer: Makes a lot of sense. Yeah, thank you so much. I'll stop the recording here.

