Interviewer: There we go. Thank you again for agreeing to this interview, for taking the time to do so. You've already sent me the form of consent, but for the record, I want to let you know that this interview is being recorded and will be transcribed by myself and my student research assistants with the help of a local large language model. Is that okay with you?

Interviewee 03: That's fine with me.

Interviewer: Great. And as you know, the aim of this study is to find out about open science practices in linguistics. And so my first question, maybe a bit of an obvious one, do you consider yourself a linguist? And if so, what are the subdisciplines of linguistics that you situate your research in?

Interviewee 03: Yeah, I've come around to considering myself a linguist. I work in corpus linguistics, sociolinguistics, both in a sense of PROJECT. World Englishes, those are probably the big ones. Interested in discourse, I mean, that's, I guess, a different claim. Yeah. Mm-hmm.

Interviewer: That's great. Well, we'll start with your personal beliefs or personal associations with open science. And so my first question is very simply, what do you associate with open science?

Interviewee 03: Well, there's several aspects to this, right? Open accessibility of results, right? Sort of open access, that whole topic, which I'm sure we'll talk about in more depth as the interview continues. Availability of procedures, data, anything basically that ensures reproducibility of study designs, replicability ideally, and we've talked about this outside of this interview already quite a bit. So basically, at an ideal plane, it's making available everything that goes into the research that enables basically anybody in the world to trace the steps of that research, arrive at the same results, and also critically evaluate each and every individual step.

Interviewer: Yeah, right. And linguistics is traditionally thought of as a humanities. And for some people, open science is therefore not the right term. Some people prefer to use open research for humanities. Others go for open scholarship, which is often understood to be broader, so to encompass open science slash research and open education. And I was wondering, have you thought about this or what are your thoughts on this terminology? What do you think is best suited for open linguistics?

Interviewee 03: All right, I'll start right away saying that I haven't given the terminological intricacies all that much thought yet. So I'll have to think kind of on the fly here a little bit. I'm thinking probably this, I'm not sure I'd want to commit to which is the best suited term. Because I guess some of it centers more on the research procedure and outcomes, and I guess open education encompasses a lot more things. It encompasses probably a much more socially, the way I read it anyway, just being given the term, kind of a social commitment to not just making things available, but ensuring you know, that it actually becomes available even to people who might be in structurally disadvantaged positions to make sure, you know, to actively engage in the dissemination of, you know, everything associated with your scholarship in a way that balances global inequalities. Maybe not just global but any inequality really. I don't know if that kind of meets the definition but in my mind it does.

Interviewer: I think lots of people have different definitions, and so some people will treat open science as the umbrella term, so that for some people it also already encompasses open education, and others see that very differently. So yeah, I think it's just a terminological conundrum at this stage.

Interviewee 03: Yeah and I mean there's really two big different strands here, right? One is essentially associated with sort of philosophy of science, empirical rigor, and the other is associated with social activism to some extent. Those aren't the same thing, right? So it's probably, it would be a good idea to have terminology that keeps these things, that doesn't muddy these things, I guess.

Interviewer: But you don't see an issue with using the term open science for a humanity.

Interviewee 03: I mean, the question really is, is linguistics a humanity, right? That's what I love about linguistics. It has all these many hats that you can wear. And I love the humanistic hat. And I love the more scientific hat. I personally would see myself more in the humanities, actually, than the sciences. But I honestly never see a problem with the term open science. I guess, you know, if we think beyond linguistics, yes, it would make sense to try to broaden the scope of any of these open umbrella terms in a way that makes it less exclusive to the hard sciences, exclusively designating the paradigms associated with the hard sciences.

Interviewer: Yeah, it's interesting because so far I've mostly spoken to German speakers, or linguists who, among other languages, speak German. And I'm wondering whether this issue is a very English-based one, because in Sprachwissenschaft we have Wissenschaft.

Interviewee 03: Yeah, we have Wissenschaft, which is much more all-encompassing, yeah.

Interviewer: So, yeah, it's an interesting idea that I think we might have a slightly different perspective.

Interviewee 03: But I think even, even in the Anglo tradition, right? I mean, it's not at all clear whether linguistics is firmly a humanities discipline or a science discipline. Like if you think of, you know, all of Chomsky's work developing out of MIT, typically the linguistics departments in America and the UK, as far as I'm aware, tend to be sort of the fine arts humanities faculties. But it's kind of, I mean, it's, you know, we have one foot in both, I think.

Interviewer: Yeah, it makes it very interesting, for sure. I'd like to speak about your own experiences of open science practices. Which open science practices do you practice? Which ones and why, maybe?

Interviewee 03: Okay, let's talk about sort of the replicability aspect. I make data and code available usually via the Open Science Foundation, sometimes with collaborative research we have GitHub repositories as well, that sort of stuff. So that's my experience in the realm of replicability. It's something where I feel like it would be great if there were much more of an established expectation to do this, because I do feel sometimes you want to do this, you recognize it's right, it's good practice to do it, but at the same time, you're sort of navigating a field where, you know, published research is cultural capital, gets you the jobs, gets you kind of eligibility for funding, et cetera, et cetera. And it's sometimes frustrating to see how much passes through without providing all of this, and just providing it is extra work, right? It makes the work better, but still, it would be really desirable, I think, if journals, publication venues took a much firmer stand on, hey, if you want to publish with us, we require this. Which I guess brings me to the open publication, open access sort of side of things, which I have a lot to say about, I think. And I've only once paid an article processing fee to have something published open access, and that was in Frontiers of Artificial Intelligence, I think. And that was because I had third party funding, and I felt very dirty for it. I really disliked this practice. If we think about sort of the publishing industry and sort of the value generated in the system, right, we're all working for publicly funded institutions where we generate research results basically from taxpayer money, and then we give up the copyright to all of this essentially for free to publishers. And then we have to buy it back, and again, we've already given them the copyright for free, and then we buy it back either out of our own pocket or out of what ends up, whichever way you put it, being taxpayer money.
You buy that copyright back so people can access it. And that's fundamentally fucked up in my view. And I think it's a system that won't sustain itself forever. I'm kind of surprised at how tenacious it manages to be. But if you look at the more science-y sciences, my PERSON is in physics. They've shifted away from that much more radically, where you have the arXiv server. And that's where all the new research happens and is hosted. And you still have cultural capital associated with publications in Nature and Science and all those glossy, very predatory journals. But, let's say, cumulatively, that's not where publishing in these fields is at, basically. Sort of. That took me off on a tangent a little bit. So I'll give you a chance to redirect the conversation to where we want.

Interviewer: It's a perfectly fine tangent and it's covering some of the questions to come. So it's no problem at all. But I think I will try to stick to my thread so we can come back to these ideas. Just because I risk forgetting otherwise. Because you already practice quite a few open science practices, you know, sharing your data and your code and trying to publish open access in what sounds like more diamond or green paths.

Interviewee 03: Um, yeah, I have to say, and that's kind of embarrassing, but the sort of gold, diamond, green terminology I've never really understood. I don't have a firm grasp on exactly what the differences there are. So if you wanna talk about these, you probably need to enlighten everybody.

Interviewer: I'm no expert, but I mean, green is the idea that the paper was published in a proprietary journal, but you're allowed to publish your pre-print or your author accepted manuscript, but with your own formatting. And many journals let you do that. Publishers in general do as well, for books too, but it's not the nice version, and that's one way to do open access. And then the diamond. So gold is what you were talking about, like Frontiers, where you have to pay.

Interviewee 03: You buy out basically

Interviewer: Yes exactly, and diamond are journals that are typically run by universities or societies where it's free for everyone throughout the process.

Interviewee 03: Yeah. And I think we need more of the diamond kind of stuff. And it doesn't necessarily stop there. Right. I don't know how much you want to talk about the review process.

Interviewer: Yeah. Let's talk about that.

Interviewee 03: I mean, sort of the idea that all of this goes to, you know, usually between two and three reviewers and the reviews themselves never get published. And then sort of there's a decision that's moderately transparent to the authors usually, but certainly not the readership, about sort of how this came to be published. One can think about whether there aren't alternatives to that, right? I mean, we have all of Wikipedia that works on a different model where we have, you know, versions of articles that then anybody basically can work on, comment on, et cetera. And you can see that if you're interested in it. It doesn't hit you in the eye, but you can see who commented what, et cetera, follow the history of these documents. I don't think that like a one-to-one replication of the Wikipedia model is necessarily feasible for academia, academic publishing, but something like a model where you upload a draft, like your initial submission, and then there's a set period during which basically this can garner comments, and then a set period for revision, and all of the steps involved kind of become a history of the document that anybody interested might be able to retrace. I think that's something I'd love to see somebody think about in earnest and put into practice, and maybe you already know publishers that do that. I'm not aware.

Interviewer: Yeah, there are journals that try and implement that. And they call it. I mean, it's an open science practice, right? Open peer review. And the ones that I've come across, they are not anonymous so it's not blind. And I wonder how you feel about that?

Interviewee 03: About anonymous peer review?

Interviewer: And as in, because when you're thinking of the Wikipedia model, you can have a pseudonym, right? You don't have to have your own real identity when you comment, or even write bits of articles. I wonder what your thoughts are. Do we need to keep the blind review system? Or should it be transparent, such that we're also signing our reviews?

Interviewee 03: That's really interesting to think about. I think the anonymity makes a certain degree of sense in the mode that is currently standard, where it's basically only the authors of a submission that get to read the reviews. So that's sort of, I mean, I guess the idea is to prevent any sort of personal animosities arising, et cetera, et cetera, and to also give the reviewers more leeway to give their opinion without having to sort of be face-saving and tread on eggshells. I think a lot of these problems might disappear if we went to an open review process. It's not too dissimilar to what happens at conferences, right? You present your talk, and then somebody gets up and says, hey, have you considered this? In an academia that I imagine going in the right direction, you'd actually, as a reviewer, as somebody who comments on this sort of stuff, increasingly see that become a source of cultural capital, right? The fact that you're doing this, that you're not anonymous, shows you're invested in the discipline, in the dialogues and the conversations that are going on. So I think, in general, I would favor getting rid of anonymity, but in conjunction with this much more general move towards open peer review.

Interviewer: Yeah, it's really interesting because at the moment, the good peer reviews are not valued as such.

Interviewee 03: Exactly, yeah.

Interviewer: They're not really seen as valuable contribution.

Interviewee 03: And I'm sure you know this, usually when you do peer review, you do get to see what somebody else wrote, the other peer reviewer or reviewers, at some point. And I'm sure you're aware that there's a huge gamut from two lines to five pages of really constructive feedback. And none of this gets sort of gate-kept, where you say, well, sorry, this is not enough. We can't consider this review, because, you know, you're not really arguing your points, et cetera. And this hasn't happened to me, but it happened to close colleagues that they received, and usually, I mean, the stuff that they talk about is, you know, negative reviews that are like a couple of sentences and are just blanket criticism: this study is horrible, et cetera, et cetera. Where I think it should be the editor's responsibility to sort of jump in and say, hey, this is unacceptable. But then obviously the editors are also doing this for free to generate value for the journals. And they can only be asked to take on so much in their role. So it's all, you know, the whole system kind of is in for an overhaul.

Interviewer: Hmm. Yeah. Interesting. Yeah. Well, um, for now, try to move away from your personal experiences and think about, um, open science practices and linguistics more broadly, but you're welcome to concentrate on the disciplines that you're, or subdisciplines of linguistics you're most familiar with. Um, which practices do you think are most linguists aware of? Which ones are commonly practiced and which ones are largely unknown still?

Interviewee 03: Um, I think there's a huge generational divide, or generational sort of cline, here. Right? I think in general, you know, for people who were socialized in their formative years, like 20 years plus ago, and I'm sure there's individuals who are, you know, committed to staying in touch with the process, but for most people that I'm aware of, it's not really something, I think, that is thought about a lot. If I look at the research that's published, it's tough to put a figure on it, but it's certainly the minority of published quantitative research that you read that gives you adequate information to even try to retrace the steps. I'm currently supervising a term paper where the student is trying to sort of replicate and tweak, elaborate on an initial paper's methodology, and just the way the data is described, there's no way of kind of replicating what's being done there. That's the experience, I think, mostly. And I'm not sure that that means linguists aren't generally aware of the fact that there is the idea of open science. But there's certainly not this culture that this is the standard, or if there's the culture, you know, you develop that for yourself within your academic environment, but it's not like the linguistic community as a whole is beholden to that necessarily. And then obviously, I mean, we also need to talk about research outside of these sort of quantitative, experimental, corpus paradigms. If we talk about ethnography, if we talk about, you know, research involving human subjects, ethics and privacy and data protection concerns come in and can be really tricky to navigate.
But I think for a lot of this type of research, the idea of replicability isn't at the center of that anyway, because it is more sort of hermeneutic, humanistic, where replicability of the exact phenomenon, the exact social constellation, et cetera, isn't really the goal to shoot for. So you have to think about these things kind of from a completely different perspective when you do ethnography, when you do survey focus group interviews, et cetera. And, I mean, there's obviously overlap, but I think the culture of thinking about research ethics when involving human participants is further developed than the culture of, you know, doing quantitative analysis and making all of the methodology replicable.

Interviewer: Yeah. You've already touched upon some of the factors that you think might contribute to this fairly low uptake of open science practices in linguistics, from your description, but what do you think could be done to increase that uptake?

Interviewee 03: I mean, education is a huge part of it, right? And we do see, so I guess I started with a generational cline and then sort of went more in the negative direction. We do see the younger generation, which I hope I can still claim myself as a member of, sort of developing much more in that direction. You see, you know, workshops at major conferences now about, you know, the replication crisis, open science practices, et cetera. It's definitely something that's developing through education. But I think the number one thing that would be beneficial to really help the process forward is to establish standards and to hold people accountable to these. So that, you know, those who aren't already doing it are kind of nudged, forcefully nudged, to do it, and those who are doing it are not sort of indirectly penalized, essentially.

Interviewer: Where do you see those standards? Where could they be implemented? At which level?

Interviewee 03: I mean, the crucial level, I guess, is if we're talking about the current publication landscape is editorial decisions, right? That basically, and sometimes, I mean, it's typically these days, I think it's the reviewers that make these decisions that basically tell the editors of a journal, I think, this, you know, I can't evaluate this, I need access to the data and really everybody should be given access to the data, but it really shouldn't be, we shouldn't have to rely on reviewers because you never know what individual reviewers or what requirements they're gonna put to this. This should be a policy level thing in terms of, you know, the publishing policies of journals, publishing houses. Yeah.

Interviewer: And you mentioned that, you know, there are groups of people, bubbles, I think is the term you use, the people who use or practice open science and belong to that group. But how did you learn about these practices? Like, what encouraged you to get started?

Interviewee 03: To be honest, sort of similarly to the process I just described: submitting things for publication and basically being asked by a reviewer to create an OSF account and make those data available. And that's really, I mean, I've never enjoyed the benefit of any sort of instruction in that direction. That's just never been part of any teaching or mentorship I've received. It's really just basically been being told that this is a good idea. And I mean, it doesn't take a lot of common sense to then say, yes, this is a good idea. And I will say, I mean, I'm certainly not a shining beacon of the best ways of doing replicability. I suck at commenting, making my code interpretable, et cetera. But this is the sort of stuff, right, where I would myself also improve if there was a standard that everybody were held accountable to. I think that strayed a little bit from your initial question, but that's my personal thought.

Interviewer: No, that certainly goes in absolutely the right direction. And in fact, the next question you've kind of already answered, because I was going to say, what would you personally need to do more open science or even open education for that matter? And it sounds like standards and having policy implemented would be one factor.

Interviewee 03: Yeah, yeah, yeah. I think, I mean, I think all the technology is there, right? There's plenty of code sharing and data sharing services that, you know, are entirely feasible. With, you know, with corpus linguistics, copyright stuff is something where probably we need more clear legislation, or maybe not legislation, but jurisdiction. I don't know at what level this needs to be decided, but my experience with corpus linguistics is that the whole idea of corpora, right, is that you design this body of text so that people can use it and work with it, and then the free sharing of the corpora is hampered by compilers' concerns about copyright, so very often it's not possible to download the corpora directly somewhere. You have to sort of kind of be in the right network to get access, and that's obviously not ideal, you know, by any standard, but it also kind of, and this gets into the sociological side of things as well, makes sure that there's a cline between data haves and data have-nots. Access to the data isn't on equal footing. And that's something we need to actively work against, you know, in terms of legal certainty, I guess, legal assurance, but also potentially when you get funding to compile the corpus, there needs to be more of an onus on the person or team taking that funding to ensure the long-term availability of those data. And in the corpora I've worked with, there's very few examples where I would say, hey, that's the way to do it. It's either a lack of foresight, where the project is compiling the corpus and then the corpus is there, but there's no funding and no sort of structural plan that goes into making the corpus available and dealing with copyright and dissemination. And there's also obviously the other side of trying to make a profit off the corpora.
If you look at the Mark Davies corpora, which, by the way, aside from the sort of business model they run on, are horrible for replicability in a number of ways. I mean, this is one experience I've had, where I reviewed a submission for a paper, which I thought was far from perfect, but moderately interesting. I could have seen that kind of developing into a publication, but it reported figures from, I think it was COCA, explicitly claiming it was from the website, which I wasn't able to replicate. There wasn't any change, any documented change to the corpus, but the website isn't stable. The figures it provides aren't stable diachronically across time or across the different search interfaces. So you can look for the same structure in the list and in the chart search modality and you get different numbers. Sometimes one order of magnitude different, right? I mean, it's drastic. And I mean, a difference of one would be far from optimal, but it's sometimes really drastic differences. And what, you know, what do you need to do with that? You can't do science with that sort of situation.

Interviewer: Wow. Yeah. I had noticed some issues, not quite that major, but yeah. And, um, with a, with a colleague, uh, asking for help with their queries and I thought, well, I'll get your numbers with exactly the same results supposedly, but it was actually different on a different day.

Interviewee 03: Yeah. That's, I mean, that's kind of really, really worrisome. It's really insane. I talked about this as the Googlification of corpus linguistics in my corpus linguistics lecture, because essentially that's what Mark Davies is going for, right? Googlification.

Interviewer: I'm not sure I understand what you mean by that.

Interviewee 03: Google, right? Googlify. Google search engine, right? So they want you to kind of hit some search and you get very quickly an amazing looking result, but it's not stable. It's sort of, the problem is obviously it suggests all of that science-y stability and all of that. It's just, um, just not there. It's also, I don't understand. I don't understand why it isn't there because it would seem the easiest thing to do. Implementing something like that would be a system that produces the same results all the time. So it's, it's completely intransparent. Are you aware of that PROJECT about corpus linguistics. That's happening in I think on PROJECT?

Interviewer: In CITY, yeah. But unfortunately, I won't be able to go.

Interviewee 03: I'll definitely go to that. And PERSON will be there speaking as well.

Interviewer: Oh, really? Interesting.

Interviewee 03: Could be an interesting discussion there.

Interviewer: Yeah, really. I mean, I need to write to them because I submitted an abstract. But I have to retract it because it's going to be our PROJECT. So I can't really not attend. Yeah, I just have one more question, which is, I mean, this is a very personal one. I sometimes have the feeling that open science advocates in linguistics, at least, but in general, I guess, are often preaching to the choir and sort of speaking among themselves. And I'm wondering whether you have that impression, too. And if so, whether you can think of any ways to reach out to more people.

Interviewee 03: That's a really good question. I would say yes. And this is maybe a bit of a lame response, because I'm just going to reiterate something I said before. As long as all of this is optional, it's kind of a nice-to-have for the people who are already invested in it. If you decide whether you're gonna attend that workshop on open publishing or not, then it's not going to be as easy to address people who are not already committed. I mean, in the review process, et cetera, you're gonna be able to do that, but I think what's needed is really a shift towards more accountability, towards kind of a low bar threshold of basically saying this needs to be there or this manuscript can't be published. That would be, I think, a great starting point. I mean, some of it is literacy as well, right? I mean, I've talked about all of the technology being there, but the technology being there doesn't necessarily mean everybody can navigate it confidently and competently. So giving people the literacy to implement things at a technical level, and also the teaching, just raising awareness, right? Coupled with a shift towards more accountability, I think those are gonna be the processes that help break this out of the small bubbles that it's currently practiced in. There was a thought I had that kind of disappeared now. Maybe it'll come back in a second.

Interviewer: If it comes back, great. I mean, otherwise, that was my last question. Unless there's something else that you wanted to add on open science in linguistics or, in fact, more broadly in the humanities. If you had any more thoughts.

Interviewee 03: Do I have any more thoughts? Publishing we talked about. Nothing that immediately comes to mind.

Interviewee 03: Or an afterthought, and that has to do with sort of the position you're in, right? And this goes for the publishing stuff, for the good practice and open science stuff. The onus to kind of push this is with the people in tenured positions, right? Who don't have to worry every day about their publication record and all of that. And the problem is, it's hard to get anybody at that level to do it, because, you know, they don't face any pressure necessarily, other than by convincing them that it's the right thing to do. And, I mean, this also goes for publishing, right? I mean, you can't expect somebody who's writing up their PhD to go for a potentially lower ranked journal. So I have an example. I think the journal is just called Constructions. It's a fully diamond open access journal. And a colleague of mine and I, we're intending to compile an edited special issue in there on, you know, a topic we've been working on together. And some people have written back to us saying basically, this is great, it's all good, but we realize this journal isn't Scopus-indexed, it doesn't have such and such criteria. And so it's going to be really... for me, I work at a COUNTRY university and there are metrics that my tenure committee looks at, and this journal is not gonna meet any of these metrics, right? So I can't expect people in non-tenure positions to not care about that. But it's all the more important for tenured people like myself to basically say, you know, I don't have to play along with all these publishing houses' rules. I don't know if you know Jan Blommaert, or who Jan Blommaert is. Not exactly your line of research, I guess. But he has a great blog post, which I can send you, on sort of predatory academic publishing. He died a few years ago due to cancer. But he essentially ended up founding the Tilburg Working Papers in Linguistics. And that's the only place basically he published.
Because within the field of critical discourse studies, et cetera, language and globalization, he was a really big name. That publication drew big names. If you have a paper in that publication, at least within the community of researchers, that counts for something. And it completely undercuts all of the publishing houses' bullshit, basically. And the more initiatives of that kind we'll see, I think, the more we'll be moving in the right direction. So yeah, I thought maybe you wanted to record that.

Interviewer: Definitely, thanks.

