Interviewee 05: You'll have to tell me more about the project afterwards, perhaps, or during your interview, however you like.

Interviewer: We'll see. I'm happy to obviously answer any questions you have. But first of all, yeah, thank you very much for agreeing to the interview. And you've just sent me the form of consent. But sort of for the record, I want to mention again that the interview is being recorded and that it will be transcribed by myself and my student research assistants with the help of a local large language model. And is that OK with you?

Interviewee 05: Yeah, I hope it will be worth all the trouble.

Interviewer: I'm sure it will be. So yeah, the aim of this little side project is to find out about open science practices in linguistics, trying to cover a range of subdisciplines and people of various degrees of experience and knowledge about open science practices. So my first question is probably an obvious one: would you say that you are a linguist? And if so, in which subdisciplines of linguistics do you see your research?

Interviewee 05: Yes, I would say I am a linguist. First of all, if that qualifies me to be part of this. Sub-disciplines. Well, originally I started doing corpus linguistics. So variationist corpus linguistics, focusing on different originally more of the PROJECT factors that influence language variation. But then very soon I also entered into external things like variety differences, for instance. PROJECT differences getting more important. Yeah and now recently I've started focusing a bit more on the PROJECT of corpus linguistics as well. So I'm not I'm not paid for research so I'm a, I'm employed as a TITLE so I'm supposed to do teaching. And I noticed that if you teach the more practical aspects of corpus linguistics, it's more convenient, more interesting for the students as well. So that's how I got involved in more applied things related to corpus linguistics as well. Yeah.

Interviewer: Wow. I did not know that. For someone who's not paid to do research, you're very productive.

Interviewee 05: Well, I don't know. It's the focus, yeah, well, there are two different foci, I'd say. The kind of research that I used to do is difficult to accommodate with everyday work, but the research that has to do with applications of corpus linguistics and training students to become corpus literate, etc., that is quite easy to do if you have lots of students.

Interviewer: That's really interesting how that affects your research practice.

Interviewee 05: And then I'm doing things like teacher trainings as well. So yesterday in the afternoon, I had a teacher training session again. Because of that connection as well. So we have a PROJECT at our university. I think, as usual in the languages, you often have many future teachers among your students, and we have a very active teacher education center at INSTITUTION, and they encourage people who are basically engaged in their unapplied sciences, so to speak, to think more, to take into consideration what future teachers need, etc. So that's how it started.

Interviewer: Wow, that's really cool. Well yeah we'll start with your personal beliefs or associations. So what do you associate personally with open science? What are open science practices to you?

Interviewee 05: I have to say, when you teased that topic, I was thinking more of, well, the typical kinds of corpus linguistics that I've been doing for many years. So collecting data from corpora, making your data sets available, ideally together with the coding that you apply to them. Sharing them with reviewers and with readers. Yeah. So those would be my first kinds of associations. I'm not very much into programming, so I'm not sharing any R code or whatever. And then with this research into PROJECT, I'm using questionnaires, online questionnaires, but I feel that that's not the kind of data that I would be justified to share, that I would want to share online. So they are specifically collected by myself from people who know that I would use them anonymously, but not publish the materials. So yeah, it depends a bit on what kind of data you're talking about. But I think, yeah.

Interviewer: Yeah, certainly sharing data and sharing code, those are practices you're aware of, whether or not you can or want to practice them all the time. That's really interesting. I said open science and that's how I kind of opened up the conversation, but linguistics is traditionally considered one of the humanities, and for some people open science is therefore not a suitable term. So some people in the humanities prefer to use open research instead of open science. And there are others who prefer the term open scholarship, which is often understood to be broader and to encompass also open education as well as open science slash research. And that's a lot of terms, I'm going to put them in the chat so you can see them. But my question would be, yeah, what do you think of this? Do you feel that open science is suitable for linguistics? Or do you think there is a better suited term perhaps for our discipline?

Interviewee 05: I don't know. I've never really worried much about the terminology. The data sets that I have published so far appeared on the OSF, Open Science Framework. So there you have open science. And I think the other terms would certainly fit as well. Open education reminds me more of open educational resources, which we've been producing, as you know, as part of this PROJECT, but education seems like quite not the same thing. Yeah, so that's more for those being educated, right? You're sharing educational resources with them. And yeah, research and scholarship would certainly also cover open science. What can I say about that? I've never really worried much about the terminology. I would say that linguistics, at least the way I've been doing it, is actually quite close to sciences. I know many people don't consider it as such, but it's always very much to do with data. I do phonetics classes, phonology classes as well. And they are obviously on the borderline between humanities and the sciences. And so, I mean, we have connections with the social sciences as well. They are also called sciences, right? So sociolinguistics and anything that has to do with the social context in which language is used. So I don't have a problem with any of those, though I think open education, I would connect it with open educational resources. So things that you share to train people, to give people teaching materials, learning materials, et cetera. So that seems to be something different to me.

Interviewer: Yeah, that's really interesting. I mean, the terms are used differently by different people and it seems, you know, nobody quite agrees. But I'll probably continue just speaking of open science, but I mean everything. So I understand open science in a very broad sense. And I'm quite happy to include open education in the sense that when people are sharing their scripts, then they can also be used for educational purposes. You know, people can learn from them. So feel free to use whatever term you're comfortable with in the rest of the interview, but I always mean it in a broad sense. And you've already started talking a little bit about your own experiences of sharing data and also when you're not sharing data for good reasons. Are there any other open science practices that you're familiar with that you've actually been involved in?

Interviewee 05: Usually, maybe since I don't have that much time to spare for research, I usually don't visit other researchers' data sets, because I'm not the person doing meta-analyses or, I don't know, revisiting other people's data, and just because I'm working in the same area, there's just not enough time, and then I also have these other obligations, more applied things. But what I've been doing, I've edited a book and I'm currently editing another one where we ask contributors to make the data available. So we did that on OSF or asked them, if they have their own OSF account, to post the data there, and we would just link them across. So that's something that I've been doing. And then in the process of that we were obviously reviewing their papers, their contributions to those collective volumes, and there was one that I found really interesting, and the author had shared the data the PROJECT, and then I looked into that because it really touched upon my area of expertise to do with cognitive processing factors, et cetera. And that gave me a chance to really see the coding that she had done. And that was quite instructive because it seemed that she had, to my mind, miscoded a few things. So we entered into a discussion on that. I gave her some feedback and she was very grateful and changed her data and re-did the analysis eventually. So that was one case where I thought it's really useful to get a glimpse of other people's data. Yeah. So most of the time, I don't take the time to really look into the data, even if they are published somewhere, because, I don't know, I'm not deeply involved in anybody else's exact studies. So I have no occasion to revisit their data in any sense. But in that case it was definitely useful. So I think it would be great if people could do that as a rule whenever possible.

Interviewer: Yeah and it's interesting at that stage at the review stage where things can still be corrected. And yeah valuable discussion and exchange of ideas. That's really interesting.

Interviewee 05: So it would be great if that happened more often. Yeah. But I think it's a time thing as well. So I don't know how many reviewers and then especially readers who usually don't engage so much with a paper as editors do. So if they really find the time and depending on how the data set is prepared, it may also be very difficult to understand what the author thought they were doing. Yeah, maybe not always easy to follow the author's thoughts.

Interviewer: And I was wondering, because I think it's really great that, as an editor, you are encouraging the contributors to your edited volumes to publish whatever they have. Is it an encouragement or is it like a criterion? You must do this to submit a paper or a chapter in our book. How do you go about it? And have you had sort of positive or negative feedback?

Interviewee 05: Yeah. Well, several, several types of reactions, I would say. In one case, the book was really about corpus data. So the data themselves obviously played a very important role. So we kind of really insisted that everyone make something available as far as possible. And they all obliged. So we got accompanying materials for every single chapter of various qualities, but yeah, as far as possible. And then in the other case, it's more like it's in a way we are actively asking people to contribute to something that is not as high flying in terms of methodological expertise etc. So in that case we just offered them to accommodate their data on OSF for them or if they have them available to make them available themselves. So it depends a bit on the aim of the publication. So for a methodological volume I think it's a great place to have access to the data as well. But if the purpose is a bit different I have so far not made it into a criterion. Yeah. But that depends on the purpose of the publication, I think.

Interviewer: Yeah, yeah. But I mean, it's unusual.

Interviewee 05: It is unusual, sorry.

Interviewer: Yeah, I mean, it's not common practice, is it? 

Interviewee 05: Yeah, well, I edited a similar book in 2009 where that wasn't so much talked of yet. So there was no thought behind, no one thought of making data public, right? They were also corpus data, mostly. And it would certainly have been possible, but it wasn't so much en vogue at that time. So it depends, it changes with the years. And then if you want me to talk about the different reactions that I received, there were, I mean, depending on the kind of book, right? One is something dedicated to another person, right? So if the contributors themselves grew up at a time where open data was not so much a question, I would not ask them, I would not expect them to do it, and sometimes it may be technically difficult for them even, which doesn't mean that their research is bad. So I'm not saying that, definitely. There are some excellent researchers who just don't have the technical means of doing that.

Interviewer: Yeah. Yeah. Which actually brings me to the next question, which is where and how did you find out about these practices or like what encouraged you to get started in sharing things and engaging?

Interviewee 05: In my case, I would say personal contacts with people at work or co-editors who are themselves following such practices. And I mean if you co-edit a book on research methods on corpus methods yeah I mean it wasn't the publisher or anything but it was the co-editor and also the contributors. I mean they didn't have to tell me at that point. But basically based on contacts, people who I know are doing it and who have very good, they have a very good point promoting open science. Yeah.

Interviewer: So, yeah, role models, but also like it matches your beliefs, your values.

Interviewee 05: Yeah.

Interviewer: I think many people started that way and only few people who attended a course or something. But as you rightly pointed out, there are it's not, there are some technical aspects. Not everyone is technically capable or has the knowledge to do it.

Interviewee 05: Yeah, so I never attended a course or anything. I saw just today that Open Science Framework, they have workshops like every month or even twice a month or something. I thought maybe I should follow one to finally know what I'm doing there because I'm doing it in a very non-expert way. But it's not difficult to handle either. So it is actually quite easy. So I never really followed any courses, tutorials, anything.

Interviewer: Yeah, interesting. Yeah, we'll try and move or look at the bigger picture. So beyond your personal experiences and now thinking of linguistics as a broad field and within that your area of corpus linguistics and applied linguistics or PROJECT, I don't know which one you prefer, but your, you know, your expert areas. How widespread would you say are open science practices in linguistics and which practices are most linguists aware of, which ones are commonly practiced and which ones maybe not?

Interviewee 05: That's a difficult question with many facets to it, so I don't really know where to start. In corpus linguistics, I think it's, I mean, like the research branch of corpus linguistics, I think it is relatively common because it's also easy to do. Corpora, I mean, if you're dealing with reference corpora or even if you're compiling your own corpus, the corpora are often publicly available or will be made available. So just publishing concordances based on those corpora is not a big step. It's relatively easy, straightforward. Concerning data protection, there are no issues. So I think it's rather common in that area. When it comes to applied corpus linguistics, I don't think it is common to publish anything, because it's more about how to use corpora, right, what kinds of, well, applications you can put them to. Honestly, I don't really know of any open data practices in that area. I'm not sure. And then, can you remind me what were the other aspects of your question?

Interviewer: Sure, I mean, which practices are linguists aware of or corpus linguists aware of and which ones are practiced and not so practiced? So we've talked about open data. Of course, it's also open materials, so like sharing items or questionnaires, open code, open access, and then things like pre-registration or open peer review. I mean, there are lots of different practices we haven't yet touched upon.

Interviewee 05: That's right. But to be honest, I find it really hard to say who knows how much about which of these practices.

Interviewer: So hard to say. Yeah, it's true.

Interviewee 05: What can I say? Sharing code. I mean, of course, if you're in the right community of R code developers, R packages, et cetera, then it's certainly very, very common. Sharing questionnaires, I mean, I don't know. If sharing open educational resources is something that is relevant here, there is certainly quite a lot going on in that area, I think, right? Your own teaching materials, for instance. I found them very inspiring. Nice to see what you've been doing. So actually, if that is relevant, next semester I'm planning to do a seminar with students that explicitly focuses on how to construct teaching materials based on corpora and to compare them with materials that ChatGPT etc. can produce for you, and to do a reality check in a way, based on corpora again, to see if the complexities of the English language, and I'm thinking of vocabulary issues, collocations, grammar structures, etc., if they are well represented in the teaching materials that others have produced based on corpora perhaps and that my students hopefully will produce themselves based on corpora, and also especially to check the quality of materials that are generated by AI models. So I think that might be interesting. And the idea is, if there is anything good that comes out of that, to make it available to go together with our PROJECT, with this PROJECT training package, as some kind of perspective or showcasing materials that you can produce if you are proficient in using a corpus yourself. That's something I'm planning to do next semester. I hope it will work, but I know it's a lot of work. So that's why I admire your teaching materials. And you also said that you had to put a lot of editing time into it. So it's incredibly hard for students to produce teaching materials with the help of corpora. I've tried that in several cases, but not so much with the idea of making them available as an open educational resource. That's the idea now that our PROJECT is finished.
So as an add-on to that, so far, I haven't published that. Only two small bits and pieces that I use in teacher trainings, for instance. And then what I've been doing, due to this applied focus, during Corona, right, I taught a seminar on PROJECT. And since everything was taking place online through Teams, I thought it might be a good idea to get my students to produce a series of podcasts on how to use corpora for various teaching purposes, like to check up on the correctness of texts, to produce your own teaching materials, also to get students or younger pupils to get their hands dirty and use corpora themselves, and what there is worth knowing about the relevance of prescriptivism and whether or not it has an effect on usage in English, etc. That was a big effort as well. So the outcome was basically seven or eight 10-minute podcasts, video podcasts. But again, as you were also saying, I had to put a lot of effort into that myself, and the students were constantly complaining that it was difficult to fulfill my expectations and that they had to spend a lot of time revising and, yeah, polishing their videos. And yeah, so that's a lot of extra effort for everyone involved. I could have just asked them to write term papers and then hand them in, and I'm the only one to ever read them. But I thought it was more worthwhile, and it is. I'm still using the videos.

Interviewer: Oh cool wow yeah, but yeah there's a lot of extra work that's for sure. Yeah so my next question would be thinking now about linguistics more broadly or corpus linguistics if you like what could be done to increase the uptake of open science and open education practices?

Interviewee 05: So it's two different questions, right? So open science among scientists, if we call linguists scientists: various things could be done, right? So editors, journal editors also, could encourage it more or make publication conditional on the availability of data. But then again, as I said, I know some people who do excellent work, but they would not even be able to, and it would be a huge extra effort for them to make them available. Maybe they would need help from editors in that case. I'm thinking of the older generations of linguists, right? So that could be done, of course. I don't think that the repositories need to be made easier to handle, because they are already. I mean, if you have things like TROLLing, I have never really dealt with that, but that seems to be not so easy for data providers. I don't know, because there are several stages of quality control and so on, right? But if you just want to share your data, it's ever so easy. Yeah, it could be encouraged more by anyone involved down the publication pipeline. And for open educational resources, one thing that is a bit problematic is that there are many different platforms with different licensing conditions. For our materials, we use Moodle. Moodle has lots of wonderful functions, but depending on the instances of Moodle that you use, there are several functions that you're not allowed to use. Tracking the activity of users, that's not always possible. Even within Moodle, they are not all compatible. And then if you want to share materials with teachers, in COUNTRY, they have a rather, I mean, I think they have updated, they have a new system now for sharing materials. But until about a year ago, I think, they had something that teachers and students were always complaining about. And to make it worse for us at university, we never got access to it.
So if you worked at a university in COUNTRY, you were not able, technically, you didn't get access to this database for teaching materials. So you would have to hijack a teacher to upload your materials or something like that.

Interviewer: Then you weren't credited as the person having uploaded it?

Interviewee 05: I don't know. We never really did it that way.

Interviewer: That would be, you know, it'd be a shame, really.

Interviewee 05: Well, you could still somewhere place the message of who actually produced the materials.

Interviewer: Yeah, true, but strange.

Interviewee 05: So many different platforms. So there is no really central platform where. Yeah. There are many, many who try to do this job. But there is not a single one where you know you'll find everything there is. Yeah.

Interviewer: Yeah, that is an issue, isn't it? Yeah. Do you think there are any specificities of linguistics that should be taken into consideration when we try and apply open science principles to linguistics research, as opposed to, say, other disciplines that maybe already are implementing open science practices more than in linguistics?

Interviewee 05: I don't know. So as I said, in answer to the previous question, it could be encouraged more. Maybe, as far as I know, I mean, there are no real fixed standards or anything that have been set up that everyone's supposed to follow, but I don't know that much about how it's done in other disciplines. Yeah, hard to say. I don't know. So with corpus data I think it should be easier again, but then not all corpora are publicly available. So maybe you cannot easily publish materials on corpora that are kind of proprietary, that you can't simply copy and paste from in publications. The platform problem as well. So again, I don't know if other disciplines maybe have one centralized platform where data is collected. Yeah, so that might be useful for linguistics as well. But that sounds as if I was a fan of centralization. I mean, for data collection, yeah. Probably makes sense. But again, difficult to do in a very pluralistic world.

Interviewer: Yeah, yeah. It's a difficult balance to achieve, isn't it?

Interviewee 05: Yeah, it's kind of, maybe it's also a grassroots thing in a way that open science, open data has arisen in various contexts in linguistics, not all in the same place. So, well, the problem is probably the great heterogeneity that we already have, even though it's not a very long standing tradition in linguistics to share your data.

Interviewer: Yeah, that's true. We come from very different perspectives, right?

Interviewee 05: Yeah.

Interviewer: Yeah. Often we're in different departments.

Interviewee 05: And also maybe something that we probably share with the social sciences as well. If you elicit data, if you collect data from people who agree to give you their information. I mean, you need to get their consent, yeah, but sometimes you might not be sure if their consent would also apply to a reuse of their data or a publication of the data on some kind of platform. You would have to inform them and then that would again be a bit tricky because you have to give them a lot of information, which again would make your data collection rather unattractive for them if they have to read long declarations on data protection, etc. So I don't know. Elicited data, it's probably harder with that. Yeah.

Interviewer: And going back to your own practices, in an ideal world, what would you need to do more open science? Personally?

Interviewee 05: Well, I don't know. If I have a data set that I want to publish, I don't think I need much more. I can do it. No problem. I don't want to publish all kinds of data sets, as I said initially. Yeah, sure.

Interviewer: So that's good. My final question is, I guess, sort of a personal one, because my feeling is often that open science advocates are speaking among themselves and sort of preaching to the choir. And, you know, you're someone who is involved in open science practices, and I wonder whether you feel the same, or whether my view is skewed. And if that's the case, how can we reach out to more linguists?

Interviewee 05: Yeah. I don't know. In our case it's maybe not quite true that open science advocates are just preaching to the converted. We have a colleague at INSTITUTION who promotes this very much among all of us and also especially among the young PhD students who are only beginning to get involved in research. So that does have some effect, I have to say. So, yeah, in an ideal world, it would be great if every university, every department could have someone like this person. So someone who just loves data and who likes to handle data and work with them, process them, visualize them, explain them, make them available and all that, and then by so doing help the others to publish their data as well. Yeah, so for linguists maybe that's something, I don't know. For linguists, I mean, in our studies at least, I think at many COUNTRY universities, you don't really receive training of a statistical kind or how to handle data, public data, and so on. We are busy with lots of other things. It's not like they don't learn anything. They even fall short of improving their pronunciation and things like that. They have lots of things to do, but we are not really trained in handling data and handling code, processing data with programming software packages like R, et cetera. So that is a bit of a problem. In an ideal world every university should have someone like that, and that person should be paid for that, because they're not doing it for their personal benefit but for others, right? So yeah, fortunately we have someone like that.

Interviewer: So yeah CITY is leading the way.

Interviewee 05: I wouldn't say that. Plenty of good people coming out with CITY. Yeah. We're a rather small department. It's just one chair of English linguistics. So we're not that big, but yeah.

Interviewer: Yeah, but it's funny how, you know, it can make a real difference, you know?

Interviewee 05: Yeah, yeah.

Interviewer: Having a few individuals or even one individual who's very keen.

Interviewee 05: Yeah. Interesting. Yeah.

Interviewer: Yeah, that was my last official question. But if there's anything else you want to add on open science in linguistics or in the humanities more broadly, feel free to share. Do you have any parting thoughts?

Interviewee 05: I don't know. I once had an interview that was distantly similar with someone who, I think, works in history. But she did a very interdisciplinary project. Well, asking people from various departments their views on something like open research, I would say, not science. That went more into the direction of editing old books that are not available in digitalized format. Yeah, that's something completely different. So it wasn't so easy to communicate with her, because we are using old data, obviously. So if you do diachronic corpus linguistics, you need digitalized old books from a century. So that's what I was interested in. And they have a much bigger focus on who wrote it, under which conditions, for whom, and how many times the manuscript was edited or copied, etc., whereas I was just interested in the language. Well, I don't know. Nothing in particular, I think.

Interviewer: In that case, thank you very much. I'll stop the recording for now.
