Identifier,Scope,Goal,Method,Measurements,Is Effect Size reported,Within/Between Subject Study,Participant Profile,Number of Participants in Total,Number of Participants per Treatment,Number of Treatments,Any Incentives (yes/no) ,Incentives Value,Is there a Pay-off Function? (Yes/No),Same or Different Pay-Off Function for all Treatments,"Pay-off function for all Tasks, for a single fixed task or for a randomly selected task",Pay-off Function linked to Individuals or Groups,Is there a Show-up Fee?,How is the Pay-off Function?,Is Fixed Time?,Total Time,Hourly wage amount,Is realistic?,Country,Notes DBLP:conf/icse/Chattopadhyay0A20,cognitive biases in software engineering,identifying instances of occurring cognitive biases as well as the time they take to fix during software development,field study/observational study,"no measurements, recordings of development",N/A,N/A,developers (from a US-based software startup company),10,N/A,N/A,no,N/A,no,N/A,N/A,N/A,no,N/A,yes,60 minutes,N/A,N/A,USA, DBLP:conf/icse/GirardiNFL20,developers' emotions,understanding what emotions developers experience during developing and the consequent triggers,field study/observational study,"biometrical data, self-reported emotions",no,N/A,"Computer Science (CS) Students (21 undergrads, 5 grads and 1 post-grad; 23 male, 4 female)",27,27,1,yes,meal voucher,no,N/A,N/A,N/A,no,N/A,yes,30 minutes,N/A,N/A,Italy, DBLP:conf/icse/Krueger0LSWL20,neural representation of development,comparing code writing to prose writing in terms of neural processes,fmri study,"fmri data, some sort of validity of responses (not really explained)",no,Within Subject,27 undergrad and 3 grad students,30 (24 used in analysis due to incompletion and noise in the data),30,4 [content (Prose vs. Code Writing) x size (fill-in-the-blanks vs. long response tasks)],yes,$75 cash & a 3D model of their brain (upon completion of tasks),no,N/A,N/A,N/A,no,N/A,yes,2 hours,$37.5,"YES. 
Median wage for ""Computer and Mathematical Occupations"" is $37.82. See Hourly Wage Rates by Occupation in Michigan ",USA, DBLP:conf/icse/SpadiniCB20,code reviews,availability bias caused by comments available during code review,Controlled experiment,identified bugs,no,Between Subject,software developers,243 (85),~21,4 [priming (yes vs. no) x bug type (Null Pointer Exception vs. Corner Case bug)],yes,$5 donation to a charity per valid response,no,N/A,N/A,N/A,no,N/A,no,11.6 minutes,$25.86,hard to tell (online experiment so participants from various countries. But did not keep track of the countries),"Switzerland, The Netherlands, Sweden", DBLP:conf/icse/TanL20,bug identifying,understanding whether collaborative bug finding helps developers perform better to identify bugs in Android apps,Experiments,identified bugs,no,Within Subject,students,29 (27 valid),29,2,yes,N/A,no,N/A,N/A,N/A,no,N/A,no,N/A,N/A,N/A,China, DBLP:conf/icse/ZierisP20,pair programming,understanding knowledge needs of developers during pair programming,field study/observational study,"no measurements, recordings of development",no,N/A,software developers,14,N/A,N/A,no,N/A,no,N/A,N/A,N/A,no,N/A,no,N/A,N/A,N/A,N/A, DBLP:conf/kbse/SatterfieldFM20,identifying the topic of task during developers' information seeking activities,to investigate whether tasks can be automatically identified and whether their descriptions can be automatically generated based on the information that a developer accesses as a part of these tasks.,Recording the screenshots of developers' active window on their computers,no measurements taken. Rather data is collected as input for an algorithm for automation purposes. 
Data captured from developers' computer screen through OCR is processed by NLP techniques to prepare the input data for automation.,no,none,10 grad students; 4 upper year undergrad students; 3 interns at a mid-sized software company.,17,17,1,no (only for the survey),N/A,no,N/A,N/A,N/A,no,N/A,yes,2 hours,N/A,N/A,Canada, DBLP:conf/sigsoft/0015LSMSW20,human vs. machine and gender bias in code review,"Investigating the effects of the code change author being machine or human, and further ""male"" or ""female"" if the code change author is human on code reviewers' decisions. Also, investigating whether ""males"" and ""females"" review code differently.",Controlled experiment,"fMRI & eye tracking data; and code review behaviours (e.g., code review response time, code change acceptance rates)","yes (for Code Review acceptance rate wrt. code changes'author being machine, male or female)",Hybrid Design: Within-Subject wrt. code changes' author (male vs. female vs. machine); Between-Subject wrt. code change's reviewer (male vs. 
female) ,26 undergrad and 11 grad students,37 (36 completed),"21 Male (w/ treatments as ""female"", ""male"", ""machine"" code change authors); 16 Females (w/ treatments as ""female"", ""male"", ""machine"" code change authors)",6,yes,$75 cash and 3D models of participants' brains upon completion,no,N/A,N/A,N/A,no,N/A,no,60-70 minutes,"$64,28-$75","Hourly wage of a software engineer in MI, USA is $40.37 (https://www.indeed.com/career/software-engineer/salaries/MI)","Michigan, USA", DBLP:conf/sigsoft/BehrooziSBP20,impact of stress on coding interview performance,"Effect of being watched by a proctor during coding interview (i.e., public vs private setting) on developers' stress and performance",Controlled experiment,"cognitive load (NASA TLX, eye tracking measures, e.g., fixation duration, pupil dilation), stress (eye tracking measure - saccade velocity), task completion time, problem solution correctness, complexity of the algorithm participant came up with",yes (Cohen's d),Between Subject,undergrad and grad students,50 (48 completed),22 (private setting); 26 (public setting),2,yes,extra course credit,no,N/A,N/A,N/A,yes (participants could leave experiment any time),N/A,no,max. 30 minutes,N/A,N/A,"North Carolina, USA", DBLP:conf/sigsoft/GopsteinFAC20,why and how programmers misunderstand atoms of confusion,understanding developers' misunderstandings of atoms of confusion,Think aloud observational study (complemented with semi-structured interviews),correctness of code output; participants' confidence on their answer about code snippets' outputs (rating from 1 to 6),N/A,N/A,5 students; 9 professional C++ developers (4 C++ application developers & 5 developers of a popular C++ library),14,14,1,no,N/A,no,N/A,N/A,N/A,no,N/A,no,avg. 
~35.6 minutes,N/A,N/A,"New York, USA", DBLP:conf/sigsoft/UesbeckPSS20,polyglot (multi-language) programming,understanding whether switching between different languages during polyglot programming would affect the productivity of developers with different experience levels. (productivity: defined as task completion time),double-blind randomized experiment (repeated measures design),time to give a correct solution for each task,yes (r^2 for t-tests; eta squared for ANOVA; Odds Ratio for Chi Square Tests),Between Subject wrt. String-based Design vs. Object Oriented Design vs. Hybrid Design,"100 undergrad students (12 freshmen, 23 sophomores, 36 juniors, 29 seniors) & 9 professional developers",149 (109 completed),36 (Object Oriented Design); 35 (String-based Design); 38 (Hybrid Design),3 (Object-oriented API vs. established use of SQL strings vs. hybrid approach),yes,extra course credit for students' participation,no,N/A,N/A,N/A,no,N/A,no,there was a threshold: max 4.5 hours in total (max 45 minutes per task) -- Average task completion times are as follows: Task 1 = 30 min; Task 2 = 26 min; Task 3 = 32 min; Task 4 = 19 min; Task 5 = 16 min; Task 6 = 21 min,N/A,N/A,"Nevada, USA", DBLP:conf/sigsoft/WangZ20,gender biases in software development,investigate whether intergroup contact theory reduces gender biases in software development,Field experiment (longitudinal),"implicit gender biases (using Gender-Career, General SE and SE Leadership Implicit Association Tests, i.e., IATs) & explicit gender biases (using MSS, i.e., Modern Sexism Scale) ",no,"Between Subject, but both control and treatment groups consist of ""teams"" rather than ""individual subjects""",280 undergrad students,280 participants in 70 teams,43 teams (implemented intergroup contact - male & female); 27 teams (same gender teams - control group),"2 (Teams with ""female & male"" intergroup contact teams vs. Same gender teams)",yes,$20 for each student,no,N/A,N/A,N/A,no,N/A,no,Project lasted 8 weeks. 
Extra time on top of course work estimated to be less than 1.5 hours,$13,min hourly wage in USA is $7.5. Software Engineers without a degree & 1-2 years of experience $35 (https://www.salary.com/tools/salary-calculator/software-engineer-i-hourly?edu=edlev1),USA, DBLP:conf/iwpc/StapletonGLEWL020,developers' program comprehension with the support of code summarization,investigate the effects of human-written and machine generated code summarization on developers' productivity,Online experiment,task completion time; accuracy of answers,no,Within Subject,"35 undergrad students, rest grad students & professional developers",45,45,2 (machine generated vs. human written code summaries),yes,$20 per participant (if they leave their contact info - hence optional),no,N/A,N/A,N/A,maybe (in the paper it does not say that $20 is paid upon completion),N/A,no,45-75 minutes,$16-$26,"Hourly wage of a software engineer in MI, USA is $40.37 and $37.85 for less than 1 year of experience (https://www.indeed.com/career/software-engineer/salaries/MI)","MI, USA -- experiment is online", DBLP:conf/iwpc/DiasOVMB20,tool evaluation,"analyzing the effectiveness of the Hunter tool proposed by the authors in comparison with Visual Studio Code, a popular IDE, in terms of user performance and user experience",Controlled experiment,"task correctness & completion times, and attention (using eye tracking)",no,Within Subject,16 professional software developers,16,16,2 (Hunter vs. 
VSC),no,N/A,no,N/A,N/A,N/A,no,N/A,no,median time for Hunter tool = 58.95 (6.55 min/task); median time for VSC tool = 3 hours 45 minutes (23 min/task),N/A,N/A,Chile, DBLP:conf/iwpc/BaiKS20,Code search activities performed by programming language learners,Understanding how learners perform code searches and challenges they encounter when working with an unfamiliar programming language; and factors affecting a successful search,"logging search and browser activities and periodically surveying participants about their current tasks, and search success",N/A (logs analyzed),no,N/A,graduate Computer Science students,24 (18 participants' data analysed after eliminating those with recordings less than 10 minutes),N/A,N/A,yes,Drawing among participants (who have a recording of at least 45 minutes) to win a $30 Amazon gift card,no,N/A,N/A,N/A,no,N/A,yes,90 minutes,$20,Developers in NC with less than 1 year experience have hourly wage of $39.58 (https://www.indeed.com/career/software-engineer/salaries/NC),"North Carolina, USA", DBLP:conf/iwpc/ShargabiAAZ20,novices' program comprehension,Empirical evaluation of previously published novices' program comprehension mental models. 
Investigating whether programming tasks improve novices' program comprehension,Controlled experiment (in class),"Task Correctness; Tests measuring Statement-, Block- and Abstraction-Level program comprehension.",no,6 Within- Subject Experiments,students,178,31 (Recall Task); 32 (Representation Task); 28 (Renaming Task); 28 (Tracing Task); 29 (Comparison Task); 30 (Modification Task),6,yes,Rewards offered to top 3 best performing participants (Not mentioned in the paper what these rewards are),yes,same,N/A,no,no,winner-takes-all tournament,yes,120 minutes,N/A,N/A,Malaysia, DBLP:conf/iwpc/PeitekSA20,developers' linearity of source code reading order for program comprehension,provide empirical evidence on the effects of linearity of source code and programmers' comprehension on linearity of reading order,Controlled experiment,eye movements (eye tracking); task correctness; completion time,yes,Hybrid Design: Within-Subject wrt. code snippet's linearity and the enforced bottom-up vs top-down program comprehension strategy; Between Subject wrt. novice vs. 
intermediate programmer.,"12 novice, 19 intermediate programmers",31,31,2,no,N/A,no,N/A,N/A,N/A,no,N/A,no,N/A,N/A,N/A,Germany, DBLP:conf/ease/RenCSPAL20,modeling chatbots,comparing two tools for collaborative modeling that are based on chatbots,Controlled experiment,"fluency (number of discussion messages in a Telegram group), completion times, completeness, subjective opinions",yes,"Within subject, 2 tools, 2 tasks, 2 periods",students,54,27,2,no,N/A,no,N/A,N/A,N/A,no,N/A,yes,30 minutes per task,N/A,N/A,Ecuador, DBLP:conf/ease/McChesneyB20,code reading of programmers with dyslexia,using eye tracking to understand whether developers with dyslexia comprehend programs differently from developers without dyslexia,Observational study,"eye-gaze metrics, story/execution orders",yes,between subject,developers with and without dyslexia,28 (14 w & w/o dyslexia),28,3,no,N/A,no,N/A,N/A,N/A,no,N/A,no,N/A,N/A,N/A,"likely UK, but not specified", DBLP:journals/ese/AmalioBK20,software modeling,"comparing three different modeling notations to each other, namely VCL, UML, OCL",controlled experiment,"completeness, accuracy, perceived modeling, problem comprehension, perceived comprehension, model comprehension, defect detection, perceived usage, usefulness/ease of use, usability, appraisal, preferred notation",yes,within subject,computer science students,43,43,"4 (2 examples, 2 notations, 5 tasks)",yes,50€ vouchers,no,N/A,N/A,N/A,no,N/A,yes,2 hours,N/A,N/A,"Portugal, Luxembourg, UK", DBLP:journals/ese/GralhaGA20,gender differences in the context of social goal modeling,"investigating the impact of gender facets (for inclusiveness) on creating, modifying, understanding, or reviewing iStar models",quasi-experiment,"accuracy, speed, ease, perceived effort",yes,between subject,"developers from personal contacts (students, scientists, practitioners, ...)",180,50/50/40/40,4 (tasks),no,N/A,no,N/A,N/A,N/A,no,N/A,no,N/A,N/A,N/A,Portugal (apparently), DBLP:journals/ese/YatesPB20,"onboarding 
process, when a new developer joins a company",exploring different types of information passed to onboarders (new coming software engineers) from experts and the perceived value of this information by onboarders,in-situ observational field study (researchers observe and collect data from 12 onboarding sessions),"(Grounded Theory is used) Two video recordings are made: One is the screen capture of the software and the other captures the participants from their point of view (i.e., what they see) and their interactions with other participants.",N/A,N/A,"12 experts (5 academic, 7 industry); 15 newcomers",27,N/A,N/A,no,N/A,no,N/A,N/A,N/A,no (Researchers are observing actual onboarding sessions in organisations),N/A,no,minimum = 18 minutes; maximum = 96 minutes; mean = 51 minutes;,N/A,N/A,"in-situ onboarding sessions in Canada, US, England, Ireland", DBLP:journals/ese/JolakSDWHVPPGC20,software design communication,"investigate how graphical versus textual software design descriptions affect the participants' ability to explain, understand, recall and actively communicate knowledge.",family of experiments (4 experiments in total),"Participants' Understanding, Recall and Understanding of the software design measured through questionnaires.",yes (Hedges' g metric calculated for each experiment. Global effect size is calculated by the weighted mean.),Between Subject,Software Engineering (SE) students. original experiment: 50 BSc and MSc students; 1st replication: 36 MSc. and Ph.D. students; 2nd replication: 94 MSc SE students; 3rd replication: 60 BSc and MSc SE students. ,240 (120 pairs) - 7 pairs eliminated (5 pairs discussed relevant topics for less than 2 minutes; audio recording of 2 pairs was bad quality),"120 (60 Explainer-Receiver pairs: In both control and test groups, participants were paired as ""Explainer-Receiver"").",2 (Graphical vs. 
Textual software design descriptions),no,N/A,no,N/A,N/A,N/A,no,N/A,no,"32 minutes actual experiment (fixed time), but there is no time limit on pre- and post- questionnaires each of which takes 15 minutes on average (30 minutes in total on average). Hence, total time is 62 minutes on average",N/A,N/A,Sweden (original experiment); Germany (1st replication); France (2nd replication); Slovakia (3rd replication), DBLP:journals/ese/ViticchieRBTCT20,techniques to fight malicious tampering with sensitive code on client side,compare non-protected code with code protected with code splitting to evaluate the degree of protection offered by the Client/Server Code Splitting technique.,controlled experiment,attack time; attack success rate; C-score (indicating C language settings),yes (Odds ratio),Between Subject,Computer Science and Engineering master students,87,28 (no-splitting); 30 (medium sized splitting); 29 (small-sized splitting),"3 (codes with ""no splitting"", ""small-sized splitting"", ""medium sized splitting"")",yes (on page 24 this paper briefly discusses pros and cons of incentivizing students),"extra points for final exam: students who did the task with diligence regardless of attack success were rewarded with 2/30 extra points in the final exam of the ""Computer and system Security"" course. 
",no,N/A,N/A,N/A,"yes (points awarded independent of performance)",N/A,no,max 120 minutes,N/A,N/A,Italy, DBLP:journals/ese/VassalloPZG20,Continuous Integration build failures,"evaluate BART, which is a tool that summarizes the reason for Maven build failures (i.e., investigate if BART speeds up the fix of build failures)",controlled experiment,Resolution time of build failures,no,Within Subject,students and professional developers (11 out of 17 work as professional developers),17,17,2,no,N/A,no,N/A,N/A,N/A,no,N/A,no,average build fix times: 436 seconds (Testing failure); 187 seconds (Compilation Failure); 223 seconds (Dependencies failure); 280 seconds (Code analysis failure),N/A,N/A,Switzerland, DBLP:journals/ese/LaTozaALK20,strategies developers use to solve programming related problems,"evaluate the effectiveness of Roboto, which is the strategy tracking tool proposed in the paper, in design and debugging tasks",controlled experiment,task time,yes (odds ratio),Between Subject,"grad and undergrad students, and software developers",28,14,2 (with and without the strategy tracker),yes,$30 gift card,no,N/A,N/A,N/A,"yes (On page 244 of the paper, it states: ""..To ensure that incentives did not bias the participants, all participants were compensated the same amount, regardless of condition, task performance or their responses."")",N/A,yes,30 minutes,$60,$38 median value,"Virginia (VA), USA", DBLP:journals/ese/MasoodHB20,self assignment in agile development teams,investigating how self-assignment of tasks in agile teams works,"observations (2nd phase of the study includes observations) [observations of agile practices such as daily stand-ups, iteration planning meetings, and self-assignment during task breakdown sessions.]","white board images, screenshots from the management tools",N/A,N/A,software developers,"7 [Overall study consists of questionnaires, interviews and observations with 42 developers belonging to 28 agile teams. 
However, the observational study was conducted only with one agile team consisting of 7 members]",N/A,N/A,no,N/A,no,N/A,N/A,N/A,no,N/A,no,"[TOTAL duration is between 7 hours 20 minutes and 8 hours] ----- Stand-up meetings = 40-60 minutes (4 meetings, 10-15 min each); Sprint Planning meetings = 2 hours (2 1hr meetings); Task Breakdown Sessions = 2 hours; Code Review Session = 30 minutes; Squad-Triage Sessions = 40-60 minutes (4 sessions, 10-15 minutes each); Backlog Prioritization = 30 minutes; Retrospective Meeting = 1 hour. ",N/A,N/A,"New Zealand (interviews conducted also with developers from India and Pakistan, but observational studies only in New Zealand)", DBLP:journals/ese/AllodiCMS20,vulnerability assessments using Common Vulnerability Scoring System (CVSS),"explore to what extent the accuracy of assessment of vulnerabilities depends on the assessor's educational background, knowledge of attacks, years of practical experience. Also analyzing whether accuracy of security vulnerability assessments varies with respect to different facets of vulnerabilities (e.g., complexity of the exploitation) ",controlled experiment,"accuracy of participants' vulnerability severity estimations, vulnerability assessment errors",yes (R2 coefficient of determination),Between Subject,MSc students and security professionals,73 (71 participants with valid data),35 MSc students with NO security training; 19 MSc students with 3-4 years of security training; 19 security professionals with a median of 6 years of security experience. ,3,no,N/A,no,N/A,N/A,N/A,no,N/A,yes,90 minutes,N/A,N/A,Italy, DBLP:journals/ese/FakhouryRMAA20,program comprehension,"(1) determine if ""functional Near Infrared Spectroscopy"" (fNIRS) and eye tracking devices can be used to capture cognitive load during text and source code comprehension at word-level granularity. (2) determine if structural and lexical inconsistencies within the source code increase developers' cognitive load during software development. 
",controlled experiment,"cognitive load (NASA TLX and ""eye fixation duration""), task correctness, task completion time; Oxy (oxygenation concentration changes in the blood)",Cliff's Delta (d),"Hybrid: Between Subject (with respect to ""Code"" vs. ""Text"" used for Comprehension Task & Within Subject (with respect to ""control"" code, code with Lexical Antipatterns (LA), code with ""Structural Inconsistencies (SI)"", code with both LA and SI used in the Bug Localisation Task) ",grad and undergrad Computer Science students,25,"15 participants (""English Prose"" for Comprehension Task);10 participants (""Code with comments in German"" for Comprehension Task); 17 participants (for ""control code""for Bug Localisation Task); 20 participants (for code with LA for Bug Localisation Task); 20 participants (for code with SI in Bug Localisation Task); 18 participants (for code with both LA and SI for Bug Localisation Task)",6,yes,$15 gift card,no,N/A,N/A,N/A,"yes (in Section 3.3, it writes ""Participants receive $15 gift cards as compensation for participation."")",N/A,no,not longer than 1 hour,$15 (Giftcard),"average hourly wage for software engineering interns in WA, USA $29.67 and minimum $19 (see https://www.payscale.com/research/US/Job=Software_Engineering_Intern/Hourly_Rate/5d8d6732/Seattle-WA)","Washington (WA), USA", DBLP:journals/ese/SaidQK20,extracting state machines from source code for program comprehension,EXPERIMENT1: Investigating the understandability of guards in state machines where transitions have priorities. [Definition: A guard is a boolean expression such that state changes or other actions are taken only when evaluates to true]; EXPERIMENT2 : investigating whether developers' state machines extracted from code facilitates developers' program comprehension. 
,controlled experiment (both EXPERIMENT 1 & EXPERIMENT 2),"task correctness, task completion time (in both EXPERIMENT1 & EXPERIMENT2)",,"EXPERIMENT1: Within Subject [In ""Subjects"" subsection of the paper on Page 4792, it writes :""Each subject was assigned to experiment group for some of the state machines and to the control group for the remaining state machines""] EXPERIMENT2: also Within Subject",EXPERIMENT1: 8 professional developers who don't work with state machines; 8 professional developer who regularly work with state machines; 4 PhD students; 4 MSc students. EXPERIMENT2: 22 professional developers; 6 PhD students; 6 students,EXPERIMENT1: 24; EXPERIMENT2: 34,EXPERIMENT1: 24; EXPERIMENT2: 34,EXPERIMENT 1: 2 [state machines with transitions with priorities; and those without priorities] ; EXPERIMENT2: Code and specifications with and without state machines ,no,N/A,no,N/A,N/A,N/A,no,N/A,EXPERIMENT1 & EXPERIMENT2: yes,EXPERIMENT1: 3 minutes for each question; EXPERIMENT 2: 70 minutes,N/A,N/A,Germany, DBLP:journals/ese/AbdellatifBS20,Using Bots to extract useful information from software repositories.,Validating the effectiveness of Bots that extract useful information from software repositories.,experiment,Participants' task completion times with and without bots. Recordings of participants' interactions while completing the tasks. Accuracy of the answers by both Bot and the participants (when they did the tasks without the Bot).,no,Within Subject,4 Ph.D. 
and 8 MSc students,12,12,2 (with and without Bot),no,N/A,no,N/A,N/A,N/A,no,N/A,yes,max 30 minutes for both Tasks done with and without bots.,N/A,N/A,Canada, DBLP:journals/ese/VegasRMJ20,Developers' misconceptions about effectiveness of software defect detection techniques.,investigating how well developers' perceptions on the effectiveness of defect detection techniques match the techniques' real effectiveness in the absence of prior experience.,controlled experiment,"defect detection techniques' effectiveness (measured as the percentage of faults detected by that technique), perception of effectiveness, mismatch cost",no,Within Subject,students,"23 (original experiment); 39 (replication). In the original experiment, 32 students completed the experiment, but 9 students did not give consent for their data to be used in the analysis of experiment results. In the replication, 46 students completed the experiment but 7 did not give consent for their data to be used in the analysis of experiment results.",23 (original experiment); 39 (replication),"(2 testing techniques, i.e., equivalence partitioning ""EP"", branch testing ""BT""; 1 code review ""CR"" technique)",yes,"Experiment is an assignment in a 6 credit ""Software Verification and Validation"" course. Students are supposed to do the tasks in the experiment to get the assignment grade. However, students may opt to withdraw from the experiment. In that case, their data will not be included in the experiment result, and also not be part of the analyses.",no,N/A,N/A,N/A,no,N/A,yes,"12 hours in total. In each session, participants complete a task for each defect detection technique (i.e., EP, BT, CR). Each session takes place on a particular day of three consecutive weeks and lasts 4 hours. 
Hence total time is ""4 hours/session x 1 session/week x 3 weeks = 12 hours""",N/A,N/A,Spain, DBLP:journals/ese/MoralesKA20,effect of automated refactorings on developers' program comprehension,investigate the effects of automatically generated refactorings on developers' program comprehension in comparison to manually refactored code.,controlled experiment,"time spent in experiment, percentage of correct answers, effort (using NASA TLX)",yes (Cliff's delta),Within Subject,freelance developers,30,30,2 (refactoring performed by machine vs. by human),no (voluntary participation),N/A,no,N/A,N/A,N/A,no,N/A,no,median = 22 minutes (for program comprehension tasks with manually refactored code); median = 27.6 minutes (for program comprehension tasks with code that is automatically refactored),N/A,N/A,Canada, DBLP:journals/ese/SayaghKPBA20,configuration engineering challenges,"evaluating the degree to which the tool ""Config2Code"" that deals with configuration engineering challenges assists developers.",controlled experiment,"task completion time, task correctness",no,Between Subject,"2 industry developers; 5 freelance developers, 48 students",55,"29 participants (test group, ""Config2Code"" tool), 26 participants (control group, ""Preferences"" tool)","2 (""Config2Code"" tool vs. 
""Preferences"" tool)",yes (35 CAD paid to 5 freelance developers upon task completion and quality satisfaction); students were given bonus in their courses & a participation certificate.,35 CAD,no,N/A,N/A,N/A,no,N/A,no,median = 25.99 minutes (test group); median = 47.18 minutes (control group); ---- tasks designed not to exceed 1 hour 30 minutes,≈ 23 CAD,"yes, authors adjusted the amount according to the median flat sum of Freelancer.com",Canada, Czepa2020,"Understandability of temporal property of three major representations, which are Linear Temporal Logic (LTL), Property Specification Patterns (PSP), Event Processing Language (EPL).",The goal is to test the following hypotheses: (1) PSP is easier to understand than EPL and LTL; and (2) EPL is easier to understand than LTL.,2 controlled experiments,task correctness and response time (in minutes),yes (Cliff's delta),Between subject,"Students of the Faculty of Computer Science at the University of Vienna, Austria, who enrolled in the courses “Distributed System Engineering (DSE) Lab” and “Advanced Software Engineering (ASE) Lab”",216,"EXPERIMENT1: LTL group (26); PSP group (20); EPL group (24); EXPERIMENT2: LTL group (47 = 31 DSE students + 16 ASE students); PSP group (44 = 27 DSE students + 17 ASE students); EPL group (45 = 28 DSE students + 17 ASE students) [""DSE students"" refer to the students enrolled in Distributed Systems Engineering Lab; and ""ASE students"" refer to students enrolled in Advanced Software Engineering Lab]","3 (LTL, PSP and EPL)",yes,EXPERIMENT1: The experiment was a mandatory course assignment which made up 10% of the total points one could get from that course; EXPERIMENT2: Participating in the experiment was optional and was awarded up to 10 bonus points,piece rate (the solutions are graded based on performance),same,all,individuals,no,N/A,no,EXPERIMENT1: Mean response times: 69.98 min (LTL); 58.25 min (PSP); 72.12 min (EPL); EXPERIMENT2(DSE): 51.03 min (LTL); 36.65 min (PSP); 51.80 min 
(EPL); EXPERIMENT2 (ASE): 55.32 min (LTL); 39.12 min (PSP); 44.00 min (EPL); ,N/A,N/A,Austria, Do2020,difficulties of using static analysis tools and how to overcome such challenges,To understand the difficulties of using static analysis tools to detect and fix bugs and security vulnerabilities in code and propose a tool that overcomes the difficulties and evaluate this proposed tool,a controlled experiment,number of errors identified and fixed by each participant; time spent using each feature of Eclipse and the developed tool (Visuflow),no,Within subject study,researchers in academia (65%); researchers in industry (5%); students (30%),20,20 (since it is a within subject study),2,no,N/A,no,N/A,N/A,N/A,no,N/A,yes,20 minutes,N/A,N/A,Germany, Fucci2020,Sleep deprivation and novice developers' performance,To understand the effect of 1 night sleep deprivation on novice developers' performance using agile test-first development,quasi-experiment,functional correctness of implemented code; participants' engagement in coding; ability to apply Test First Development (TFD),yes (Cliff's delta),Between subject,"final year computer science undergraduate students enrolled in an Information Systems course at the University of Basilicata, Italy",45,22 (regular sleep); 23 (sleep deprivation),2,no,voluntary exercise in the Information Systems course,no,N/A,N/A,N/A,no,N/A,yes,90 minutes,N/A,N/A,Italy, Peitek2020,Measurement of program comprehension by using functional magnetic resonance imaging (fMRI),To understand the feasibility of using fMRI to measure developers' program comprehension,observational study (in a controlled environment),"(1) BOLD signals in % in the Middle frontal gyrus, middle temporal gyrus, inferior parietal lobule, inferior frontal gyrus; (2) The locations of deactivated areas in the brain and BOLD signals % in the deactivated areas of the brain; (3) (Brodmann area activations) activations of 5 brain areas and their correlation to programming experience and Java 
knowledge.",no,N/A,computer science and mathematics students at the University of Magdeburg,28 (Two replications were conducted. EXPERIMENT1: 17 participants; EXPERIMENT2: 11 participants),28,1,yes,EXPERIMENT1: Each participant received 20 Euros; EXPERIMENT2: no incentives reported for the replication study.,no,N/A,N/A,N/A,potentially,N/A,yes,EXPERIMENT1: 39 minutes (9 minutes for anatomical measurement stage plus 12 trials each lasting 2.5 minutes); EXPERIMENT2: 33 minutes (9 minutes for anatomical measurement stage plus 12 trials each lasting 2 minutes),30.77 Euros (for first experiment),"Yes (The amount is higher than the hourly wage of a junior developer in Germany, which is approx. 22.5 Euros/hour based on the fact that salary is 54K Euros (https://cult.honeypot.io/developer-salary-report-2021/developer-salaries-germany-2021). However, given that the participants have to enter an MRI machine, which causes discomfort, 30.77 Euros/hour should be reasonable)",Germany, Romano2020,Dead code in open source and commercial software systems,"To understand when and why developers introduce dead code, how they perceive and cope with it and whether dead code is harmful.",4 controlled experiments,"(Program) Comprehension Effort (CTime), Effectiveness (CF, CAvg) and Efficiency (CF/CTime, CCnt/CTime) are measured for the first, second and third experiments and (Code) Modification Effort (MTime), Effectiveness (MAvg) and Efficiency (MCount/MTime) are measured for the first and fourth experiments. 
Details about the metrics are as follows: (1) CTime: Time to complete a program comprehension task (minutes); (2) CF: The average of the F-measure for each participant in relation to their answers to the questionnaire; (3) CAvg = CCnt/n: The portion of the completely and correctly answered questions; (4) CF/CTime: The average F-measure per time (in minutes) for each participant; (5) CCnt/CTime: The number of correctly and completely answered questions per time (in minutes); (6) MTime: Time spent (minutes) to execute a modification task; (7) MAvg: The portion/ratio of correctly implemented change requests; (8) MCount/MTime: Number of change requests correctly implemented per minute",yes (Cliff's delta),Between subject,EXPERIMENT1: 3rd year undergraduate students in Computer Science at the University of Basilicata; EXPERIMENT2: Second year graduate students in Computer Engineering at the University of Basilicata; EXPERIMENTS 3 & 4: Undergraduate and graduate students in Computer Science at the College of William and Mary; ,83,EXPERIMENT1: No Dead Code group (20); Dead Code group (27); EXPERIMENT2: No Dead Code group (9); Dead Code group (10); EXPERIMENT3: No Dead Code group (5); Dead Code group (6); EXPERIMENT4: No Dead Code group (3); Dead Code group (3); ,"2 (""No Dead Code"" vs. ""Dead Code"")",yes,EXPERIMENTS 1 & 2: mark in the course; EXPERIMENTS 3 & 4: 30 USD,no,N/A,N/A,N/A,potentially,N/A,no,Mean values for the first experiment: 35.05 min (No Dead Code); 33.7037 min (Dead Code); Mean values for the second experiment: 86.556 min (No Dead Code); 82.7 min (Dead Code); Mean values for the third experiment: 56.4 min (No Dead Code); 47.6667 min (Dead Code); Mean values for the fourth experiment: 52.95 min (No Dead Code); 51.6667 min (Dead Code); ,approx 34.61 USD,"YES [Calculated for the third and fourth experiments] (Hourly wage is approx. 
32 USD, given that minimum salary for a junior developer in Williamsburg, Virginia, USA, is 61,337 USD, Reference: https://www.salary.com/research/salary/listing/junior-software-developer-salary/williamsburg-va)",Italy (first and second experiments) and USA (third and fourth experiments), Soltani2020,Automated crash reproduction using Genetic Algorithms,"To understand the usefulness of EvoCrash (i.e., the proposed tool that employs genetic algorithms for automated crash reproduction), for debugging code.",experiment,whether the defect is detected or not; time to perform each task,yes (Vargha-Delaney A^ 12 statistic),Within subject study,master students in Computer science from the Delft University of Technology,35,35,2 (with and without EvoCrash teste,no,N/A,no,N/A,N/A,N/A,no,N/A,yes,90 minutes (45 minutes for each task),N/A,N/A,The Netherlands, Bao2020,Extracting source code from screencasts of programming tutorial videos,"To evaluate the proposed tool's (psc2code) applicability to enhance developers' interactions with programming videos (i.e., whether the tool can help developers navigate and explore programming videos effectively.)",controlled experiment,task completion time and correctness,no,Between subject,"undergraduate students from the College of Computer Science in Zhejiang University. 
Out of these 10 students, 6 are junior students (freshman and sophomore) and 4 are senior students",10,5 (with psc2code) and 5 (without psc2code),2 (with and without psc2code),no,N/A,no,N/A,N/A,N/A,no,N/A,no,"Task V1: 367 seconds (baseline), 322 seconds (psc2code); Task V2: 532.5 seconds (baseline), 300.8 seconds (psc2code); Task V3: 714.6 seconds (baseline), 415.2 seconds (psc2code) ",N/A,N/A,China, Beschastnikh2020,"Tackling three tasks frequently performed during analysis of distributed system executions: (1) understanding the relative ordering of events, (2) searching for specific patterns of interaction between hosts, and (3) identifying structural similarities and differences between pairs of executions.",User evaluation of the proposed approach,1 controlled experiment,number of correctly answered questions about the distributed systems used in the study;,yes (Cohen's d),Between subject,"graduate students (14), undergraduate students (24), 1 professor at the University of Massachusetts Amherst",39,15 (control: raw logs); 24 (treatment: used the tool ShiViz),2 (raw logs and ShiViz tool),no,N/A,no,N/A,N/A,N/A,no,N/A,yes,60 minutes,N/A,N/A,likely Canada or USA, Kafali2020,Engineering sociotechnical systems with respect to stakeholders' requirements,To evaluate the effectiveness of the proposed pattern-based approach in capturing and refining stakeholder requirements,controlled experiment,(1) Coverage of requirements specifications; (2) Correctness of specifications; (3) Time spent to produce requirements specifications; (4) participants' subjective Ease of creating specifications,yes (Hedges' g),Between subject,"graduate computer science students at NC State University (93.33% MSc students, 6.67% PhD students)",32,16 (control); 16 (treatment),2,yes,20 USD,no,N/A,N/A,N/A,no,N/A,no,63 minutes (mean time for treatment group); 58 minutes (mean time for the control group),39.90 USD,"No (Lower than average hourly wage of a developer in Raleigh, NC. 
Since participants are graduate students ""developer"" salary is used rather than ""junior developer"" salary; Useful link for salaries in North Carolina: https://www.zippia.com/software-developer-jobs/salary/raleigh-nc/)",USA, Shen2020,Reuse of APIs from existing libraries,To understand if the tool implementation of the proposed solution improves the efficiency of reusing libraries in real life programming,controlled experiment,task completion time (tasks consisted of implementing missing functions in code files); number of webpages opened during coding,no,Between subject,N/A,8,8,2 (control: coding environment without NLI4j plugin; treatment: coding environment with NLI4j plugin),no,N/A,no,N/A,N/A,N/A,no,N/A,no,"Average time for: (1) control group: 674 seconds (novice developers), 290.7 seconds (experienced developers); (2) treatment group: 317.7 seconds (novice developers), 230.3 seconds (experienced developers)",N/A,N/A,likely China, Azizi2020,An approach for symbolic execution of Epsilon Transformation Language and detecting logical errors,To evaluate the usefulness and usability of the proposed approach,controlled experiment,task correctness,no,Within subject study,students familiar with model driven engineering,12,12,2 (treatment1: without using a test tool; treatment2: with using SEET tool) ,no,N/A,no,N/A,N/A,N/A,no,N/A,no,N/A,N/A,N/A,Iran, Corradini2020,Process modelling using BPMN (Business Process Model and Notation) models,"To assess if the proposed model S3 can effectively support designers in delivering models respecting safeness, soundness and message relaxed soundness and if users perceive S3 as usable and useful.",1 controlled experiment,"soundness, safeness and message-relaxed soundness of the models generated by the experiment participants",no,Between subject,second year MSc students in Computer Science (Enterprise Software systems curriculum) at the University of Camerino,26,13 (control); 13 (treatment),"2 (control: bpmn.io environment with no automatic 
support for quality checking), treatment: bpmn.io environment with S3 functionality support)",no,N/A,no,N/A,N/A,N/A,no,N/A,yes,2 hours,N/A,N/A,Italy, Taipalus2020a,"SQL query formulation, SQL education",To understand the effect of database complexity on SQL query formulation,experiment in the wild,"correctness of SQL query formulation; number of syntax, semantics and logical errors in incorrect final SQL queries; number of complications in final SQL queries",no,Within subject study,second year students in computer science or information systems,"744 (three cohorts: 237, 280, and 227 students)",744,"3 (simple database, semi-complex database, complex database)",yes,course credits (unclear phrasing regarding how assigned),piece rate (the solutions are graded somehow),same,all,individuals,no,N/A,no,N/A,N/A,N/A,Finland, Cornejo2020,The overhead of collecting data about the sequences of function calls executed by an application while running in the field that is introduced to users,To investigate to what extent collecting data about function calls may impact the user experience quality,controlled experiment (in the laboratory),"participants' perception of the speed of the applications they used to perform assigned tasks (i.e., ""running slow"", ""running as expected"")",no,Within subject study,"students, researchers, professors from the Computer science Department of the University of Milano",22,22,"14 treatments in total. There are 5 System Response Time categories by Seow [1] ""Instantaneous,"" ""Immediate,"" ""Continuous simple,"" ""Continuous complex,"" ""Captive"". For each of the System Response Time categories ""Instantaneous"" and ""Continuous simple,"" the overhead ranges ""0%-30%"", ""30%-80%"", ""80%-180%"" and ""180+%"" were used. For the System Response category ""Immediate"", the overhead ranges ""0%-30%,"" ""80%-180%"" and ""180+%"" were used. For the System Response category ""Continuous complex"", the overhead ranges ""0%-30%"" and ""80%-180%"" were used. 
For the System Response category ""Captive"", only the ""0%-30%"" overhead range was used. References: [1] Seow, S. C. (2008). Designing and engineering time: The psychology of time perception in software. Addison-Wesley Professional.",no,N/A,no,N/A,N/A,N/A,no,N/A,no,N/A,N/A,N/A,Italy, Goumopoulos2020,Enabling the end-user to actively participate in the design of pervasive computing systems,To assess the usability of the developed tool that allows end users to configure their own smart environments.,observational study,Rule editing success & fail proportions; rule editing time (seconds); TAM3 (Technology Acceptance Model) usability evaluation --> This one is excluded since it is participant's perceived usefulness.,no,N/A,"administrative and technical staff of the University of the Aegean, Greece, and their acquaintances. (62% female and 32% male). Participants did not have any programming language knowledge",20,20,1,no,N/A,no,N/A,N/A,N/A,no,N/A,no,N/A,N/A,N/A,Greece, Nunez2020a,Model-driven development approach for smart phone applications for different platforms,"To investigate the efficiency and effectiveness of modelling, code generation, application generation using the proposed approach ""MoWebA Mobile""; and comparing the effectiveness and efficiency of ""manual"" and ""automatic"" modification of the generated application (""automated"" modification is done by the proposed approach MoWebA Mobile)","observational study & a controlled experiment. Observational study consists of modelling, code generation and application generation using the proposed approach. Controlled experiment compares effectiveness and efficiency of ""manual"" vs. 
""automated"" modifications to the generated application ",Modelling time; Modelling success rate; code generation time; code generation success rate; time to import the generated code to the respective IDEs and generate the application; application generation success rate; time to make manual modifications to the generated code; time it takes to make automated modifications to the generated application; success rate of the modifications made to generated applications,no,"Between subject (the experiment conducted to compare ""manual"" vs ""automated"" modifications to generated applications)","6 students of the 8th semester of the Informatics Engineering career at the Catholic University; 5 mobile developers, four of them with experience working with Android applications, and one developer with Windows Phone applications development experience.",11 (for observational study) and 6 (out of 11) for the controlled experiment,N/A (it only writes that 5 developers were divided into two groups),"2 (the experiment conducted to compare ""manual"" vs ""automated"" modifications to generated applications)",no,N/A,no,N/A,N/A,N/A,no,N/A,yes,493 minutes (First session: 150 minutes; Second session: 140 minutes; Third session: 203 minutes) ,N/A,N/A,Paraguay, Teixeira2020,Code generation to support deployment of Internet of Things by developing Situation-Aware and Business-Aware applications and reducing the need of IoT specific knowledge,Empirical evaluation of the proposed solution LAURA (Lean AUtomatic Code generation for situation-aware and business-awaRe Applications),experiment,"ease of use, usefulness, intention of use of the proposed tool",no,Between subject,"Business Process Modelling (BPM) professionals, systems developers in universities and companies. 30% have a master's or doctorate degree. The predominant area is services (50%), followed by the academic area (43%). Most (73%) have basic knowledge of IoT, however, most (67%) do not have advanced IoT knowledge. 
Participants with the System Expert profile used JavaScript (27%), Java (13%) or Node.js (10%). The average age of the participants is 32.3 years and they have a large professional background, on average, 6.6 years",30,15,2 (BPM experts vs. systems experts),no,N/A,no,N/A,N/A,N/A,no,N/A,no,1.8 hours (on average),N/A,N/A,Brazil, Urbieta2020,A rule-based approach (LEL - Language Extended Lexicon) to consolidate information dispersed in user stories for agile requirements management,To assess the improvements the proposed rule-based approach LEL brings for requirements management related daily activities.,a controlled experiment,"precision, recall, F-measure, spent time, complexity (""Complexity"" is a subjective measurement in an interval from 1 to 5 where 1 is the lowest difficulty and 5 is the greatest difficulty.)",yes (Cliff's delta),Between subject,"software engineers from different software companies and institutions with 8.3 years of programming experience and 6.82 years of requirements analysis experience. 85% of subjects have a degree (bachelor, master or Ph.D.). A bachelor degree in South America is comparable to a European Master Degree because of its curricular definition. Some are researchers",36,18,"2 (control: participants in this group are given ""user Stories."" treatment: participants in this group are given LEL material) ",no,N/A,no,N/A,N/A,N/A,no,N/A,no,N/A,N/A,N/A,Argentina and Spain, Valderas2020,Business Process Model and Notation (BPMN) based approach to create micro-service compositions,To compare efficiency of the proposed BPMN based approach to develop and update microservice composition to solutions based on ad hoc development,controlled experiment,task completion time,no,Within subject study,"researchers in software engineering, ages between 27-42 years old. Participants have extensive background in Java and modeling tools. 
3 of the participants have experience in Spring Framework and message queues, and 4 of them had previously worked with BPMN.",9,9,2 (control: ad hoc solution; treatment: BPMN approach),no,N/A,no,N/A,N/A,N/A,no,N/A,yes (time limit for development of a microservice composition is 5 hours and time limit for updating a micro-service composition is 3 hours).,N/A,N/A,N/A,Spain, Gil2020,Methods and techniques to help designers analyse and design human-in-the-loop systems for Cyberphysical systems,observational study: To test the implemented prototype of the proposed design method and experiment: to evaluate the usefulness of the proposed design method,1 user-observational study & 1 experiment,"observational study: human response time to tasks (e.g., takeover, handover, emergency takeover, emergency handover, supervised autonomous driving, supervised manual driving). experiment: effectiveness (percentage of correct tasks carried out) and efficiency (task completion time in minutes). The experiment also has the following ""subjective"" measurements: perceived ease of use, perceived usefulness, intention of use",no,observational study: N/A; experiment: Between Subject,"observational study: Participants with occupations not related to the academic environment or computer science. None had experience in computer science technology or autonomous cars. Three of the participants were females and three males; two of the participants were between 19-30 years old, two were between 31-45 years old and two were between 46-55 years old. Experiment: 22 subjects (16 males and 6 females, and between 22-40 years old) who were students in the Master's Degree program in Software Engineering, Formal Methods, and Information Systems. 
All participants have extensive Java background and knowledge in OSGi (i.e., Dynamic module system in Java) technology, 72.23% had experience in modelling tools, 36.34% defined themselves as expert in Eclipse IDE and remaining defined themselves as intermediate, 18.18% had experience in Internet of Things.",observational study: 6; experiment: 22,observational study: 6; experiment: 11,observational study: 1; experiment: 2 (control: coding centric method; treatment: proposed method),no,N/A,no,N/A,N/A,N/A,no,N/A,observational study: no; Experiment: yes (time limit: 1 hour),"observational study: 157.05 seconds; experiment: avg. time for code centric method = 51.45 minutes; avg. time for proposed solution = 36.63 minutes (Time for observational study is calculated as follows: Section 7.2.5 states that there are 30 seconds between tasks; since there are 5 tasks, this makes 150 seconds plus average response time of participants for each task as reported in Table 2) ",N/A,N/A,Spain, Uddin2020,A proposed solution to mine API usage scenarios from Stack Overflow,To investigate whether users prefer the proposed solution compared to API official documentation,controlled experiment,"code correctness, task completion time; effort spent (""effort spent"" is a subjective measurement since NASA TLX is used for its measurement)",yes (Cliff's delta),Within subject study,"software developers, 88.2% actively involved in software development (94.4% among the freelancers and 81.3% among the university participants) and had a background in computer science and software engineering. 
The number of years of experience of the participants in software development ranged between <1 - 10 years: three (all of them being students) with less than one year of experience, nine between one and two, 12 between three and six, four between seven and 10 and the rest (nine) had more than 10 years of experience.",31,31,"4 (Proposed solution, Stackoverflow, Official API documentation, search engines)",yes (only the participants recruited from Freelancer.com),20 USD,no,N/A,N/A,N/A,no,N/A,no,average minimum = 74.4 minutes (18.6 x 4); average minimum = 94.4 minutes (23.7 x 4); ,N/A,N/A,"online, Canada, Bangladesh", Lian2020,"Assisting engineers to identify requirements knowledge from a collection of domain documents",To investigate if the proposed solution MaRK-II is helpful to software engineers to extract requirements,online experiment (experiment material exchange is through emails),number of requirements specifications extracted,no,Between subject,"IT engineers with Master or Doctoral degree in software engineering who work as IT engineers in various domains (e.g., banking, research, technology and web-services) for at least three years after graduation.",10,5,"2 (Control: original domain documents; Treatment: domain documents with highlighting, the relevance ranking of these five documents and summary of each document.) 
",no,N/A,no,N/A,N/A,N/A,no,N/A,yes (1 hour),max 1 hour,N/A,N/A,China, Costa2021,Refactorings to discipline #ifdef annotations,To analyze the correlation between three refactorings to discipline #ifdefs with improvements in program comprehension and visual effort,controlled experiment,"task completion time, task accuracy, visual effort (i.e., fixation duration, fixation count, and regressions count)",yes (Cliff's delta),Within Subject,Novices in C programming language,64,64,2 (undisciplined & disciplined #ifdef notations),no,N/A,no,N/A,N/A,N/A,no,N/A,yes,~50 minutes,N/A,N/A,Brazil, Olsson2021,"Software developers' affective states (feelings, emotions and moods) and technical debt",To investigate the relationship between software developers' affective states and technical debt.,experiment,valence (using Self assessment Manikin),yes,Between Subject,"software practitioners working in various domains (e.g., finance, automotive, renewable energy) with 10-35 years of experience.",40,40,5,no,N/A,no,N/A,N/A,N/A,no,N/A,yes,90 minutes,N/A,N/A,Sweden, Santos2021,12 experiments conducted within the context of Test Driven Development,To provide meta-analysis of the results obtained from 12 isolated experiments on TDD,12 experiments,percentage of tests that pass,yes,Between Subject (Experiments 1-7); Within subject (Experiments 9-12); crossover (Experiment 8),"Students (Experiments 1, 4, 5, 8-12); Professionals (Experiments 9-12)","For experiments 1-12: 16, 18, 20, 48, 53, 8, 13, 20, 41, 69, 64, 41","Exp1: 16 (ITL), 14 (TDD); Exp2: 16 (ITL), 17 (TDD); Exp3: 20 (ITL), 20 (TDD); Exp4: 44 (ITL), 43 (TDD); Exp5: 49 (ITL), 52 (TDD); Exp6: 8 (ITL), 6 (TDD); Exp7: 12 (ITL), 11 (TDD); Exp8: 20 (ITL), 20 (TDD); Exp9: 16 (ITL), 18 (TDD); Exp10: 33 (ITL), 38 (TDD); Exp11: 35 (ITL), 28 (TDD); Exp12: 20 (ITL), 21 (TDD).",2 (for all 12 experiments: Iterative Test Last development (ITL) vs TDD),no,N/A,no,N/A,N/A,N/A,no,N/A,yes,Experiments 1-8: 5 hours; Experiments 9-12: 2.5 hours,N/A,N/A,Europe, 
Endres2021,Training novices,To compare the effects of technical reading training and spatial skills training on novices' programming skills,controlled longitudinal experiment,"spatial ability (paper folding test, Purdue Spatial Visualisation Test); reading (Graduate Examination ""GRE"" record), programming assessment (Second CS1 ""SCS1"" Assessment)",yes (Cohen's f^2),Between Subject,Computer Science (CS1) students,57 (97),28 (spatial training); 29 (Technical reading training),2 (technical reading training vs. spatial training),yes,$20 for 2-hour sessions ($220 max),no,N/A,N/A,N/A,no,N/A,yes,11 week longitudinal study,$15.94 (https://www.indeed.com/career/paid-intern/salaries/UT),"lower (i.e., $10)",USA, Cates2021,Program comprehension,To investigate the effects of using and naming variables on program comprehension,experiment,"task correctness, task completion time",no,Hybrid,58% of participants had 3 years or more of programming work experience,113 (191 entered),113,3 (compound mathematical expression; sub-expressions with meaningless variable names; sub-expressions with meaningful variable names),no,N/A,no,N/A,N/A,N/A,no,N/A,no,N/A,N/A,N/A,online, Kirby2021,microservice extraction techniques to migrate monolithic applications to microservices,To investigate practitioners' expectations for tools utilising such relationship types during microservices extraction,observational study (Think-aloud study),N/A (qualitative study - open coding),N/A,N/A,software developers with more than 5 years of software development experience and more than one year of involvement with a substantially big and complex microservice migration,10,10,1,no,N/A,no,N/A,N/A,N/A,no,N/A,no,N/A,N/A,N/A,"Canada, Croatia, Germany, Norway, Australia, and the USA", Shen2021c,API usage patterns,"To test the usefulness of CODA, the tool implementation of the proposed approach to help developers interactively complete API usage patterns",controlled experiment,"user interaction times, users' task 
completion times, the tool's (CODA) response time",no,Between Subject,4 software developers with minimum 4 years of industrial software development experience; 4 software engineering graduate students,8,4,"2 (control: using CoData, a widely used deep learning based code completion plug-in; treatment: using CODA)",no,N/A,no,N/A,N/A,N/A,no,N/A,no,1867 seconds (~ 31 minutes),N/A,N/A,China, Wiese2021,conjoint and separate conditions as control flow structures in code,To investigate students' use and understanding of separate vs. conjoint conditions,observational study,accuracy on program comprehension tasks,N/A,N/A,"undergraduate Computer Science students recruited from two courses: a ""Data Structures and Algorithms"" course and an introductory Software Engineering course",125 (143 started),125,1,yes,"Software Engineering course students: extra credits on completing the task; ""Data Structures and Algorithms"" course students: 5 USD for participation",no,N/A,N/A,N/A,"yes (for ""Data Structures and Algorithms"" students 5USD for participation); no (for ""Software Engineering"" course students)",N/A,no,N/A,$15.94 (https://www.indeed.com/career/paid-intern/salaries/UT),N/A (average completion time of the questions/tasks is not reported),USA, Guerriero2021,Model Driven Engineering for distributed streaming applications,"To evaluate the proposed tool ""StreamGen"" that simplifies the design of distributed streaming application and automatically generates the corresponding code",experimentation on a tool,task completion time (and subjective user evaluation data),N/A,Within Subject,"participants with *no* previous knowledge of BigData Application and with *no* programming skills (47% male participants; participants' age range: 20-35; participants are from Europe and majority (33%) is from Italy; diverse educational background (e.g., engineering, economics) excluding computer scientists)",23,23,1,no,N/A,no,N/A,N/A,N/A,no,N/A,yes,2 hours,N/A,N/A,Europe (majority Italy), 
Ore2021,Type annotations,To measure baseline accuracy and speed for developers making type annotations to previously unseen code,online experiment,Accuracy and time for each type annotation selection,no,Within Subject,"developers (27%, 57% and 19% have less than 1, 1-5 and more than 5 years of development experience in C/C++/C#/Java, respectively.)",97,97,6 (1. No annotation type suggestion; 2. One correct annotation type suggestion; 3. One incorrect annotation type suggestion; 4. Three annotation type suggestions where 1st one is correct; 5. Three annotation type suggestions where second and third are correct; 6. Three annotation type suggestions where none of them are correct),yes,$12 ($2 for pre-test; $10 for the task),no,N/A,N/A,N/A,"no (Crowdsourcing platform is used. Even to get $2 for pre-test, one should complete the pre-test)",N/A,yes,4.5 hours (30 minutes for pre-test and 4 hours for the task),$24 ($10 for ~25 minutes),N/A,online, Paulweber2021a,Object oriented abstractions for Abstract State Machines (ASM),To investigate participants' effectiveness and efficiency in comprehending an informal specification and expressing this specification in the form of an ASM textual specification using either interface or trait syntax extensions,controlled experiment,task correctness and task completion time,yes (Cliff's Delta),Between Subject,"Bachelor Computer Science students at the University of Vienna, enrolled in ""Software Engineering 2 (SE2)"" course",98,49,2 (Interfaces vs. 
Traits),yes,up to 6 bonus points for the SE2 course,yes,same,all tasks,linked to individuals,no,CS students: bonus points for their exam admission; Professional developers: 10 Euros; Economy students: 5 Euros; Clickworkers without programming skill: 2.50 Euros,yes,120 minutes,14.04 Euros (hourly wage for a junior software engineer in Austria: https://rollthepay.com/Austria/junior-software-engineer-salary/),N/A,Austria, Sharafi2021,Developers' cognitive activities,"To understand how developers carry out different computer science activities (code comprehension, code review, data structure manipulations)",2 controlled experiments,"functional Near Infrared Spectroscopy (fNIRS), functional Magnetic Resonance imaging (fMRI), eye tracking",no,N/A,undergraduate Computer Science students,"recruited 35 + 36 + 40, valid: 29 + 30 + 40",112 (Code study: 36 participants; Data structures study: 76 participants),1,yes,Monetary compensation and extra course credits,no,N/A,N/A,N/A,N/A (it is unclear whether the monetary compensation includes a show-up fee),N/A,no,~90 minutes,$35,N/A (amount of monetary compensation is not indicated in the paper),USA, Uddin2021b,Automated API usage documentation from Stackoverflow posts,"To evaluate two proposed algorithms (concept based documentation, statistical documentation) for automated API usage documentation",experiment,"task correctness, time, effort (NASA TLX used to measure effort)",yes (Cliff's Delta),Within Subject,"Professional software developers (18), 13 students from Universities (University of Saskatchewan and Polytechnique Montreal in Canada; and Bangladesh University of Engineering and Technology, and Khulna University in Bangladesh).",31,31,"4 (Stackoverflow ""SO""; Official API documentation ""DO""; proposed solution ""Opiner""; All three techniques, i.e., SO+DO+Opiner )",yes (only software developers; software developers were recruited through Freelancer.com),$20 (for software developers; software developers were recruited through 
Freelancer.com),no,N/A,N/A,N/A,no,N/A,no,"average times - Opiner: 18.6 minutes; SO: 22.3 minutes; DO: 23.7 minutes, and all three techniques (SO+DO+Opiner): 19.4 minutes",N/A (there is no single city & country from which the participants come),N/A,"online, Canada, Bangladesh", Alhamed2021,Planning Poker practice to estimate effort for large-scale or open-source software projects.,To investigate the feasibility of using crowdsourcing where crowd workers are provided with limited information about a task to obtain accurate effort estimates for open-source/large-scale software projects,online experiments (crowdsourcing),"Magnitude of Relative Error (MRE), mean of MRE (MMRE) and Median of MRE (MdMRE) (MRE = | (e - e') / e |, where e is the actual effort in ""person-hour"" recorded in the issue tracker and e' is the effort estimate)",no,N/A,participants with self-declared software engineering experience of at least two years,"807 estimates (i.e., an estimate by one participant)",N/A (There are 30 effort estimation crowdsourcing tasks in Table V.),N/A,yes,$1.99 (average),no,N/A,N/A,N/A,no,N/A,no,min: 17 minutes; max: 76 minutes,N/A,N/A,online, Braz2021a,Software vulnerabilities,To investigate to what extent developers can detect Improper Input Validation (IIV) and the underlying causes,online experiment,"task correctness (i.e., vulnerability found/not found)",yes (odds ratio),Within Subject,"57% of the participants are software developers and reported to have multiple years of experience in professional software development. 23% have 3-5 years of experience, 32% have 6-10 years, and 18% have more than 11 years. 
Most respondents design, program, and review code daily (6190, 96%, and 64%, respectively) - involved a few students and researchers",146,"194, 146 valid",2 (control: no warning about vulnerabilities; treatment: warning about vulnerabilities),yes,$5 (donation to a charity),no,N/A,N/A,N/A,no,N/A,no,N/A,N/A,N/A,online, Danilova2021,"Preventing fraud while recruiting participants with programming skills to conduct research studies (e.g., surveys) that use online platforms","To evaluate the efficacy and efficiency of a questionnaire proposed in the paper. The proposed questionnaire consists of screening questions (e.g., on program comprehension, programming language recognition, algorithmic runtime, etc) to distinguish research study participants with programming skills from those without programming skills.",observational study,number of correctly answered questions in the questionnaire; time to complete the questionnaire,no,N/A,"CS Students (17); Professional developers (33); Economy students (50); Participants recruited through the online platform Clickworker, i.e., Clickworkers without programming skill (97); Clickworkers with programming skill (52)",249,249,1,yes,CS students: bonus points for their exam admission; Professional developers: 10 Euros; Economy students: 5 Euros; Clickworkers without programming skill: 2.50 Euros,no,N/A,N/A,N/A,yes (for Economy students and Clickworkers without programming skills); not clear (for CS students and professional software developers) since all participants in these groups passed the attention check.,N/A,no,median times for non-programmers: 7.87 minutes and for programmers: 10.87 minutes,40 Euros (Hourly wage for professional software developers in Germany (https://www.salaryexpert.com/salary/job/software-developer/germany) & Austria is approx) 12 Euros (Minimum hourly wage in Germany) (https://en.wikipedia.org/wiki/Minimum_wage_in_Germany).,"""higher"" than hourly wage for professional software developers (10 Euros is paid 
for around 10.87 minutes, which is equivalent to ~55 Euros per hour. Average hourly pay for software developers in Germany and Austria is approximately 40 Euros.); ""higher"" for economy students (5 Euros for around 8.87 minutes is equal to approx. 33 Euros per hour); ""approximately equal"" to the minimum wage for Clickworkers without programming skills; ""N/A"" for some Clickworkers since these participants' countries are not reported in the paper.","N/A for some participants (There are 50 participants whose country is reported as ""other."" The remaining participants reported being from Germany, Austria, UK, USA, Spain, Italy)", Endres2021a,Cognitive processes underlying novice programmers,To understand how novice programmers reason about coding at the neurological level,experiment,functional Near Infrared Spectroscopy (fNIRS),no,Within Subject,"CS students enrolled in an introductory programming course at the University of Michigan (24 female, 7 male), aged 18 to 21 years.",31,31,"3 (code comprehension, mental rotation, prose reading)","yes (only 20 participants participated in the post-test. The post-test is a multiple-choice test that covers Boolean logic, while loops, for loops, arrays, if statements, functions, and recursion.) 
",$20 (for fNIRS scan) and an additional $20 for those who participated in the post-test,no,N/A,N/A,N/A,no,N/A,yes,"1.5 hours (fNIRS), 1 hour (post-test)",$27.30 (the average hourly wage for an intern programmer in Michigan: https://www.indeed.com/career/software-development-intern/salaries/MI),"no, lower (a participant who completes the fNIRS measurement and the post-test gets $40 for 2.5 hours, which is $16 per hour, lower than the hourly wage for an intern programmer)",USA, Foundjem2021b,Onboarding of new contributors to Open Source Software Ecosystems,"To provide a catalog of teaching content, teaching strategies, onboarding challenges, and expected benefits.",observational study,N/A (qualitative study - a combination of inductive and deductive coding),N/A,N/A,"participants who have good programming skills in at least Python and formal college/university education in Computer Science. Participants' average age: 20-30 years old",72,N/A,N/A,no,N/A,no,N/A,N/A,N/A,no,N/A,no,N/A,"$39.74 (https://ca.talent.com/salary?job=software+developer#:~:text=Find%20out%20what%20the%20average%20Software%20Developer%20salary%20is&text=The%20average%20software%20developer%20salary%20in%20Canada%20is,year%20or%20%2439.74%20per%20hour.)",N/A,Germany (observed onboarding event in Berlin), Hallett2021,Secure coding practices,To investigate if writing a specification of how the code should behave before implementation improves developers' secure coding practices,experiment,"A score is given to the code each participant implemented. The highest score is 7. The score is calculated using the checklist provided in the following study by Naiakshina et al.: Why Do Developers Get Password Storage Wrong? 
A Qualitative Usability Study, CCS '17.",yes (rank biserial coefficient),Between Subject,software developers,138,69,2 (control: coding without writing any specification; treatment: writing a specification of how the code should behave before implementing),yes,5 GBP,no,N/A,N/A,N/A,N/A,N/A,no,N/A,N/A,"N/A (Authors indicate that 5 GBP is a living wage in the UK. However, it is lower than a developer's hourly wage in the UK. Moreover, participants can be from any country since the experiment was conducted online. However, 5 GBP is higher than the hourly wage of a software developer in Turkey, for instance (45 TRY ~ 2 GBP). Therefore, it is not possible to estimate if the amount is realistic.)",online, Peitek2021a,Program comprehension and code complexity metrics,To investigate whether and how code complexity metrics reflect difficulty in program comprehension,observational study,"task correctness, task completion time, fMRI (functional Magnetic Resonance Imaging)",no,N/A,"late undergraduate and graduate students at the University of Magdeburg (2 female, 17 male, 26.47 ± 2.68 years old). The participants are ""intermediate programmers"" according to Dreyfus' taxonomy of skill acquisition.",19 (18 valid),19 (18),1,yes,money,no,N/A,N/A,N/A,N/A,N/A,yes,45 minutes,39.92 Euros (hourly wage for an average software developer in Germany: https://www.salaryexpert.com/salary/job/software-developer/germany),N/A,Germany, Wyrich2021,Program comprehension and code comprehensibility metrics,"To investigate to what extent a displayed metric value for code comprehensibility affects developers' subjective code comprehensibility, whether the program comprehension performance is affected by this anchoring effect and the individual characteristics that might play a role in the anchoring effect.","experiment (randomized, double blind)","subjective code comprehensibility, task correctness and time to complete the task",yes,Between Subject,"MSc students in Software Engineering (41 male, 2 female); average declared Java programming experience is 5.83 years and mean age is 24.47 years.",45 (43 valid),"20 (""easy"" group); 23 (""hard"" group)","2 (treatment1: the code comprehensibility metric indicates that the code is ""easy""; treatment2: the code comprehensibility metric indicates that the code is ""hard"")",no,N/A,no,N/A,N/A,N/A,no,N/A,no,N/A,39.92 Euros (hourly wage for an average software developer in Germany: https://www.salaryexpert.com/salary/job/software-developer/germany),N/A,Germany, Caulo2021,Personality traits and software development productivity,To investigate how personality traits affect software development productivity in the context of distributed development of multiplatform apps,observational/field study,"Personality traits using the NEO Personality Inventory; Lines of Code (LOC) per time; Halstead's difficulty per time; Code owned per time; Number of commits per time; Number of committed lines by time; Number of committed characters by time; relative contribution percentage (i.e., the number of commits a participant makes divided by the total number of commits made by the participant's project group)",yes (Cliff's delta),N/A,Computer Science 
master's students at the University of Salerno,31,31,1,no (Students voluntarily took part in the study and they were not evaluated for their participation),N/A,no,N/A,N/A,N/A,no,N/A,no,N/A,28 Euros (average hourly wage for a software developer in Italy: https://www.salaryexpert.com/salary/job/software-developer/italy),N/A,Italy, Joergensen2021a,Skill indicators and software developers' programming skills,To investigate: (1) the relationship between software developers' programming skills and their effort estimates; (2) how well company and self-reported skill indicators correspond to developers' actual programming skills,observational study,programming skill (using the polytomous Rasch model); developers' self-reported skill indicators and effort estimates,"yes (correlation coefficients, which can also be a measure of ""effect size"": https://www.statisticssolutions.com/free-resources/directory-of-statistical-analyses/effect-size/#:~:text=In%20statistics%20analysis%2C%20the%20effect,%2C%20(3)%20correlation%20coefficient.)",N/A,"professional software developers (7 female out of 104 developers) with a minimum of half a year of experience in Java development; 27% junior developers, 43% intermediate developers and 30% senior developers.",104,104,1,yes,Exact amounts are not given in the paper. 
However the participants were paid hourly wages equal to what they were paid at their companies.,no,N/A,N/A,N/A,no (The paper does not report that any of the 104 developers recruited failed to complete the tasks),N/A,no,4-5 hours,$16.59 (average software developer hourly wage in Poland); $15-$50 (software developer hourly wage range in Ukraine),"yes (the paper explicitly states that the developers were paid their ordinary hourly wages)","Poland, Ukraine (the countries from which the developers were recruited)", Karac2021,Test Driven Development (TDD),To investigate the impact of task description granularity on TDD outcome with respect to software quality.,experiment (cross-over design),"functional correctness, functional completeness",yes,Within Subject,graduate students at the University of Oulu,48 (52 started),48,2 (control: task description; treatment: more granular task description),no,N/A,no,N/A,N/A,N/A,no,N/A,yes,"2 hours for each task (2 periods in total, one task in each period)",,N/A,Finland, Melo2021,Using Docker to reproduce Stack Overflow posts,To evaluate if Docker can help improve reproducibility in Stack Overflow,"3 observational studies (First observational study, which answers RQ2, is conducted with students to assess the difficulty of Dockerizing Stack Overflow posts. Second observational study is conducted with two software developers to understand how often developers can Dockerize Stack Overflow posts. 
Third observational/user study was conducted to evaluate the tool prototype FRISK)",,no,N/A,First observational study (to answer RQ2): 8 students from a grad-level software testing course; Second observational study (to answer RQ4): 2 software developers with 3 years of web based development experience; Third observational/user study: Stack Overflow users (number of participants not reported; access to the tool was made available to Stack Overflow users),10 (for the first and second observational studies); not reported for the user study,10 (for the first and second observational studies); not reported for the user study,1 (for each observational study),"no (for the User Study, financial incentives may not be the appropriate solution since participants are using the tool ""FRISK"" to solve problems related to their daily development activities.)",N/A,no,N/A,N/A,N/A,no,N/A,First observational study (to answer RQ2): yes; Second observational study (to answer RQ4): no; User study: no,First observational study (to answer RQ2): 90 minutes; Second observational study (to answer RQ4): not reported; User study: N/A,Average hourly wage for a software developer in Brazil is 51 BRL = $9.34 (http://www.salaryexplorer.com/salary-survey.php?loc=30&loctype=1&job=783&jobtype=3),N/A,Brazil, Meyer2021a,Improvement of developers' work habits to increase productivity and well-being,To learn about developers' goals and strategies to improve or maintain good habits at work,observational study,N/A (qualitative study - a combination of inductive and deductive coding),N/A,N/A,"Professional software developers (7 female, 45 male) with an average of 8.2 years of professional software development experience",52 (started with 59),52,1,yes,$50 Amazon gift card,no,N/A,N/A,N/A,no,N/A,no,N/A (Participants answer some reflection questions in the morning and afternoon during a 2-3 week longitudinal study),N/A (Hourly wage amount changes depending on the city in USA or Canada),N/A,"USA, Canada, Switzerland, 
Brazil", Mohanani2021,Design creativity in software projects,"To investigate the impact of presenting desiderata as ideas, requirements or prioritised requirements on design creativity.",controlled experiment,"practicality, originality (of the designs the participants created)",yes (Cliff's delta),Between Subject,"EXPERIMENT1: Management and engineering postgraduate students at Lancaster University (19 females, 23 males with mean age of 25 years; 14 students had software engineering background); EXPERIMENT2: postgraduate students enrolled in the Information Processing Program of the University of Oulu (7 female, 27 male with average age of 26 years). All students had at least 1 year of experience in software development.",76 (EXPERIMENT1: 42 & EXPERIMENT2: 34),(EXPERIMENT1: 42 & EXPERIMENT2: 34),"2 (EXPERIMENT1: In ""control group"" desiderata are called ""ideas,"" and in ""treatment group"" desiderata are called ""requirements."" EXPERIMENT2: In ""control group"" desiderata are called ""ideas,"" and in ""treatment group"" desiderata are ""prioritised requirements."")",yes,EXPERIMENT1: lunch coupon; EXPERIMENT2: extra credit as a part of their course work,no,N/A,N/A,N/A,no,N/A,yes,60 minutes for each experiment,,,"EXPERIMENT1: United Kingdom, EXPERIMENT2: Finland", Muntean2021,Automatic repair of integer overflows in C programs,"To evaluate the proposed technique INTREPAIR, which automatically repairs integer overflows in C programs",controlled experiment,time to locate each fault and repair it; success rate for each analysed program,no,Between Subject,"graduate students (18 males, 12 females) with C/C++ programming experience and with on average 1-3 years of industrial programming experience",30,15,2 (control: without INTREPAIR; treatment: with INTREPAIR),no,N/A,no,N/A,N/A,N/A,no,N/A,no,"average times, control group: 64 minutes; treatment group: 5.8 minutes",N/A,N/A,"N/A (The country of the participants is not indicated in the paper. 
The authors are from Switzerland, China, Germany and Sweden)", Panach2021,Model driven software development,To study the impact of problem complexity on software quality within the context of model driven development (MDD),experiments (6 replications),"Percentage of passed tests, accuracy, developer effort, developer productivity, developer satisfaction",yes (Cohen's d),Within Subject,Computer Science students,78 individuals = 39 pairs (1st Replication: 7 pairs; 2nd Replication: 3 pairs; 3rd Replication: 7 pairs; 4th Replication: 10 pairs; 5th Replication: 6 pairs; 6th Replication: 6 pairs),78 individuals = 39 pairs (1st Replication: 7 pairs; 2nd Replication: 3 pairs; 3rd Replication: 7 pairs; 4th Replication: 10 pairs; 5th Replication: 6 pairs; 6th Replication: 6 pairs),2 (control: traditional software development; treatment: MDD),no,N/A,no,N/A,N/A,N/A,no,N/A,yes,4 hours (each replication),,N/A,Chile (3 identical replications); Spain (3 replications where experiment objects were different), Scalabrino2021a,Code understandability,"To assess if code-, documentation-, and developer-related metrics capture code understandability",observational study,"perceived code understandability, actual code understandability, time spent on each code snippet",yes (Kendall's Tau),N/A,"Java developers (13), bachelor Computer Science (CS) students (38), master's CS students (9), PhD CS students (3)",63,63,1,no,N/A,no,N/A,N/A,N/A,no,N/A,no,"~20 minutes (it took 2.5 minutes to understand each code snippet on average; each participant was shown 8 code snippets)",N/A,N/A,online, Scoccia2021,Trustability issues of mobile apps due to confining end-users to select between privacy and functionality,To evaluate the usability and acceptance of the proposed approach which aims to improve the trustability of Android apps,observational/user study + experiment,Time to complete the feature component mappings of a web-based app used in the evaluation,no,N/A + Within Subject,Android App developers 
with an average of 3.45 years of Android development experience + students/general population of app users,11 + 47,11 + 47,1 + 2,no,N/A,no,N/A,N/A,N/A,no,N/A,no,average 8.03 minutes (to create the mappings),N/A,N/A,online, Tosun2021,Test Driven Development (TDD),To estimate the impact of the chosen task and development approach,experiment,Number of passing assertions over all assertions in all acceptance tests,yes (Cohen's d),Within Subject,software developers,17 (18 registered),17,"2 (control: incremental test-last development, treatment: TDD)",no,N/A,no,N/A,N/A,N/A,no,N/A,no,N/A,N/A,N/A,"likely Estonia, since the last author was working for a company in Estonia (The experiment's participants are developers of a company, but the country of the company is not stated in the paper)", Ampatzoglou2021,Architectural decisions,To evaluate the ease of use of the tool implementation of the proposed cost-benefit approach for architectural decisions,observational study,N/A (qualitative study),no,N/A,"professional software developers (2 project managers, 2 software/system architects)",4,4,1,no,N/A,no,N/A,N/A,N/A,no,N/A,no,N/A,N/A,N/A,Sweden, Saputri2021,Sustainability of software intensive systems,"To evaluate the feasibility of ""INSURE,"" the proposed framework which combines a goal-based approach, a scenario-based approach and feature modelling to gather sustainability requirements and corresponding features.",quasi-experiments (feasibility study),"number of identified stakeholders, number of identified requirements and ratio of solved requirements conflicts during ""requirements elicitation and analysis""; number of identified features during ""feature analysis""; captured sustainability achievement and evaluation coverage during ""sustainability evaluation""; time spent on each of the tasks (i.e., ""requirements elicitation and analysis,"" ""feature analysis,"" and ""sustainability evaluation"").",no,feasibility study: Within Subject; replicated study: Between 
Subject,"feasibility study: 10 Ph.D. candidates and post-doctoral fellows in Computer Engineering and Energy Engineering Departments at Ajou University with at least 3 years of software engineering experience; replicated study: 20 participants from industry (e.g., automotive, financial, power & energy and IT consultancy) with at least 5 years of software development experience.",30 (feasibility study: 10 participants from academia; replicated study: 20 participants from industry),feasibility study: 10; replicated study: 10,"2 (control: without INSURE (i.e., the proposed approach); treatment: with INSURE)",no,N/A,no,N/A,N/A,N/A,no,N/A,no,"feasibility study: 81 minutes (control group with health care subject system), 85.6 minutes (control group with smart home subject system); 37.6 minutes (treatment group with health care subject system), 38.4 minutes (treatment group with smart home subject system); replicated study: 70 minutes (control group with health care subject system), 63.8 minutes (control group with smart home subject system); 37.8 minutes (treatment group with health care subject system), 33.4 minutes (treatment group with smart home subject system).",N/A,N/A,South Korea, Echeverria2021,"Using Clone and Own (CaO), and Software Product Lines (SPL) in software development","To determine the effectiveness, efficiency, and satisfaction of software engineers when using SPL approach versus CaO approach.",experiment,"Percentage of experimental tasks performed correctly (i.e., effectiveness); The time spent (in minutes) to perform the task",yes (Partial Eta Squared),Between Subject,software engineers with an average experience of 5.4 years. 
5 participants work daily with CaO and 5 work daily with SPL,10,5,2 (control: CaO; treatment: SPL),no,N/A,no,N/A,N/A,N/A,no,N/A,no,N/A,N/A,N/A,Spain, Ahrens2021,Helping developers find relevant information and navigate in the requirements specification documents more efficiently,"To evaluate if attention visualisations based on eye tracking data can positively affect the roles of software architect, UI designer and tester when reading a requirements specification for the first time",experiment,"For all participants: The number of times the quick access buttons were used, time spent on scrolling, total amount of page switches, total amount of time spent on pages accessed using the quick access button, average time spent on each page, total dwell time on each page, average time spent per page; For software architects: number of classes, attributes, methods in the class diagram; For UI designer: number of relevant UI elements; For tester: number of test cases",no,Between Subject,"students of the course teaching software quality principles (23 male, 6 female); 3 of them are graduate electrical engineering and information technology students and the rest are bachelor computer science students",29,control: 15 participants; treatment: 14 participants,2 (control: no attention transfer features; treatment: attention transfer features),yes,bonus point for the final exam of the course that teaches software quality principles,no,N/A,N/A,N/A,"yes (As indicated in section 4.7, participants get the bonus for ""participation"" and they could quit the experiment without experiencing any disadvantage)",N/A,yes,40 minutes,N/A,N/A,Germany, Dalpiaz2021,Textual notations and techniques used in requirements engineering,To investigate the adequacy of use cases and user stories for the manual derivation of a structural conceptual model representing the system's domain.,a controlled experiment and a quasi-experiment,validity and completeness of the solution,yes (Hedges' 
g),controlled experiment: Within Subject; quasi-experiment: Between Subject,controlled experiment: third year students taking the Object oriented Analysis and Design course at the University of Negev; quasi-experiment: master's students taking the course on Requirements Engineering at Utrecht University,"controlled experiment: 118; quasi-experiment: 24 (32, but only 24 allowed to share data)",controlled experiment: 118; quasi-experiment: 11 (treatment1: User Stories); 13 (treatment2: Use Cases),2 (treatment1: User Stories; treatment2: Use Cases),"controlled experiment: yes; quasi-experiment: no (the phrasing is somewhat unclear; could also be course credits)",controlled experiment: bonus points,no,N/A,N/A,N/A,no,N/A,"controlled experiment: 2 hours; quasi-experiment: no, as the study was spread over 8 weeks and the students completed the tasks as assignments","controlled experiment: 2 hours; quasi-experiment: N/A, as the study was spread over 8 weeks and the students completed the tasks as assignments",N/A,N/A,controlled experiment: Israel; quasi-experiment: The Netherlands, Joergensen2021,Outsourced software development,To test the following hypotheses: (i) The use of trialsourcing improves the selection of skilled software developers; (ii) the use of soft contract based hourly payment leads to better software project outcomes than fixed price contracts.,controlled field experiment,actual hours spent for the task; lines of code written; cyclomatic complexity of code; and number of code smells,no,trialsourcing study: Within Subject; payment study: Between Subject,"software freelancers with 6.9 years of experience, experience with JavaScript and >95% success rate",trialsourcing study: 36; payment study: 16,"trialsourcing study: 36; payment study: treatment1 (hourly payment): 8, treatment2 (fixed payment): 8",trialsourcing study: 1; payment study: 2 (treatment1: fixed payment; treatment2: hourly payment),yes (contracts),$180-$4000,no,Different,N/A,N/A,no,N/A,trialsourcing 
study: yes,trialsourcing study: 2 hours,,N/A,Norway, Lavalle2021,The visualisation of user requirements,To evaluate the effectiveness of the proposed methodology that collects user requirements and semi-automatically creates suitable visualisations,experiments,number of analytical questions answered; time required to complete the experiment,no,Within Subject,2nd year computer engineering students (84) and employees of a technological company (13),97,97,2 (control: without using the methodology; treatment: using the methodology),yes (for students); no (employees of the IT company),students: 0.25 bonus points out of 10 points in the final mark,no,N/A,N/A,N/A,no,N/A,no,45-60 minutes,N/A,N/A,Spain, Kifetew2021,User feedback analysis for requirements prioritisation,"To evaluate the effectiveness of ReFeed, which is the proposed approach to extract quantifiable properties relevant to prioritising the requirements",2 experiments,"EXPERIMENT1: ranks of requirements by participants, EXPERIMENT2: difference in the ranks assigned by participants and by ReFeed",yes (Kendall's Tau),EXPERIMENT1: Between Subject; EXPERIMENT2: Within Subject,EXPERIMENT1: graduate students from the University of Hannover (29); EXPERIMENT2: graduate students at the University of Trento (11),40 (EXPERIMENT1: 29; EXPERIMENT2: 11),EXPERIMENT1: control: 13; treatment: 16; EXPERIMENT2: control: 6; treatment: 5,2 (control: randomly ordered requirements; treatment: requirements prioritised using the proposed approach ReFeed),no,N/A,no,N/A,N/A,N/A,no,N/A,no,"EXPERIMENT1: control (requirements in random order): 23.20 minutes, treatment (requirements sorted by ReFeed): 23.55 minutes; EXPERIMENT2: control (requirements in random order): 26.05 minutes, treatment (requirements sorted by ReFeed): 24.28 minutes",N/A,N/A,"EXPERIMENT1: Germany, EXPERIMENT2: Italy", Kuttal2021,Using developers' online contributions for recruitment,To identify what information is used and what information is collected from peer 
production sites such as GitHub and Stack Overflow for recruitment purposes.,experiment,"number of clicks for various features (e.g., GitHub profiles, Repository list, Stack Overflow profile, personal website)",no,Within Subject,"Individuals with experience in the recruitment process (i.e., hiring and interviewing) who work in corporate and small companies",10,10,"2 (control: without using the proposed approach ""VisualResume,"" participants use GitHub, Stack Overflow and other online resources; treatment: using ""VisualResume"" and any other online resources (e.g., Google search, personal blogs, etc.))",no,N/A,no,N/A,N/A,N/A,no,N/A,no,N/A,N/A,N/A,"N/A (The countries of the participants are not indicated in the paper. Also, the authors come from different countries (e.g., USA, New Zealand) and different states in the USA (Ohio, Oklahoma), where hourly wages can vary.)", Aghayi2021,Crowdsourced programming through a Behaviour Driven microtask workflow,"To evaluate the feasibility of crowdworkers to make contributions through a behavior-driven microtask workflow, the time it takes for onboarding and to make the first contribution and the feasibility of implementing and testing a microservice through microtasks",observational study,"number of implemented functions, number of created unit tests, number of lines of code in each implemented function",no,N/A,"1 undergraduate student in computer science or a related field, one instructor, and seven graduate students in computer science or a related field. Two of the participants have less than 6 months of JavaScript experience, 3 participants have 7-12 months of JavaScript experience, 4 participants have more than 4 years of JavaScript experience.",9,9,1,yes,$20 gift card,no,N/A,N/A,N/A,no,N/A,yes,first session: 150 minutes; second session: 120 minutes,N/A,"N/A (The participants' countries are given in the paper. However, the paper does not explicitly indicate which participant comes from which country. 
Moreover, participants' JavaScript experience ranges from less than 6 months to more than 4 years. For instance, P1 has less than 6 months of JavaScript experience. We also do not know participant P1's overall software development experience. Hence, we cannot know if we should categorize P1 as a junior or senior developer, and seniority in software development affects the hourly wage. Moreover, we do not know which country P1 comes from. The hourly wage a junior/senior software developer gets varies from one country to another. The hourly wage also varies across states in a country (e.g., USA), and we do not know from which state in the USA the participant(s) come(s).)","USA, Spain, England, India", Addazi2021,A framework for UML modelling that uses blended textual and graphical notations.,"To investigate if modelling time by using blended modelling features is shorter than modelling time using single notation (i.e., either graphical or textual)",experiment,modelling time using single notation; modelling time using blended notations,no,"Within Subject (see subsection ""Tasks"" on page 11, the second column)",academic researchers (10) with experience in industrial projects and industrial researchers (4). Participants had a minimum of 3 years of experience with software design and development and a minimum of 2 years of experience in projects with UML-based software design in Eclipse/Papyrus.,14 (18 recruited),14,2 (textual vs. 
graphical),no,N/A,no,N/A,N/A,N/A,no,N/A,no,245-415 seconds (mean values),N/A,N/A,Sweden, Alanazi2021,"Facilitating program comprehension using static call graphs at different granularity levels (e.g., packages, classes, functions).",To validate the proposed approach to facilitate program comprehension using call graphs at different granularity levels,observational study,"N/A (the paper does not report any results on task correctness or task completion time, but only reports the outcome of the usefulness questionnaire and System Usability Scale survey, which are participants' subjective evaluations).",no,N/A,software engineers with at least 3 years of experience belonging to international industries including Apple and Google.,18,18,1,no,N/A,no,N/A,N/A,N/A,no,N/A,no,N/A (not reported in the paper),N/A,N/A,"online (The study was probably conducted online, which implies that participants could be from any country. One reason why the study was probably conducted online is that the authors made the tool accessible to the participants by installing it on the Azure cloud computing service. Moreover, the participants answered the usefulness questionnaire and SUS using Google Forms. 
On the other hand, if we assume that the study was conducted face-to-face, the paper does not explicitly indicate in which country the study was conducted, and the authors' affiliations are in two different countries: USA and Saudi Arabia.)", Baldassarre2021,Test Driven Development (TDD),To investigate the effect of TDD as compared to non-TDD approaches and the retention of TDD over a time span of six months,longitudinal experiment,"number of tackled features, number of asserts passed in the test suites, number of unit tests participants wrote, number of mutants the tests killed, number of cycles classified as test-first, duration of development cycle, number of cycles classified as refactoring when a participant followed the TDD approach.",no,Within Subject,Third year undergraduate students in Computer Science at the University of Bari,1st and 2nd experimental sessions: 39; 3rd and 4th experimental sessions: 30,1st and 2nd experimental sessions: 39; 3rd and 4th experimental sessions: 30,2 (TDD versus no TDD),yes,bonus in the final mark of the Integration and Testing course,no,N/A,N/A,N/A,no,N/A,no,N/A (longitudinal study),N/A,N/A,Italy, Paulweber2021,Advanced language constructs and state based formal methods,"To investigate the advanced language constructs which offer interfaces, mixins and traits in the context of state based formal methods using Abstract State Machines (ASMs).",controlled experiment,task correctness and duration,yes (Cliff's delta),Between Subject,Computer Science bachelor and MSc students at the University of Vienna,105,Treatment1: 36; Treatment2: 34; Treatment3: 35,3 (Treatment1: Interfaces; Treatment2: Mixins; Treatment3: Traits),yes,up to 6 bonus points for the DSE or ASE courses (correctness of the tasks determined how much a participant got out of 6 points),no,N/A,N/A,N/A,no,N/A,yes,105 minutes,N/A,N/A,Austria, Blanco2021,Cross platform software development,"To evaluate a proposed approach, which is a general-purpose language based approach 
that makes the software platform-independent",observational study,time spent for learning; time spent on each task,no,N/A,"expert evaluation: experts with a minimum of 10 years of experience in web and mobile technologies; novice evaluation: undergraduate students enrolled in the System Analysis and Development course at the Federal Institute of Sao Paulo. Students had basic programming knowledge and minor experience in IDEs, but they had not studied mobile and web development.",expert evaluation: 5; novice evaluation: 4,expert evaluation: 5; novice evaluation: 4,1,no,N/A,no,N/A,N/A,N/A,N/A,N/A,no,"(average times) expert evaluation: learning time: 6 hours 24 minutes, time spent on tasks: 7 hours 46 minutes; novice evaluation: learning time: 9 hours 15 minutes, time spent on tasks: 7 hours 50 minutes",N/A,N/A,Brazil, Taipalus2021,Compiler error messages in Database Management Systems (DBMS),"To investigate how participants experienced the qualities of error messages of four popular DBMSs in terms of error message effectiveness, perceived usefulness for finding and fixing errors, and error recovery confidence.",experiment,success rate for fixing SQL errors,yes (eta squared),Between Subject,"second, third and fourth year bachelor students from software engineering, computer science and information systems fields",152,MySQL: 41; Oracle Database: 36; PostgreSQL: 25; SQL Server: 40,"4 (MySQL, Oracle Database, PostgreSQL, SQL Server)",yes,course points towards a better course grade,no,N/A,N/A,N/A,"yes (See page 4, first paragraph in section ""3.3. Data collection"" -- All potential participants are given course points towards a better course grade regardless of participating in the study or not)",N/A,no,N/A,N/A,N/A,Finland, Liu2021m,Code reviews with formal specification.,To demonstrate that the specification technique works compared to a checklist approach.,experiment,finding bugs,no,Hybrid,students,10,10,2,yes (contracts),money,N/A,N/A,N/A,N/A,N/A,N/A,yes,90 mins,N/A,N/A,China,