Dataset of the research paper: Software Engineering Education Knowledge versus Industrial Needs
Contribution: Determine and analyze the gap between software practitioners’ education, as outlined in the 2014 IEEE/ACM Software Engineering Education Knowledge (SEEK), and industrial needs, as indicated by Wikipedia articles referenced in Stack Overflow (SO) posts.
Background: Previous work has uncovered deficiencies in the coverage of computer fundamentals, people skills, software processes, and human-computer interaction, suggesting rebalancing.
Research Questions: 1) To what extent are developers’ needs, in terms of Wikipedia articles referenced in SO posts, covered by the SEEK knowledge units? 2) How does the popularity of Wikipedia articles relate to their SEEK coverage? 3) What areas of computing knowledge can be better covered by the SEEK knowledge units? 4) Why are Wikipedia articles covered by the SEEK knowledge units cited on SO?
Methodology: Wikipedia articles were systematically collected from SO posts. The most cited articles were manually mapped to the SEEK knowledge units and assessed according to their degree of coverage. Articles insufficiently covered by the SEEK were classified by hand using the 2012 ACM Computing Classification System. A sample of posts referencing sufficiently covered articles was manually analyzed. A survey of software practitioners was conducted to validate the study findings.
Findings: SEEK appears to sufficiently cover computer science fundamentals, software design, and mathematical concepts, but less so areas like the World Wide Web, software engineering components, and computer graphics. Developers seek advice, best practices, and explanations about software topics, as well as code review assistance. Future SEEK models and computing education could dive deeper into information systems, design, testing, security, and soft skills.
The following data files are included.
posts_analysis.csv: Stack Overflow post data and metadata.
posts_aggregated_codes.csv: The aggregated codes that resulted from the manual analysis of the Stack Overflow posts by grouping individual keywords assigned to the posts.
survey_questionnaire.csv: The final survey questionnaire.
survey_responses.csv: Anonymized responses of the final survey questionnaire. (E-mail addresses have been excluded for privacy reasons.)
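As a starting point for exploring the files, the following minimal sketch reports the column names and row count of each file it finds. It assumes the files are comma-delimited, UTF-8 encoded, and begin with a header row; none of this is stated in the description, so adjust as needed. The function name `summarize_dataset` and the `DATA_FILES` list are illustrative and not part of the dataset.

```python
import csv
from pathlib import Path

# File names as listed in the dataset description.
DATA_FILES = [
    "posts_analysis.csv",
    "posts_aggregated_codes.csv",
    "survey_questionnaire.csv",
    "survey_responses.csv",
]


def summarize_dataset(data_dir):
    """Return {file name: (header columns, number of data rows)} for each file found.

    Assumes comma-delimited UTF-8 files whose first row is a header.
    """
    summary = {}
    for name in DATA_FILES:
        path = Path(data_dir) / name
        if not path.exists():
            continue  # skip files that have not been downloaded
        with path.open(newline="", encoding="utf-8") as handle:
            reader = csv.reader(handle)
            header = next(reader, [])  # empty list if the file is empty
            row_count = sum(1 for _ in reader)
        summary[name] = (header, row_count)
    return summary


if __name__ == "__main__":
    # Assumes the CSV files sit in the current working directory.
    for name, (header, rows) in summarize_dataset(".").items():
        print(f"{name}: {rows} rows, columns = {header}")
```

Reading only the header row avoids guessing at column names, which are not documented here.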