Published July 4, 2021 | Version 2.0
Dataset Open

Software Engineering Education Knowledge versus Industrial Needs

  • 1. Athens University of Economics and Business

Description

Dataset of the research paper: Software Engineering Education Knowledge versus Industrial Needs

Contribution: Determine and analyze the gap between software practitioners’ education, as outlined in the 2014 IEEE/ACM Software Engineering Education Knowledge (SEEK), and industrial needs, as indicated by Wikipedia articles referenced in Stack Overflow (SO) posts.
Background: Previous work has uncovered deficiencies in the coverage of computer fundamentals, people skills, software processes, and human-computer interaction, suggesting rebalancing.
Research Questions: 1) To what extent are developers’ needs, in terms of Wikipedia articles referenced in SO posts, covered by the SEEK knowledge units? 2) How does the popularity of Wikipedia articles relate to their SEEK coverage? 3) What areas of computing knowledge can be better covered by the SEEK knowledge units? 4) Why are Wikipedia articles covered by the SEEK knowledge units cited on SO?
Methodology: Wikipedia articles were systematically collected from SO posts. The most cited articles were manually mapped to the SEEK knowledge units and assessed according to their degree of coverage. Articles insufficiently covered by the SEEK were classified by hand using the 2012 ACM Computing Classification System. A sample of posts referencing sufficiently covered articles was manually analyzed. A survey of software practitioners was conducted to validate the study findings.
Findings: SEEK appears to sufficiently cover computer science fundamentals, software design, and mathematical concepts, but less so areas like the World Wide Web, software engineering components, and computer graphics. Developers seek advice, best practices, and explanations about software topics, as well as code review assistance. Future SEEK models and computing education could dive deeper into information systems, design, testing, security, and soft skills.

The following data files are included.

  • wikipedia_articles.csv: Wikipedia articles mapped to the knowledge units of the 2014 IEEE/ACM Software Engineering Education Knowledge (SEEK) and the first and second level categories of the 2012 ACM Computing Classification System (CCS).
  • posts_analysis.csv: Stack Overflow post data and metadata.
  • posts_aggregated_codes.csv: The aggregated codes that resulted from the manual analysis of the Stack Overflow posts by grouping individual keywords assigned to the posts.
  • survey_questionnaire.csv: The final survey questionnaire.
  • survey_responses.csv: Anonymized responses of the final survey questionnaire. (E-mail addresses have been excluded for privacy reasons.)
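
A minimal Python sketch for reading one of the CSV files listed above directly from the downloaded archive. The helper name read_csv_from_archive is hypothetical. The sketch assumes that the files are UTF-8 encoded and that their first row is a header; neither is stated in this record. Because the internal folder layout and the column names are not documented here, the helper matches members by file-name suffix and returns the header as found.

```python
import csv
import io
import zipfile


def read_csv_from_archive(archive_path, member_suffix):
    """Return (header, rows) for the first member whose name ends with member_suffix.

    Assumptions (not confirmed by the dataset record): UTF-8 encoding and
    a header in the first row.
    """
    with zipfile.ZipFile(archive_path) as archive:
        for name in archive.namelist():
            if name.endswith(member_suffix):
                with archive.open(name) as raw:
                    # newline="" lets the csv module handle embedded newlines itself.
                    text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                    reader = csv.reader(text)
                    header = next(reader, [])
                    rows = list(reader)
                return header, rows
    raise FileNotFoundError(
        f"no member ending with {member_suffix!r} in {archive_path}"
    )
```

For example, read_csv_from_archive("wiki-so-posts.zip", "wikipedia_articles.csv") would return the column names and the data rows of the article mapping, assuming the archive sits in the current directory.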

Files

Files (582.9 kB)

Name: wiki-so-posts.zip
Size: 582.9 kB
md5:b13a44e9a0c2845a9d43acaa9250dcce
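
The md5 checksum listed above can be used to check that a download of the archive is intact. The following Python sketch shows one way to do so with the standard library; the function names and the local file path are illustrative assumptions, and the expected digest is copied from this record.

```python
import hashlib

# Digest copied from the file listing in this record.
EXPECTED_MD5 = "b13a44e9a0c2845a9d43acaa9250dcce"


def md5_of_file(path, chunk_size=65536):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest equals the expected digest."""
    return md5_of_file(path) == expected.lower()
```

Calling matches_published_md5("wiki-so-posts.zip") should return True for an unmodified copy of the archive. MD5 serves here only as an integrity check against transfer errors, not as protection against tampering.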

Additional details

Related works

Is compiled by
Software: 10.5281/zenodo.4251099 (DOI)
Is documented by
Journal article: 10.1109/TE.2021.3123889 (DOI)

Funding

FASTEN – Fine-Grained Analysis of Software Ecosystems as Networks 825328
European Commission