Dataset - MLinter: Learning Coding Practices from Examples—Dream or Reality?
Creators
- 1. Univ. Bordeaux, Bordeaux INP, CNRS, LaBRI, Promyze
- 2. EuroMov Digital Health in Motion, Univ. Montpellier & IMT Mines Ales
- 3. Univ. Bordeaux, Bordeaux INP, CNRS, LaBRI
- 4. Univ. Bordeaux, Bordeaux INP, CNRS, LaBRI, IUF
- 5. Promyze
Description
This dataset contains all artefacts generated for our paper: MLinter: Learning Coding Practices from Examples—Dream or Reality?
Contents:
- output: Datasets and all files needed to create them
- results: CSV files generated by our MLinter evaluation for each rule
- analysis: Figures and files generated to interpret our results
Paper abstract:
Coding practices are increasingly used by software companies. Their use promotes consistency, readability, and maintainability, which contribute to software quality. Coding practices were initially enforced by general-purpose linters, but companies now tend to design and adopt their own company-specific practices. However, these company-specific practices are often not automated, making it challenging to ensure they are shared and used by developers. Converting these practices into linter rules is a complex task that requires extensive static analysis and language engineering expertise.
In this paper, we seek to answer the following question: can coding practices be learned automatically from examples manually tagged by developers? We conduct a feasibility study using CodeBERT, a state-of-the-art machine learning approach, to learn linter rules. Our results show that, although the resulting classifiers reach high precision and recall scores when evaluated on balanced synthetic datasets, their application on real-world, unbalanced codebases, while maintaining excellent recall, suffers from a severe drop in precision that hinders their usability.
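The precision drop described in the abstract is a base-rate effect: a classifier whose recall and specificity look excellent on a balanced dataset can still produce mostly false positives when rule violations are rare in real codebases. A minimal sketch of this arithmetic, using illustrative numbers (0.95 recall/specificity and 1% violation prevalence are assumptions for the example, not figures from the paper):

```python
def precision(recall: float, specificity: float, prevalence: float) -> float:
    """Precision from recall, specificity, and class prevalence (Bayes' rule)."""
    true_positives = recall * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)

# Balanced synthetic dataset: violations make up 50% of samples.
balanced = precision(recall=0.95, specificity=0.95, prevalence=0.50)
print(f"balanced precision: {balanced:.2f}")   # 0.95

# Real-world codebase: violations are rare, e.g. 1% of lines.
realistic = precision(recall=0.95, specificity=0.95, prevalence=0.01)
print(f"realistic precision: {realistic:.2f}")  # ~0.16
```

Even with unchanged recall, most flagged lines are false alarms at low prevalence, which matches the usability problem the abstract reports.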