Investigating Misunderstanding Code Patterns in C Open-Source Software Projects

Flávio Medeiros, Gabriel Lima, Guilherme Amaral, Sven Apel, Christian Kästner, Márcio Ribeiro, and Rohit Gheyi

Overview

Maintenance consumes 40% to 80% of software development costs, so writing source code that is easy to understand is essential to reducing costs during maintenance tasks. Code understanding also matters because developers often mistake the meaning of code and misjudge program behavior, which can lead to errors. Prior research has evaluated some source code patterns, such as operator precedence and the comma operator, and the results show that they influence code understanding negatively. However, these patterns have not been evaluated in a real-world setting, so it is not clear whether developers agree that the patterns studied by researchers can cause substantial misunderstandings in practice. To better understand these patterns, we applied a mixed-methods research approach, combining repository mining with a survey of developers, to evaluate misunderstanding patterns in 50 C open-source projects, such as Apache, OpenSSL, and Python. We found more than 115K occurrences of the patterns in practice. Our study shows that, according to developers, only some of the patterns previously considered by researchers may cause misunderstandings. Moreover, a couple of patterns criticized by researchers are not used in practice. Our results guide developers to avoid misunderstanding patterns by showing that these patterns influence code understanding and maintenance negatively.

Main Results

Next, we present the main results of this article. You can download the list of analyzed software repositories here.

RQ1. What is the frequency of occurrences of misunderstanding patterns in open-source projects?


To answer RQ1, we used our tool to count the number of occurrences of misunderstanding patterns in open-source projects. You can download our tool here. Before running it, make sure that srcML is installed and available on your system path. Then, put the code of the projects inside the code folder, go to the source, and run class br.com.cpsoftware.Analyser.java to analyze the code of the open-source projects. This way, you can reproduce the results of Table I. To test our tool, we performed an experiment with patterns identified manually and patterns identified by Dan Gopstein using his tool. To check it, download our tool and execute class br.com.cpsoftware.Analyser.java, but change variable FOLDER_TO_ANALYZE to point to the experiment folder. In this folder, you can also check the variations of our patterns.

Table I: Subjects and Misunderstanding Patterns.

PROJECT          DOMAIN               PATTERNS  GUIDES
Apache           Web Server           1524      Not available
Cinder           C++ Library          1544      X
Citus            Database             17        X
Cleanflight      Controller Firmware  5224      X
Cmake            Build Tool           1352      X
Cmus             Music Player         280       X
Collectd         Statistic Library    267       X
Contiki          Operating System     1451      X
Ctags            Tags Implementation  299       Not available
Curl             Command Line Tool    413       X
Dmd              Compiler             562       X
Edk2             Firmware             5657      X
FFmpeg           Video Tool           17321     X
FreeRDP          Remote Desktop       1840      Not available
Git              Code Mirror          1968      Not available
Glfw             Open GL Library      523       X
Grpc             RPC Framework        20        X
Hiredis          Database             47        Not available
Irssi            Chat Client          798       X
Jansson          JSON Tool            24        Not available
JohnTheRipper    Password Cracker     4107      Not available
Krb5             Security Library     3204      X
Libpng           Image Library        483       Not available
Librdkafka       C++ Library          227       X
Libssh2          SSH Library          189       Not available
Libuv            I/O Library          195       X
Libwebsockets    Websocket Library    389       Not available
Lxc              Linux Containers     552       X
Mongo            Database             4023      X
Mpv              Video Player         589       X
Openssl          SSL Library          3252      X
Phpredis         Database             248       Not available
Poco             C++ Library          1082      X
Premake-core     Premake              985       Not available
Python           Compiler             2318      Not available
Qmk_firmware     Controller Firmware  364       Not available
Radare2          Reverse Engineering  3746      X
Reactos          Operating System     27939     X
Redis            Database             1028      X
RetroArch        Libretro API         3272      X
RIOT             UI Library           773       X
S2n              Security Library     67        X
Silver Searcher  Search Tool          15        X
Statsite         Administration Tool  49        Not available
Stb              C++ Library          996       X
Swift Corelibs   I18n Tool            730       X
Syslog-ng        Log Daemon           231       X
Systemd          System Manager       2608      X
Tvheadend        Streaming Server     1878      X
Weechat          Chat Client          2547      X

Next, we present an example to show how our tool works. The C code below contains one occurrence of the Conditional Operator pattern.

File: example.c

void main(){
    int x = 10;
    int y = (x == 10) ? 20 : 30;
}
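
For comparison, the same assignment can be written with an explicit if/else statement. The sketch below is only an illustration: the function names with_conditional_operator and with_if_else are hypothetical, and the source does not state that this is the rewrite the authors proposed in their pull requests.

```c
/* Form used in example.c: the conditional operator selects the value. */
int with_conditional_operator(int x) {
    int y = (x == 10) ? 20 : 30;
    return y;
}

/* Equivalent form (assumed alternative): an explicit if/else statement. */
int with_if_else(int x) {
    int y;
    if (x == 10) {
        y = 20;
    } else {
        y = 30;
    }
    return y;
}
```

Both functions return the same value for every input; they differ only in how the choice between 20 and 30 is expressed.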


Running srcML on example.c produces the following XML.

File: example.xml

<function>
  <type><name>void</name></type>
  <name>main</name><parameter_list>()</parameter_list>
  <block>{
    <decl_stmt>
      <decl>
        <type><name>int</name></type>
        <name>x</name>
        <init>= <expr><literal type="number">10</literal></expr></init>
      </decl>;
    </decl_stmt>
    <decl_stmt>
      <decl>
        <type><name>int</name></type>
        <name>y</name>
        <init>=
          <expr>
            <ternary>
              <condition>
                <expr>
                  <operator>(</operator>
                  <name>x</name>
                  <operator>==</operator> <literal type="number">10</literal>
                  <operator>)</operator>
                </expr> ?
              </condition>
              <then>
                <expr><literal type="number">20</literal></expr>
              </then>
              <else>:
                <expr><literal type="number">30</literal></expr>
              </else>
            </ternary>
          </expr>
        </init>
      </decl>;
    </decl_stmt>
  }</block>
</function>


Our tool works by reading the XML file and searching for patterns. For instance, to find conditional operators, the tool searches for declaration statements (decl_stmt) that contain an initialization (init) with a ternary element, as in the declaration of y in the XML above.
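
The search described above can be approximated with a short sketch. It is not the authors' tool, whose code is not reproduced here. The function names count_ternary_initializers and tag_is are hypothetical, and the code scans tags naively instead of using a real XML parser, so it assumes input without self-closing tags, comments, or CDATA sections.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if the tag name starting at p is exactly `name`
   (hypothetical helper; p points just after '<' or "</"). */
static int tag_is(const char *p, const char *name) {
    size_t n = strlen(name);
    if (strncmp(p, name, n) != 0) {
        return 0;
    }
    /* The name must end here: '>' or whitespace before attributes. */
    char c = p[n];
    return c == '>' || c == ' ' || c == '\t' || c == '\n';
}

/* Counts <ternary> elements that occur inside an <init> element
   of a <decl_stmt> element in a srcML-style XML string. */
int count_ternary_initializers(const char *xml) {
    int decl_stmt_depth = 0;
    int init_depth = 0;
    int count = 0;

    for (const char *p = strchr(xml, '<'); p != NULL; p = strchr(p + 1, '<')) {
        int closing = (p[1] == '/');
        const char *name = p + (closing ? 2 : 1);
        int delta = closing ? -1 : 1;

        if (tag_is(name, "decl_stmt")) {
            decl_stmt_depth += delta;
        } else if (tag_is(name, "init")) {
            init_depth += delta;
        } else if (!closing && tag_is(name, "ternary")
                   && decl_stmt_depth > 0 && init_depth > 0) {
            count++;
        }
    }
    return count;
}
```

Applied to the contents of example.xml, this sketch would count one occurrence, namely the declaration of y. A ternary element outside a declaration statement, for example inside an expression statement, is not counted by this sketch.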

RQ2. Do developers of open-source projects agree that misunderstanding patterns influence code understanding negatively?


To answer RQ2, we performed a survey with 97 developers. You can download the results of the survey here.

RQ3. What are the guidelines that open-source projects provide regarding misunderstanding code patterns?

In Table I, you can find a link to the guidelines of each software project. You can also download the spreadsheet with the results here.

RQ4. Do developers of open-source projects accept pull requests to remove misunderstanding patterns?


Table II presents the pull requests that we submitted to the open-source projects. Table III lists the pull requests not answered by developers, Table IV presents the pull requests that we ignored, and Table V lists the pull requests that we submitted to change contributors guidelines.

Table II: Pull Requests

PROJECT        PATTERN                                   STATUS    REASON
Irssi          Dangling Else                             Rejected  Developers decided to handle all style-related changes through the specification set in #784.
Systemd        Dangling Else                             Accepted  Accepted and merged in another pull request #7884.
JohnTheRipper  Dangling Else                             Accepted
FreeRDP        Initialization in Conditions              Rejected  Developers mentioned that it is not really necessary and more or less preference.
Mpv            Operator Precedence                       Rejected  The short-circuiting behavior of logical operators is already pretty common knowledge.
Radare         Logic as Control Flow                     Rejected  Developers mentioned that it is clear and valid for their coding style.
S2n            Assignment as Value                       Accepted  Developers accepted, but expected us to do additional changes that we could not do.
RetroArch      Conditional Operator                      Accepted
Poco           Operator Precedence                       Accepted
MapReady       Operator Precedence                       Accepted
Dmd            Conditional Operator                      Rejected  Developers mentioned that chaining ternary expressions is a common pattern in some code bases.
Ossec Hids     Conditional Operator                      Rejected  Both versions seem easy enough to read to me.
OpenTX         Assignment as Value                       Rejected  The original code doesn't seem unreadable to me.
Machinekit     Multiple Initialization at the same Line  Rejected  That is a very commonly used C assignment operation.
Libgit2        Operator Precedence                       Rejected  I don't think this is much more readable. I definitely prefer the brevity.
Grpc           Post Increment                            Accepted
MRuby          Conditional Operator                      Accepted
MapReady       Pre Increment                             Rejected  I prefer the ++ style of incrementing myself, it's pretty standard C at this point.

Table III: Pull Requests not Answered by Developers

PROJECT     PATTERN
Linux       Dangling Else
Libpng      Dangling Else
MapReady    Dangling Else
Riot OS     Conditional Operator
Contiki     Operator Precedence
Tengine     Conditional Operator
Machinekit  Assignment as Value

Table IV: Pull Requests Ignored

PROJECT   PATTERN                       REASON
Curl      Dangling Else                 Pattern in third party code
Gargoyle  Dangling Else                 Pattern in deprecated code that is not used anymore
Krb5      Dangling Else                 Pattern in third party code
Libgit2   Dangling Else                 Pattern in third party code
Cinder    Initialization in Conditions  Wrong pull request, it introduced a bug

Table V: Pull Requests to Change Contributors Guidelines

PROJECT      STATUS        REASON
CMake        Rejected      Most of the proposed advice is already enforced by lint builds as part of our review and testing infrastructure.
Grpc         Rejected      GRPC C++ follows the Google style guide, and we have already documented this in the doc directory.
Libuv        Rejected      I don't think an exhaustive list of good coding practices is really appropriate.
Cleanflight  Waiting       Not accepted yet, but many developers agreed with the guidelines.
Cinder       Not Answered  -

Download the spreadsheet of an experiment that we performed to measure the percentage of pull requests accepted in open-source projects. In summary, we found that 29% of the pull requests are accepted by developers.

Contact

If you have any questions about our study, please contact the main researcher:

Flávio Medeiros
Email: flavio.medeiros at ifal.edu.br.
CPSoftware Research Group