Open Source Community Health: Analytical Metrics and Their Corresponding Narratives

Open source projects are most often evaluated by potential contributors and consumers using metrics that describe a level of activity within the project, because those measurements are available. The principal question in the minds of most evaluators, however, is "How healthy and sustainable is this project in the context of its competitors or dependent projects?" Limitations of current analysis methods focused on trace data alone are reviewed in depth. Next, our methods for conducting engaged field research, developing metrics standards as part of a corporate communal partnership, and molding tools that evolve through a reflexive discourse with practitioners using standard metrics are framed as an approach to consider for examining open source software health and sustainability. Researchers, in particular, need tools for increasing the feasibility of comprehensive, multi-project health and sustainability studies, and for connecting trace data with human experience. From a practice perspective, these same conditions are increasing the difficulty that organizations and individuals engaged in open source face when trying to understand the status, condition, and health of a particular project, the project's ecosystem, or the ecosystems emerging around their specific project context. This study examines the work of a Linux Foundation working group, CHAOSS (Community Health Analytics Open Source Software), during the first four years of its formation. The paper concludes with examples of CHAOSS metrics operationalized in partnership with corporate collaborators in a manner that emphasizes comparison, transparency, trajectory and visualization as components for discursive, evolutionary understanding of open source software health.


I. INTRODUCTION
Open source software is vital to most public computing infrastructure, and an increasingly common component of software development jobs. Following the Heartbleed OpenSSL security bug in 2014, the US Congress sought the aid of the Linux Foundation to help better understand what steps could be taken to make open source project health and risk more visible [1]. Indirectly, open source software's footprint on society is growing in a number of dimensions. As a point of reference, over five billion dollars 1 The demand for coherent, consistent and actionable metrics is growing as a result of corporate engagement with open source [2], [3]. This demand is felt most by open source community managers, who play a role in the formal and informal nurturing of developer communities around open source projects, and by internal open source program offices, which manage portfolios of open source projects contributed to by a firm [4], [5]. The rapid growth of open source is intersecting with a large number of ongoing metrics development research studies that target individual areas in ecosystems of open source software, but these studies do not yet amount to clearly identifiable, specific metrics, nor do they aid in locating a project's position within particular ecosystems. Everyone involved in open source software wants a clearer understanding of individual projects, and of the boundaries of ecosystems, to advance insight into their joint health and sustainability.
Prior work defines open source project health through a myriad of indicators and tools. Each approach aims to make representations of open source project health efficient through the collection, aggregation and analysis of trace data from repositories, issue trackers, and other traces of work and communication [6]. This study adopts the general view that open source project health is a project's ability to continue to produce quality software [7], [8]. Trace data are a building block for nearly all measures of open source project health [9], [10], [11], making the collection and analysis of data related to the construction of open source at once essential for representing open source project health and, at present, incommensurate with the challenge [12], [13]. Specifically, most research to date has three limitations: 1) the research is spread across several disciplines and is therefore not accumulating shared understanding (and arguably is diffusing disconnected insights), 2) most research to date focuses on some specific, narrow segment of the expansive world of open source software, thus forestalling more substantial insights, and 3) analysis of trace artifacts is rarely combined with systematic, ongoing engagement with open source projects as they evolve, limiting the durability of indicators derived from trace data as indicators of health and sustainability.
Though many of the narrowly focused metrics applied to assess open source projects have utility, they do not get at the question of open source project health and sustainability. Metaphorically, they could be thought of as using the results of an individual's annual physical to try to assess public health questions for the region they live in. This public health metaphor maps the use of discrete open source metrics to what public health researchers refer to as a type III error: having the right answers to the wrong questions [14]. To avoid type III errors and answer the right questions, open source health and sustainability variation across projects and time, and the presence of specific risk factors, must be integrated and weighted against each other based on project context that includes ongoing feedback loops with metric consumers within open source projects. For example, a decline in activity could mean either an emergent risk of the community losing contributor engagement, or the stabilization of a fast-growing new technology. Most cases are not so clear cut as those two.
In this paper, we frame a four-year engaged field research study within the Linux Foundation's Community Health Analytics in Open Source Software (CHAOSS) project. Our findings from this study illustrate a pathway to making open source health and sustainability visible in context by emphasizing comparison, transparency, trajectory and visualization through storytelling. Our contribution is to frame an approach for understanding the overall health and sustainability of open source software projects that puts each project in a context salient for its position in its life-cycle, and within groups of projects with similarities identified through a combination of computational similarity measures and heuristics defined by individuals and organizations engaged in open source software. We also present CHAOSS Augur software as a tool to understand community health through storytelling.
There is a considerable amount of research constructing and presenting indicators of open source project activity, but a lack of consensus about how indicators derived from trace data might be used to represent a coherent view of open source project health and sustainability. Researchers define project health through the collection of success measures [49], including metrics related to project output, process, and the outcomes for project members [11]. In this context, success is the result of project activity and the release of code; otherwise the project is abandoned [50]. Further measures of success explore the growth and diversity of projects [34]. Activity measures tell you when a project is finished; health and sustainability measures would identify that trajectory in advance.
Project health can be framed through measures representing separate but related indications of sustainability and survivability [51]. Sustainability is signaled by three factors: community growth, financial resources, and software management [52]. While sustainability evokes shared commitment, survivability aims to surface indications of a project's vigor, resilience, and organization in the face of risk [53]: when bad things happen to good projects, which ones are most likely to overcome? Sustainability and survivability, together, aspire to provide clear signals of a project's likelihood to continue producing quality software [8].
Likelihood of continued software production is a broad way to frame the goal of assessing open source project health. Measurement of health signals is taken up across a wide range of research domains. Aman et al. [54] applied the Pareto principle to open source projects to define a healthy project as one where the proportion of core developers to non-core ones equates to roughly 80% of the code contributions being produced by 20% of the active developers. However, only a small slice of open source projects fit this model [55]. Open source project health has also been assessed by examining the bus factor of projects (i.e., the risk associated with losing key project contributors and the knowledge that they possess) [56], [57], and research has shown that many open source projects have low bus factors, implying that they are not likely to survive disruption. In the next section we review a wide range of prior open source software research adjacent to questions of health, drawing out our argument that the present state of research is incommensurate with the problem space.
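To make the two measures above concrete, the following is a minimal sketch, assuming the only input is a list with one author name per commit. The function names, the commit-count heuristic for bus factor, and the 50% threshold are illustrative assumptions, not the procedures used in [54], [56], or [57], which rely on richer authorship data.

```python
from collections import Counter


def top_contributor_share(commit_authors, top_fraction=0.2):
    """Share of commits made by the most active `top_fraction` of authors."""
    counts = sorted(Counter(commit_authors).values(), reverse=True)
    if not counts:
        return 0.0
    # At least one author is always treated as "core".
    n_top = max(1, round(len(counts) * top_fraction))
    return sum(counts[:n_top]) / sum(counts)


def bus_factor(commit_authors, threshold=0.5):
    """Smallest number of authors whose commits exceed `threshold` of the total.

    This is a simplified, commit-count-based heuristic (an assumption here).
    """
    counts = sorted(Counter(commit_authors).values(), reverse=True)
    total = sum(counts)
    covered = 0
    for n_authors, count in enumerate(counts, start=1):
        covered += count
        if covered > threshold * total:
            return n_authors
    return 0  # no commits at all


# Hypothetical commit log: one author name per commit (made-up data).
commits = ["ana"] * 60 + ["bo"] * 20 + ["cy"] * 10 + ["di"] * 6 + ["ed"] * 4

print(top_contributor_share(commits))  # 0.6: the top 20% of authors made 60% of commits
print(bus_factor(commits))             # 1: a single author accounts for over half the commits
```

In this made-up project the top fifth of authors produce 60% of commits, short of the 80% the Pareto framing expects, and the commit-based bus factor is 1, which is the kind of low value the cited research associates with fragility.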

B. Open Source Health Research Solves a Different Problem
Researchers explore open source project health in different contexts and employ many different measures. Additionally, researchers often apply measures from other domains that don't take into account the unique nature of open source projects. What is clear is that activity is a common proxy for understanding project health [8]; however, research focused on discrete activities addresses only the presence of activity, offering little insight about the complexities of future sustainability or survivability. Activity metrics illustrate the likelihood that a project will ever get traction, but tell us little about its arc following launch.
Activity is insufficient as a proxy for open source health particularly because many open source health studies are conducted at the smallest unit of analysis, building metrics up from one type of repository activity, such as a software commit [58]. Alternatively, large scale summaries of open source health often squash entire repository histories and draw inferences about projects and ecosystems [17] without consideration of the evolution of projects over time. Table I illustrates the wide variety of measures used to operationalize open source project health using activity metrics or large scale summaries. Much of the early literature on open source project health focused on success measures [59], [60], [11], [61], [62], [63]; however, as open source health research has matured, the focus has moved to measures of sustainability [64], [51], [65], [52], [66], [67], [68], [58] and further, to an understanding that open source health must include considerations for social interactions and project diversity when estimating survivability [69], [7], [70], [34], [9], [71], [72]. Sustainability and survivability are difficult to estimate based on the current or historical level of activity alone, especially in an era where corporate engagement is a rapidly increasing aspect of open source ecosystems across the globe [2].
Our analysis of the literature surfaces a number of discrete metrics aimed at different aspects of open source project health, including a) sufficient scale, b) project culture, c) process quality, d) product quality, e) contextualized risk, f) license risk, and g) corporatization and access to resources. Each set of discrete metrics is useful, and reflects a particular aspect of an open source project that can contribute to its likelihood of continuing to produce quality software. This contemporary focus on the development of discrete metrics from open source project activity is necessary for ascertaining project health but does not make project health visible. While open source practitioners yearn for insight, they are often confused by an overwhelming array of dashboards that force together some set of "frankenmetrics" derived from the categories we identify. Open source metrics research is discrete, conducted at a point in time and not focused on the specific needs of metrics consumer communities like open source community managers and open source program offices. The "measurement of open source project health is further confounded because health metrics may have different meanings for different projects and there is little consensus about the correct way to calculate metrics across projects" [2, p. 14].

III. METHODS
Our work is built upon an ongoing ten-year research study exploring organizational engagement with open source projects. Our efforts are localized in open source projects that include heavy organizational engagement, mainly projects brokered by the Linux Foundation, a 501(c)(6) trade association, and its member companies, which constitute a substantial majority of the technology sector of the economy. The Linux Foundation has helped "establish, build, and sustain some of the most critical open source technologies fostering innovation in every layer of the software stack. The Linux Foundation hosts projects spanning enterprise IT, embedded systems, consumer electronics, cloud, and networking".
The methods we use systematically integrate field engagement with analysis of trace data from open source project activity to build greater understanding of the relationship between how projects and ecosystems experience health, and what analysis of open source software trace data can reveal. Within this context and over the prior ten years, we employed a variety of approaches including participant observation [77], group informatics [6] (a systematic methodological approach for reflexive analysis of field data and trace data), direct engagement [78], and critical reflection [79]. Participant observation was used as a field-based approach when members of our research team were directly engaged in the practices we sought to understand [77], [80], [78]. Group informatics was used to reflexively make meaning from the intersection of our fieldwork and the significant amounts of digital trace data that emerge within open source projects, thus ensuring coherence of research constructs in the outcomes produced in our participant observation [23], [6]. Finally, to ground our findings from participant observation [77] and group informatics [6], we used direct engagement and critical reflection as our "process of learning from experience" [79]. Through this field study, data were generated from approximately 200 interviews, 200 survey responses, ten focus groups, and 1,000 pages of field notes, and examined using constant comparison, content analysis [81], trace ethnography [13], social network analysis [61], [6], and computational linguistic analysis [82].

A. Engaged Field Research
This paper focuses on a four-year engagement with the CHAOSS project.

IV. ORDER FROM CHAOSS
The CHAOSS community's efforts to develop a set of metrics and the tooling around metrics are complicated by competing perspectives. The standards the community can develop are limited to the discrete metrics that can be discussed and agreed to, which is not, of course, the end of the line for defining health and sustainability in open source.

1) Project Stakeholders:
Work within the CHAOSS project identified two key stakeholder types, community managers and program managers, interested in community health metrics. Open source community managers oversee communities on the communal side of the corporate communal relationship. The job of the open source community manager is to enable open source project members to achieve personal and project goals [83]. The role does not always have an explicit title, nor is the job mutually exclusive with others. These individuals may take on many roles in a project, including developers, document writers, facilitators and maintainers. The focus for these individuals is building and maintaining healthy communities of contributors, towards the successful completion of individual project goals.
Open source program managers oversee corporate interests from the corporate side of corporate communal relationships [84].
Community and program managers take a variety of perspectives, depending on where their communities are in the life-cycle of growth, maturity, and decline. This paper is an evolving report of what we are learning from community and program office managers, some of whom we are working with on live experiments with a CHAOSS project prototyping software tool called Augur. At this point, we are paying particular attention to how community managers consume metrics and how the presentation of open source project health and sustainability metrics could make those metrics more, and in some cases less, useful for evaluating open source project health.
2) New Goals and Concerns:
"Who is going to use these metrics and for what purpose?" is a question that matters a great deal to the people engaged in the CHAOSS project. The development of health and sustainability metrics for open source software surfaced specific concerns among our participants. Making statistics available to people unfamiliar with the state, functioning and current goals of an open source ecosystem creates risk that the numbers will lead to a reflexive, uncontextualized side narrative that could harm the efforts of open source community and project office managers. The risk is that numbers without context will aid in the development of stories that are not focused on increasing health and sustainability, but instead focused on telling stories that emphasize activity over health. Through discussions with community managers and program managers we have identified several user stories related to project health:
• "As a community manager I want to be able to compare the open source projects that I manage"
• "As a community manager I want to identify trends in the data collected about my repositories"
• "As a community manager I need to know which contributors are making the largest impact across projects that I manage"
• "As a community manager I need to know how contributor behavior is changing"
• "As a program manager adopting open source, I need to know what risks my company will face if they use software"
• "As a program manager adopting open source, I need to be able to compare projects to determine which solution will be the least risky to our company"

3) From Metrics to Health:
Frustrations with the elusiveness of coherent, durable indicators of open source project health pervade the experiences we observed.
In practice, there is evidence that community managers and open source program offices occasionally stand up discrete metrics tools, but ultimately fail to find them useful in their search to fully understand open source project health over time. One of the central goals that emerged from our participatory design process was to make it easier for open source stakeholders to "get their bearings" on a project and understand "how things are going". Interactions with stakeholders showed that this was most easily accomplished when comparisons between internal and external projects over time are readily available.
Through our field study, four core principles for making sense of data emerged: comparison, transparency, trajectory and visualization. These principles can be operationalized through four human-centered data science strategies:

V. USING AUGUR TO TELL STORIES
Our engaged field work led to several dozen implementations of Augur in the service of large corporations in the technology sector over the past three years. Through each engagement, the data collected, the CHAOSS metrics generated, and the integration of those metrics into visualizations evolved. Organizations like Twitter simply leveraged Augur's verifiable data completeness to build their own front-ends. We view the tool, and its evolution, as both a means to an end and a reflection of how the CHAOSS community's understanding of the potential of standard metrics is evolving.
Augur empowers open source community managers to tell data-driven stories. Augur is designed to enable the rapid, reflexive exploration of project health using CHAOSS metrics, to answer questions as they emerge from experience, and to understand specific questions about health and sustainability within an organizational context, including open source project ecosystems and corporations. This is possible due to its well-documented relational data model and software that verifies data collection accuracy.
Telling a health and sustainability story for an open source project requires more than a collection of the metrics available and defined for a project. Because having tools that are useful for telling stories is of such importance to open source program and community managers, the visual design of data in Augur is aimed at providing a synthesis of several related metrics in visualizations that solve specific problems or answer specific questions that emerged during design. Two examples make the approach clearer by synthesizing collections of CHAOSS metrics into captioned visualizations for a half dozen different open source projects: 1) understanding new contributors and repeat contributors over time, and 2) understanding pull request responsiveness within a project, contrasted with responsiveness from other projects.

A. Example: Pull Request Responsiveness Comparisons
Responsiveness to outside contributions, reflected as pull requests on GitHub, is one way that both community managers and open source program officers draw comparisons between different projects they are contributing to or competing against. The use of visual contrast in this figure helps illustrate differences between one project, Zephyr, and an anonymized field of competitors. Augur's anonymization capability in comparisons enables contextualized understanding without directly commenting on other projects.
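The kind of comparison described here can be sketched as follows. This is a minimal illustration, not Augur's implementation: the record layout, the function names, the project names, and the timestamps are all assumptions, and "responsiveness" is reduced to the median number of hours between a pull request opening and its first response.

```python
from datetime import datetime
from statistics import median


def median_response_hours(pull_requests):
    """Median hours from a pull request opening to its first response."""
    hours = [
        (pr["first_response"] - pr["opened"]).total_seconds() / 3600
        for pr in pull_requests
        if pr["first_response"] is not None  # skip PRs with no response yet
    ]
    return median(hours) if hours else None


def compare_anonymized(projects, focus):
    """Return {label: median hours}, naming only the focus project."""
    result = {focus: median_response_hours(projects[focus])}
    others = sorted(name for name in projects if name != focus)
    for index, name in enumerate(others, start=1):
        # Other projects appear only under a neutral label.
        result[f"Project {index}"] = median_response_hours(projects[name])
    return result


def pr(opened, first_response=None):
    """Build one hypothetical pull request record from timestamp strings."""
    fmt = "%Y-%m-%d %H:%M"
    return {
        "opened": datetime.strptime(opened, fmt),
        "first_response": (
            datetime.strptime(first_response, fmt) if first_response else None
        ),
    }


# Made-up data for three hypothetical projects.
projects = {
    "our-project": [
        pr("2020-01-01 09:00", "2020-01-01 12:00"),
        pr("2020-01-02 09:00", "2020-01-02 10:00"),
        pr("2020-01-03 09:00"),  # still waiting for a response
    ],
    "alpha": [pr("2020-01-01 09:00", "2020-01-02 09:00")],
    "beta": [
        pr("2020-01-01 09:00", "2020-01-01 21:00"),
        pr("2020-01-03 09:00", "2020-01-03 15:00"),
    ],
}

print(compare_anonymized(projects, "our-project"))
# {'our-project': 2.0, 'Project 1': 24.0, 'Project 2': 9.0}
```

Replacing names with neutral labels keeps the focus project readable in context while avoiding direct commentary on the projects it is compared against.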

B. Example: Understanding New and Repeat Contributors
Another significant and complex indicator of project health includes a number of CHAOSS metrics related to new contributors, including the types of contributions and an understanding of when or if second contributions were made. Augur provides individual or comparative analysis of these new and repeat contributor statistics using parameterizable visualization API calls. In Figure 2, we see a time series graph of the types of second contributions made on a project.
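As a rough sketch of the underlying calculation, the example below counts each contributor's second contribution by month and contribution type, and reports the share of contributors who returned at all. The tuple layout, the function names, and the sample data are assumptions for illustration; they are not CHAOSS metric definitions or Augur's API.

```python
from collections import defaultdict
from datetime import date


def group_by_contributor(contributions):
    """Map each contributor to a chronologically sorted list of (day, kind)."""
    by_person = defaultdict(list)
    for person, day, kind in contributions:
        by_person[person].append((day, kind))
    for events in by_person.values():
        events.sort()  # chronological order; same-day ties are ordered by kind
    return by_person


def second_contributions_by_month(contributions):
    """Count second contributions per (month, contribution type)."""
    counts = defaultdict(int)
    for events in group_by_contributor(contributions).values():
        if len(events) >= 2:
            day, kind = events[1]  # the contributor's second contribution
            counts[(day.strftime("%Y-%m"), kind)] += 1
    return dict(counts)


def repeat_rate(contributions):
    """Fraction of contributors who made at least two contributions."""
    by_person = group_by_contributor(contributions)
    if not by_person:
        return 0.0
    repeaters = sum(1 for events in by_person.values() if len(events) >= 2)
    return repeaters / len(by_person)


# Made-up contribution log: (contributor, date, contribution type).
contributions = [
    ("ana", date(2020, 1, 5), "commit"),
    ("ana", date(2020, 2, 10), "pull_request"),
    ("bo", date(2020, 1, 20), "issue"),
    ("bo", date(2020, 2, 3), "pull_request"),
    ("bo", date(2020, 3, 1), "commit"),
    ("cy", date(2020, 2, 14), "issue"),
]

print(second_contributions_by_month(contributions))
# {('2020-02', 'pull_request'): 2}
print(repeat_rate(contributions))  # two of three contributors returned
```

Grouping the result by month and type yields the series a time series graph of second contributions would plot, one line or stacked band per contribution type.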
Augur's design emerged from a stark recognition that a good deal of open source metrics research to date has not arrived at the central importance of combining trace analysis with storytelling, and systematic adaptation of metric combinations as an evolving enterprise.

VI. DISCUSSION
As researchers, we build tools that help us gather data, and report findings in academic venues. In our role as engaged field researchers, we made a decision to ensure that our collection of trace data, and its integration with field work, is not the end goal. Instead, in this case, collaborators within the CHAOSS project are guiding the evolution of the tool. First and foremost, its data gathering and relational organization enable new questions to be answered with relative ease. Augur data can be accessed and presented through Jupyter notebooks, two different web front-ends, and customized front-ends that access Augur's RESTful API. The API endpoints follow a logic that is twofold. First, each CHAOSS metric is an individual endpoint. Second, creating new endpoints that generate data or static visualization files makes it possible to ask nearly any question you can think of, across thousands of open source software repositories, in hours to days.
Every computing-centered open source software scholar is capable of building similar working systems. However, there are few to date, and no others whose origins coexist with those of a coordinated effort to develop reusable open source software metrics around a range of specific concerns. And, as scholars, we too often leave the tools we build by the wayside as funding shifts and our aims move on. Our position, and our hope, is that by sharing our approach, aims, and accomplishments to date, this work will inspire a discussion about how to make research software aimed at providing a coherent view of open source software health highly adaptable to change, and sustainable.
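The "one metric, one endpoint" logic can be illustrated with a small in-memory registry. This sketch does not reproduce Augur's routes or code: the `/metrics/...` paths, the metric names, and the event layout are hypothetical, and no web framework or network access is involved.

```python
# Hypothetical registry mapping endpoint paths to metric functions.
METRIC_ENDPOINTS = {}


def metric(name):
    """Register a function as the handler for one metric endpoint."""
    def register(func):
        METRIC_ENDPOINTS[f"/metrics/{name}"] = func  # path scheme is assumed
        return func
    return register


@metric("distinct-contributors")
def distinct_contributors(events):
    """Number of different authors seen in the events."""
    return len({event["author"] for event in events})


@metric("contribution-count")
def contribution_count(events):
    """Total number of recorded contributions."""
    return len(events)


def handle(path, events):
    """Dispatch a request path to the metric registered for it."""
    handler = METRIC_ENDPOINTS.get(path)
    if handler is None:
        raise KeyError(f"no metric registered at {path}")
    return handler(events)


# Made-up trace data for a single repository.
events = [
    {"author": "ana", "kind": "commit"},
    {"author": "bo", "kind": "issue"},
    {"author": "ana", "kind": "pull_request"},
]

print(handle("/metrics/distinct-contributors", events))  # 2
print(handle("/metrics/contribution-count", events))     # 3
```

Under this arrangement, asking a new question amounts to registering one more function, which is the property the twofold endpoint logic is meant to provide.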

A. Limitations
This paper shares a sliver of the technical depth and breadth of an open source software project aimed at making CHAOSS metrics actionable. The argument we make is that an approach focused on engaged field research, and sustained discussions with practitioners, requires sustained commitment to the development of tools like Augur. Centering the aims of Augur on discourse about open source project health, and narrative construction for CHAOSS organizations, limits discrete empirical findings in this paper. We aim to report those findings, in partnership with our CHAOSS collaborators, in future work.