

<< Back to OSS Health Metrics - Working Group

Note: On this page we use the term “indicators” synonymously with “metrics”. Future discussions will determine which term we continue to use.

Health Indicators

We roughly categorize health indicators in three categories: code health, community health, and compliance health.

Comments

Disclaimer: We list and describe health indicators; by no means are we evaluating them for suitability. Open source communities have a wide variety of stakeholders and projects, each with a different interpretation of the indicators. In different situations, the indicators will carry different meanings.

Many of the indicators are also informative when tracked over time.

We should agree on a template for the metrics.

Metrics List

Community Health
Community health contains indicators descriptive of community interactions and behavior.
  • Contributor Diversity: Ratio of contributors from a single company over all contributors. Also described as: maintainers from different companies, or diversity of contributor affiliation. Mentioned frequently.
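As a minimal sketch of how this ratio could be computed, assuming we already have a mapping from contributor to company (the names and affiliations below are hypothetical):

```python
from collections import Counter

def contributor_diversity(affiliations):
    """Ratio of contributors from the single largest company over all
    contributors (a lower ratio suggests a more diverse community).

    affiliations: dict mapping contributor handle -> company name.
    """
    if not affiliations:
        return 0.0
    counts = Counter(affiliations.values())
    largest = counts.most_common(1)[0][1]
    return largest / len(affiliations)

# Hypothetical data: 3 of 4 contributors work for "acme".
ratio = contributor_diversity(
    {"alice": "acme", "bob": "acme", "carol": "acme", "dan": "beta"}
)
# ratio == 0.75
```

In practice the hard part is resolving affiliations (email domains, public profiles, or curated lists), not the ratio itself.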
  • Issue Response Rate: Time between a new issue being opened and a maintainer responding. Also called: bug response rate. The maintainer is expected not to “pile on” but to try to solve the issue. Mentioned frequently.
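A sketch of one way to measure this, assuming issue data with timestamps and a known (hypothetical) set of maintainer handles:

```python
from datetime import datetime
from statistics import median

MAINTAINERS = {"alice", "bob"}  # hypothetical maintainer handles

def response_hours(issues):
    """Median hours between an issue being opened and the first
    maintainer comment. Issues with no maintainer response are skipped.

    issues: list of dicts with 'opened' (datetime) and 'comments',
    a chronological list of (author, datetime) tuples.
    """
    delays = []
    for issue in issues:
        for author, ts in issue["comments"]:
            if author in MAINTAINERS:
                delays.append((ts - issue["opened"]).total_seconds() / 3600)
                break
    return median(delays) if delays else None

# Hypothetical issue history.
issues = [
    {"opened": datetime(2017, 4, 1, 9, 0),
     "comments": [("carol", datetime(2017, 4, 1, 10, 0)),
                  ("alice", datetime(2017, 4, 1, 12, 0))]},
    {"opened": datetime(2017, 4, 2, 9, 0),
     "comments": [("bob", datetime(2017, 4, 2, 10, 0))]},
]
typical = response_hours(issues)
# median of 3.0 h and 1.0 h -> 2.0
```

The median is used rather than the mean so that a few long-ignored issues do not dominate the metric.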
  • Community Activity: Contribution frequency (contribution = commit, issue, comment, …).
  • Contributor Breadth: Ratio of non-core committers (drive-by committers). Can indicate openness to outsiders.
  • Contribution Diversity: Ratio of code committed by contributors other than the original project initiator. Shows whether contributions are growing beyond the core team.
  • Contribution Acceptance: Ratio of contributions accepted vs. closed without acceptance.
  • Bus Factor: The number of developers the project would need to lose to halt its progress. Alternatively: the number of companies that would have to stop their support.
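One common operationalization of the bus factor is the smallest set of developers who together account for at least half of all commits; the sketch below assumes per-commit author data (the commit history is hypothetical):

```python
from collections import Counter

def bus_factor(commit_authors, threshold=0.5):
    """Smallest number of developers who together account for at least
    `threshold` of all commits; losing them would stall the project.

    commit_authors: iterable of author names, one entry per commit.
    """
    counts = Counter(commit_authors)
    total = sum(counts.values())
    covered, factor = 0, 0
    for _, n in counts.most_common():
        covered += n
        factor += 1
        if covered / total >= threshold:
            return factor
    return factor

# Hypothetical history dominated by one author.
commits = ["alice"] * 60 + ["bob"] * 25 + ["carol"] * 15
# bus_factor(commits) == 1, since alice alone has 60% of commits
```

The same computation works for the company variant by first mapping each author to an employer.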
  • Contributors: Number of contributors.
  • Contributor Activity: Activity level of individual contributors.
  • Relative Activity: Sum the activities (GitHub issues + comments, pull requests + comments, and commits) for project members and for non-project members, then form the ratio of the two. This compares the activity of committers-as-a-group with contributors-as-a-group and easily shows when a project is not yet popular, or when a project is not paying attention to its users. A balance between the two groups is essential; e.g., a project with far more contributor than committer activity is failing to 'recruit' committers quickly enough.
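The ratio described above can be sketched as follows, assuming each activity event (issue, comment, pull request, commit) has already been reduced to its author, and that the member set and event list are hypothetical:

```python
def relative_activity(events, members):
    """Ratio of project-member activity to non-member activity.

    events: list of author names, one per activity event
            (issue, comment, pull request, or commit).
    members: set of project-member handles.
    """
    member_events = sum(1 for a in events if a in members)
    outsider_events = len(events) - member_events
    if outsider_events == 0:
        return float("inf")  # no outside activity at all
    return member_events / outsider_events

# Hypothetical event stream.
events = ["alice", "alice", "bob", "dan", "erin", "dan"]
ratio = relative_activity(events, {"alice", "bob"})
# members account for 3 of 6 events, outsiders for 3 -> ratio 1.0
```

A very high ratio suggests the project has not attracted users yet; a very low one suggests members are not keeping up with their community.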
  • Distribution of Work: How is recent activity distributed across contributors?
  • Contribution Age: Time since the last contribution. Gives a sense of how active the community is. (Contribution = commit, issue, comment, …)
  • Forks: Number of forks.
  • Stars: Number of stars (GitHub).
  • Watchers: Number of watchers (GitHub).
  • Issues Open: Number of open issues.
  • Issues Submitted/Closed: Issues submitted vs. issues closed.
  • Issue Comments: Number of comments per issue.
  • Time to Contributor: Time until becoming a contributor.
  • Path to Leadership: A communicated path from lurker to contributor to maintainer (or: track members' time from user to maintainer/leader). Rationale: if active contributors are not included in leadership decisions, they might lose interest and leave. (Focus on the least likely contributor.)
  • Blogposts: Number of blog posts that mention the project.
  • YouTube Videos: Number of YouTube videos that mention or specifically deal with the project (e.g., tutorials).
  • Job Postings: Number of job postings that mention the project as a preferred or required skill.
  • Downloads: Number of downloads. Beware: download counts might be skewed by build systems. Used as a measure of 'success' (Grewal, Lilien, & Mallapragada, 2006).
  • Reopened Issues: Rate of issues that were closed but whose discussion continues, or that were closed and re-opened.
  • Release Velocity: Time between releases. Regular releases are a reliability metric.
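Release velocity can be sketched as the average gap between consecutive release dates; the dates below are hypothetical:

```python
from datetime import date

def release_velocity(release_dates):
    """Average number of days between consecutive releases.

    release_dates: list of release dates in chronological order.
    Returns None when there are fewer than two releases.
    """
    if len(release_dates) < 2:
        return None
    gaps = [(b - a).days for a, b in zip(release_dates, release_dates[1:])]
    return sum(gaps) / len(gaps)

# Hypothetical release history.
dates = [date(2017, 1, 1), date(2017, 2, 1), date(2017, 3, 15)]
velocity = release_velocity(dates)
# gaps of 31 and 42 days -> average 36.5
```

Tracking the individual gaps over time, rather than only the average, also reveals whether releases are becoming more or less regular.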
  • Release Maturity: Ratio of major to minor releases.
  • Decision Distribution: Central vs. distributed decision making. Relates to the governance model and the scalability of the community.
  • Transparency: Number of comments per issue. Discussion occurring openly; could also indicate the level of agreement.
  • Roadmap: Existence and quality of a roadmap. Best practice for community engagement and scalability (might not be automatically computable).
  • Gatherings: Number of face-to-face/in-person meetings per year. Resets contentious issues, resolves tensions, and avoids longstanding grudges.
  • Role Definitions: Existence and quality of role definitions. Governance-related; relates to “Path to Leadership”.
  • Rewards: Rewards, shout-outs, recognition, and mentions in pull requests or change logs; these might improve contribution levels.
  • Retrospectives: Existence of after-release meetings. Collect lessons learned, improve processes, and recognize contributors.
  • Onion Layers: Distance between onion-model layers (users, contributors, committers, and steering committee). Rule of thumb: a factor of 10x between layers. (OSLS'17 Node.js keynote)
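The 10x rule of thumb can be checked by computing the size ratio between adjacent layers; the layer populations below are hypothetical:

```python
def layer_ratios(layer_sizes):
    """Size ratio between each pair of adjacent onion-model layers,
    ordered from the outermost (users) to the innermost (steering
    committee). The rule of thumb expects roughly 10x per step.

    layer_sizes: list of layer population counts, outermost first.
    """
    return [outer / inner
            for outer, inner in zip(layer_sizes, layer_sizes[1:])]

# Hypothetical community: 10000 users, 1000 contributors,
# 90 committers, 8 steering-committee members.
ratios = layer_ratios([10000, 1000, 90, 8])
# [10.0, ~11.1, 11.25], close to the 10x rule of thumb
```

A ratio far above 10x at some step could indicate a bottleneck in moving people toward the inner layers.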
  • Release Note Completeness: Number of functionality changes and bug fixes represented in the release notes vs. in the release itself. Good for users; also shows the diligence of the community.
  • Unity: Rivalry or unity of the community (sentiment analysis?).
  • Use of Acronyms: Frequency of acronyms used. Specialized language can be a barrier for new contributors.
  • Language Bias: Diversity metric: bias against gender, ethnicity, etc. in the use of language (maybe using sentiment analysis).
  • Commit Bias: Diversity metric: differences in acceptance rate (and time to acceptance) by gender, ethnicity, etc.
  • Stack Overflow: Several metrics: number of questions asked, response rate, number of responders with verified solutions.
  • Non-Source Contributions: Track contributions such as running tests in a test environment, writing blog posts, producing videos, giving talks, etc.
  • Maturity Label: Community-assigned label. Some communities label projects as incubator, mature, etc.
  • User Groups: User groups perform a variety of crucial marketing, service-support, and business-development functions at the grassroots level. (Bagozzi & Dholakia, 2006)
  • Age of Community: Time since the repository/organization was registered, or time since the first release. “Results showed that the age of the project played a marginally significant role in attracting active users, but not developers. We attribute this differential effect of age on users and developers to the fact that age may be seen as an indicator of application maturity by users, and hence taken as a positive signal, whereas it may convey more ambiguous signals to developers.” (Chengalur-Smith et al., 2010, p.674) (Grewal, Lilien, & Mallapragada, 2006)
Code Health
Code health contains indicators descriptive of a code base and its quality.
  • Pull Requests Made/Closed: Pull requests made vs. pull requests closed. Encompasses the number of pull requests rejected.
  • Pull Requests Open: Number of open pull requests. Might be more telling than the total number of pull requests.
  • Pull Request Comments: Number of comments per pull request.
  • Pull Request Discussion Diversity: Number of different people discussing each pull request.
  • Update Rate: Number of updates over a period x.
  • Update Regularity: How consistently and frequently updates are provided.
  • Update Age: Time since the last update.
  • Repository Size: Overall size of the repository or number of commits.
  • Size of Code Base: Lines of code.
  • Bugs after Release: Number of bugs reported after a release.
  • Code Modularity: Modular code allows parallel development, which Linus Torvalds drove for Linux (OSLS Torvalds). (Baldwin & Clark, 2006)
Compliance (Risk) Health
Compliance health contains indicators informative of vulnerabilities and license obligations.
  • Test Coverage
  • Bug Age: Age of known bugs in the issue tracker. Use labels to determine which issues are bugs?
  • Known Vulnerabilities: Number of reported vulnerabilities. Could be limited to the issue tracker or extended to vulnerability databases (e.g., CVE).
  • Dependency Depth: Number of projects included in the code base plus the number of projects relying on the focal project (recursive). Indicator of centrality in the open source dependency network.
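The recursive part of this metric can be sketched as a transitive-dependency count over a dependency graph; the graph below is hypothetical:

```python
def dependency_depth(graph, project, seen=None):
    """Count all transitive dependencies of `project`.

    graph: dict mapping project name -> list of direct dependencies.
    Shared dependencies are counted once; cycles are handled by `seen`.
    """
    if seen is None:
        seen = set()
    for dep in graph.get(project, []):
        if dep not in seen:
            seen.add(dep)
            dependency_depth(graph, dep, seen)
    return len(seen)

# Hypothetical dependency graph: lib-c is shared by lib-a and lib-b.
graph = {"app": ["lib-a", "lib-b"], "lib-a": ["lib-c"], "lib-b": ["lib-c"]}
depth = dependency_depth(graph, "app")
# app depends (transitively) on {lib-a, lib-b, lib-c} -> 3
```

The reverse direction (projects relying on the focal project) is the same traversal over the inverted graph.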
  • License Declared: Which license does the project declare?
  • License Conflict: Does the project contain incompatible licenses?
  • All Licenses: List of licenses.
  • License Count: Number of licenses.
  • License Coverage: Number of files with a file notice (copyright notice + license notice).

Reasons why community health is assessed

This section collects notes on what the possible goals might be, including reasons why metrics are considered for other purposes.

  • Track Corporate Engagement (is an organization creating value, are organizational goals met, employee contributions)
  • Risk mitigation
  • Identify open source projects that need support.
  • Identify single points of failure (and hopefully prevent them)
  • Assess value generated through community and engagement
  • Show that active community management bears desired results. (Measurable outcomes)
  • Avoid in-take of an inactive project, because it makes it difficult to maintain and might carry unknown bugs and security issues.
  • Sustainability: “we define a sustainable project as one that exhibits software development and maintenance activity over the long run.” (Chengalur-Smith, Sidorova, & Daniel, 2010, p.660)

Broad categories of indicators that we hear often

  • Growth of community
  • Momentum of community
  • Timeliness of maintainers
  • Diversity of community, contributions, and in code base
  • Distribution of code contributions (beyond project creator)
  • Activity level - Responsiveness
  • Viability (Bus Factor - individual contributors and clustered by employer)
  • Maturity
  • Ecosystem health (upstream, downstream, and related projects)
  • Vanity metrics (might have use in other cases, e.g. stars)
  • Aggregate project-tree health (combined health metrics of all linked dependencies)
  • Attentiveness of maintainers to users. See Mailing list

Context: Considerations when evaluating health

  • Style of project
  • Programming language
  • Maturity of project (projects might seem inactive but have in fact fulfilled their goal; the community remains responsive to bug reports and security issues, just without adding new features)
  • Quality of Ecosystem (metrics of related projects)
  • Value driven metrics (not just activity)
  • Development of metrics over time
  • External users might not be a homogeneous group - consider different metrics
  • Compare similar projects (manually determine which projects to compare)
  • Classifications (based on a set of metrics, which projects 'behave' similar)
  • Interrelationships between categories of indicators (maturity might be high while activity is low and the response rate is up)
  • Aggregate from repository, to project, to community, (to company)

Other classifications for indicators

We have heard other classifications that we simply list here.

The idea for these classifications is to (1) generate a uniform classification by merging the different classifications through conversations, and (2) create mappings of the indicators to the different classifications.

  • Community/Code/Risk
  • Activity/Viability/Risk

References

  • Bagozzi, R. P., & Dholakia, U. M. (2006). Open Source Software User Communities: A Study of Participation in Linux User Groups. Management Science, 52(7), 1099–1115. Retrieved from http://www.jstor.org/stable/20110583
  • Baldwin, C. Y., & Clark, K. B. (2006). The Architecture of Participation: Does Code Architecture Mitigate Free Riding in the Open Source Development Model? Management Science, 52(7), 1116–1127. Retrieved from http://www.jstor.org/stable/20110584
  • Chengalur-Smith, I., Sidorova, A., & Daniel, S. (2010). Sustainability of Free/Libre Open Source Projects: A Longitudinal Study. Journal of the Association for Information Systems, 11(11). Retrieved from http://aisel.aisnet.org/jais/vol11/iss11/5
  • Grewal, R., Lilien, G. L., & Mallapragada, G. (2006). Location, Location, Location: How Network Embeddedness Affects Project Success in Open Source Systems. Management Science, 52(7), 1043–1056. Retrieved from http://www.jstor.org/stable/20110579
oss-health-metrics/metrics.1493241531.txt.gz · Last modified: 2017/04/26 21:18 by GeorgLink