User Tools

Site Tools


oss-health-metrics:metrics:contributors

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revision Previous revision
Next revision
Previous revision
oss-health-metrics:metrics:contributors [2017/04/10 16:59]
abuhman [7. Pseudo Code/Query]
oss-health-metrics:metrics:contributors [2017/10/06 20:54] (current)
GeorgLink fixed link
Line 1: Line 1:
 ====== Contributors ====== ====== Contributors ======
-===== 1. Acronym ===== 
  
-===== 2. Alternate Names ===== +Page permanently moved to 
- +https://github.com/​chaoss/​metrics/​blob/​master/​activity-metrics/​contributors.md
-===== 3. Classification ===== +
-Community Health +
- +
-===== 4. Description ===== +
-Number of contributors +
- +
-==== Visualization ==== +
- +
-==== Interpretation ==== +
- +
-===== 5. Known Implementations ===== +
- +
-===== 6. Data Source ===== +
-GHTorrent +
- +
-===== 7. Pseudo Code/Query ===== +
-=== Contributions by user === +
- +
-    SQL +
-    # Gets the count of types of contributions by user, only returning users who have interacted with a project in some way +
-    SET @owner = "​rails";​ +
-    SET @repo  = "​rails";​ +
-    USE msr14; # Test database +
-    # USE ghtorrent; # Production database +
-    SET @proj = (SELECT projects.id FROM projects INNER JOIN users ON projects.owner_id = users.id WHERE projects.name = @repo AND users.login = @owner); +
-    SELECT * FROM +
-       ( +
-       ​SELECT ​      ​users.id ​       as "​user_id",​  +
-             users.login ​    as "​login",​ +
-             ​com.count ​      as "​commits",​ +
-             pulls.count ​    as "​pull_requests",​ +
-             iss.count ​      as "​issues",​ +
-             comcoms.count ​  as "​commit_comments",​ +
-             pullscoms.count as "​pull_request_comments",​ +
-             isscoms.count ​  as "​issue_comments" ​        +
-       FROM users +
-       LEFT JOIN (SELECT committer_id AS id, COUNT(*) AS count FROM commits INNER JOIN project_commits ON project_commits.commit_id = commits.id WHERE project_commits.project_id = @proj GROUP BY commits.committer_id) AS com +
-       ON com.id = users.id +
-       LEFT JOIN (SELECT pull_request_history.actor_id AS id, COUNT(*) AS count FROM pull_request_history JOIN pull_requests ON pull_requests.id = pull_request_history.pull_request_id WHERE pull_requests.base_repo_id = @proj AND pull_request_history.action = '​merged'​ GROUP BY pull_request_history.actor_id) AS pulls +
-   ON pulls.id = users.id +
-       LEFT JOIN (SELECT reporter_id AS id, COUNT(*) AS count FROM issues WHERE issues.repo_id = @proj GROUP BY issues.reporter_id) AS iss +
-   ON iss.id = users.id +
-       LEFT JOIN (SELECT commit_comments.user_id AS id, COUNT(*) AS count FROM commit_comments JOIN project_commits ON project_commits.commit_id = commit_comments.commit_id WHERE project_commits.project_id = @proj GROUP BY commit_comments.user_id) AS comcoms +
-   ON comcoms.id = users.id +
-       LEFT JOIN (SELECT pull_request_comments.user_id AS id, COUNT(*) AS count FROM pull_request_comments JOIN pull_requests ON pull_request_comments.pull_request_id = pull_requests.id WHERE pull_requests.base_repo_id = @proj GROUP BY pull_request_comments.user_id) AS pullscoms +
-   ON pullscoms.id = users.id +
-       LEFT JOIN (SELECT issue_comments.user_id AS id, COUNT(*) AS count FROM issue_comments JOIN issues ON issue_comments.issue_id = issues.id WHERE issues.repo_id = @proj GROUP BY issue_comments.user_id) AS isscoms +
-   ON isscoms.id = users.id +
-       GROUP BY users.id +
-       ORDER BY com.count DESC +
-       ) user_activity +
-    WHERE commits IS NOT NULL +
-    OR    pull_requests IS NOT NULL +
-    OR    issues IS NOT NULL +
-    OR    commit_comments IS NOT NULL +
-    OR    pull_request_comments IS NOT NULL +
-    OR    issue_comments IS NOT NULL; +
- +
-### Number of Contributors (committers) per Project: +
- +
-GitHub defines contributors as those who have made "​Contributions to master, excluding merge commits"​ +
- +
-I do not see in the GHTorrent database schema a way to determine commits to master vs other branches, +
-Nor a way to differentiate merge commits from other commits. +
- +
-Because of this, for the following SQL query I will define ​contributors ​as "users who have made a commit"​ +
- +
-When viewing the table, one may notice that there is a separate author_id and committer_id for each commit +
-the commit author has written the code of the commit. +
-the commit committer has made the commit +
-example: author writes some code and does a pull request. ​ committer approves/​merges the pull request +
- +
-For the following SQL, I am considering the author to be the contributor. +
- +
- select projects.name as project_name,​ projects.url as url, count(distinct commits.author_id) as num_contributers +
- from commits  +
- join project_commits on commits.id = project_commits.commit_id +
- join projects on projects.id = project_commits.project_id +
- group by project_commits.project_id +
- +
- +
-===== 8. Data Exchange Format ===== +
-Describe how the metric is encoded for sharing and communicating between programs. +
- +
-===== 9. References to Academic Literature ===== +
-Papers released in the academic literature that uses the metric. +
- +
-===== 10. Internet References ===== +
-Links to websites that add value to the understanding of the metric. +
- +
-===== 11. Contributors ===== +
-Authors of the metric page and authors who made significant changes.+
oss-health-metrics/metrics/contributors.1491843556.txt.gz · Last modified: 2017/04/10 16:59 by abuhman