User Tools

Site Tools


oss-health-metrics:metrics:contributors

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revision Previous revision
Next revision
Previous revision
Last revision Both sides next revision
oss-health-metrics:metrics:contributors [2017/05/16 16:52]
klumbard
oss-health-metrics:metrics:contributors [2017/09/09 23:45]
GeorgLink Page permanently moved
Line 1: Line 1:
 ====== Contributors ====== ====== Contributors ======
  
-===== 1. Description ===== +Page permanently moved to 
-Number of contributors +https://​github.com/​chaoss/metrics/​blob/​master/​metrics/​contributors.md
- +
-===== 2. Use Cases ===== +
- +
-===== 3. Sample Visualization ===== +
- +
-===== 4. Sample Implementation ===== +
-===GHTorrent:​ Contributions by user === +
- +
-    SQL +
-    # Gets the count of types of contributions by user, only returning users who have interacted with a project in some way +
-    SET @owner = "​rails";​ +
-    SET @repo  = "​rails";​ +
-    USE msr14; # Test database +
-    # USE ghtorrent; # Production database +
-    SET @proj = (SELECT projects.id FROM projects INNER JOIN users ON projects.owner_id = users.id WHERE projects.name = @repo AND users.login = @owner); +
-    SELECT * FROM +
-       ( +
-       ​SELECT ​      ​users.id ​       as "​user_id",​  +
-             users.login ​    as "​login",​ +
-             com.count ​      as "​commits",​ +
-             pulls.count ​    as "​pull_requests",​ +
-             iss.count ​      as "​issues",​ +
-             comcoms.count ​  as "​commit_comments",​ +
-             pullscoms.count as "​pull_request_comments",​ +
-             isscoms.count ​  as "​issue_comments" ​        +
-       FROM users +
-       LEFT JOIN (SELECT committer_id AS id, COUNT(*) AS count FROM commits INNER JOIN project_commits ON project_commits.commit_id = commits.id WHERE project_commits.project_id = @proj GROUP BY commits.committer_id) AS com +
-       ON com.id = users.id +
-       LEFT JOIN (SELECT pull_request_history.actor_id AS id, COUNT(*) AS count FROM pull_request_history JOIN pull_requests ON pull_requests.id = pull_request_history.pull_request_id WHERE pull_requests.base_repo_id = @proj AND pull_request_history.action = '​merged'​ GROUP BY pull_request_history.actor_id) AS pulls +
-   ON pulls.id = users.id +
-       LEFT JOIN (SELECT reporter_id AS id, COUNT(*) AS count FROM issues WHERE issues.repo_id = @proj GROUP BY issues.reporter_id) AS iss +
-   ON iss.id = users.id +
-       LEFT JOIN (SELECT commit_comments.user_id AS id, COUNT(*) AS count FROM commit_comments JOIN project_commits ON project_commits.commit_id = commit_comments.commit_id WHERE project_commits.project_id = @proj GROUP BY commit_comments.user_id) AS comcoms +
-   ON comcoms.id = users.id +
-       LEFT JOIN (SELECT pull_request_comments.user_id AS id, COUNT(*) AS count FROM pull_request_comments JOIN pull_requests ON pull_request_comments.pull_request_id = pull_requests.id WHERE pull_requests.base_repo_id = @proj GROUP BY pull_request_comments.user_id) AS pullscoms +
-   ON pullscoms.id = users.id +
-       LEFT JOIN (SELECT issue_comments.user_id AS id, COUNT(*) AS count FROM issue_comments JOIN issues ON issue_comments.issue_id = issues.id WHERE issues.repo_id = @proj GROUP BY issue_comments.user_id) AS isscoms +
-   ON isscoms.id = users.id +
-       GROUP BY users.id +
-       ORDER BY com.count DESC +
-       ) user_activity +
-    WHERE commits IS NOT NULL +
-    OR    pull_requests IS NOT NULL +
-    OR    issues IS NOT NULL +
-    OR    commit_comments IS NOT NULL +
-    OR    pull_request_comments IS NOT NULL +
-    OR    issue_comments IS NOT NULL; +
- +
-===GHTorrent:​ Number of Contributors (committers) per Project: === +
- +
-GitHub defines contributors as those who have made "​Contributions ​to master, excluding merge commits"​ +
- +
-I do not see in the GHTorrent database schema a way to determine commits to master vs other branches, +
-Nor a way to differentiate merge commits from other commits. +
- +
-Because of this, for the following SQL query I will define contributors as "users who have made a commit"​ +
- +
-When viewing the table, one may notice that there is a separate author_id and committer_id for each commit +
-the commit author has written the code of the commit. +
-the commit committer has made the commit +
-example: author writes some code and does a pull request. ​ committer approves/​merges the pull request +
- +
-For the following SQL, I am considering the author to be the contributor. +
- +
- select projects.name as project_name,​ projects.url as url, count(distinct commits.author_id) as num_contributers +
- from commits  +
- join project_commits on commits.id = project_commits.commit_id +
- join projects on projects.id = project_commits.project_id +
- group by project_commits.project_id +
- +
-===== 5. Known Implementations ===== +
-[[https://​github.com/​OSSHealth/ghdata|GHData]] +
- +
-===== 6External References (Literature) ===== +
oss-health-metrics/metrics/contributors.txt · Last modified: 2017/10/06 20:54 by GeorgLink