chaoss:gsoc-ideas [2018/03/21 15:56] (current) GeorgLink
  
{{:chaoss:chaoss_logo_pantone_2_.png?400|}}
====== Ideas for Google Summer of Code projects ======
  
===== Idea #1: Support of Standard CHAOSS Formats for Description of Projects =====
  
[ [[https://github.com/chaoss/grimoirelab/issues/71|Micro-tasks and place for questions]] ]
  
Currently, GrimoireLab uses its own format for describing a project, including the data sources (repositories to retrieve information from), the internal organization of the project (e.g., in subprojects), and specifics about how the data is to be presented. For this information, some standard formats already exist that can be used directly or with some modifications. Among them, [[https://en.wikipedia.org/wiki/DOAP|DOAP]] is one of the most interesting ones, but there are many others.
  
This idea is about identifying formats that projects use to describe themselves and adding support for them to GrimoireLab. This includes not only static formats, but also APIs.
  
The aims of the project are as follows:
  
  * Supporting DOAP, either by converting it to the current GrimoireLab format or, more likely, by supporting it directly in Mordred or related tools.
  * Testing the implementation with large projects that provide DOAP descriptions, such as Apache Server.
  * Exploring other formats used to express information related to projects (repositories, people, affiliations). These may include specific formats used by some large projects (such as Eclipse or OpenStack) and affiliation and unique identity formats (such as the gitdm format).
  * Exploring other APIs that expose information related to projects, such as those provided by software development forges (e.g., GitLab, Bitbucket) or by other tools supporting software development.
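
As a rough illustration of the DOAP aim, the sketch below reads Git repository locations out of a DOAP (RDF/XML) description using only the Python standard library. The output shape (project name mapped to data source mapped to a list of URLs) is an assumption, loosely resembling a JSON project file of the kind Mordred reads; the real GrimoireLab format may differ. Repository types other than Git are skipped in this sketch, and ''doap_to_projects'' is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# XML namespaces used by DOAP descriptions serialized as RDF/XML.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "doap": "http://usefulinc.com/ns/doap#",
}

# Assumed mapping from DOAP repository classes to GrimoireLab data-source names.
# Only Git is mapped in this sketch; other repository types are skipped.
REPO_KINDS = {"GitRepository": "git"}


def doap_to_projects(doap_xml: str) -> dict:
    """Collect repository URLs per project from a DOAP document.

    Returns {project_name: {data_source: [url, ...]}}, an assumed shape.
    """
    root = ET.fromstring(doap_xml)
    projects = {}
    # iter() also checks the root itself, so files whose root is doap:Project work too.
    for project in root.iter(f"{{{NS['doap']}}}Project"):
        name = project.findtext("doap:name", default="unknown", namespaces=NS).strip()
        sources = projects.setdefault(name, {})
        for repo in project.findall("doap:repository/*", NS):
            kind = repo.tag.rsplit("}", 1)[-1]  # e.g. "GitRepository"
            source = REPO_KINDS.get(kind)
            location = repo.find("doap:location", NS)
            if source is None or location is None:
                continue
            url = location.get(f"{{{NS['rdf']}}}resource")
            if url:
                sources.setdefault(source, []).append(url)
    return projects


SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:doap="http://usefulinc.com/ns/doap#">
  <doap:Project>
    <doap:name>Example</doap:name>
    <doap:repository>
      <doap:GitRepository>
        <doap:location rdf:resource="https://example.org/example.git"/>
      </doap:GitRepository>
    </doap:repository>
    <doap:repository>
      <doap:SVNRepository>
        <doap:location rdf:resource="https://svn.example.org/repos/example"/>
      </doap:SVNRepository>
    </doap:repository>
  </doap:Project>
</rdf:RDF>"""

if __name__ == "__main__":
    print(doap_to_projects(SAMPLE))
    # {'Example': {'git': ['https://example.org/example.git']}}
```

Testing against real DOAP files from large projects would show which repository classes and properties actually need mapping.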
  
The aims may require modifications to Mordred and other related tools, to make them modular and to simplify implementing support for future formats or APIs.
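
One hypothetical way to make the tools modular is a small registry of format readers: each reader converts one input format into a shared internal structure, so supporting a new format means adding one function. This is not how Mordred is actually organized; the registry, the toy ''url-list'' format, and every function name below are invented for illustration.

```python
from typing import Callable, Dict, List

# Shared internal structure assumed for this sketch: {project: {data_source: [urls]}}.
Projects = Dict[str, Dict[str, List[str]]]
Reader = Callable[[str], Projects]

# Hypothetical registry mapping a format name to the function that reads it.
READERS: Dict[str, Reader] = {}


def register(fmt: str) -> Callable[[Reader], Reader]:
    """Decorator that registers a reader function under a format name."""
    def wrap(func: Reader) -> Reader:
        READERS[fmt] = func
        return func
    return wrap


@register("url-list")
def read_url_list(text: str) -> Projects:
    """Toy format invented for this example: 'project data_source url' per line."""
    projects: Projects = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        project, source, url = line.split()
        projects.setdefault(project, {}).setdefault(source, []).append(url)
    return projects


def load(fmt: str, text: str) -> Projects:
    """Dispatch to the registered reader; unknown formats fail loudly."""
    try:
        reader = READERS[fmt]
    except KeyError:
        raise ValueError(f"no reader registered for format {fmt!r}") from None
    return reader(text)
```

Under this design, a DOAP reader or a forge API client would be one more function decorated with ''@register(...)'', and the rest of the toolchain would only ever see the shared structure.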
  
  * //Difficulty:// easy/medium
  * //Requirements:// Python programming. Willingness to understand GrimoireLab internals.
  * //Recommended:// Experience with Python HTTP and XML libraries would be convenient, but can be learned during the project.
  * //Mentors:// Jesus M. Gonzalez-Barahona, Valerio Cosentino
  
===== Idea #2: Reporting of CHAOSS Metrics =====
  
[ [[https://github.com/chaoss/grimoirelab/issues/70|Micro-tasks and place for questions]] ]

Currently, [[https://grimoirelab.github.io/|GrimoireLab]] includes a tool for reporting: Manuscripts. This tool reads data from a GrimoireLab Elasticsearch database and uses it to produce a PDF report with relevant metrics for a set of analyzed projects. Internally, Manuscripts uses Python code to produce charts and CSV tables, which are then integrated into a LaTeX document to produce the final PDF. Other approaches, such as producing Jupyter notebooks, will be explored too.

This idea is about adding support to Manuscripts for producing reports based on the work of the CHAOSS Community. Since Manuscripts is still a moving target, this will also be a chance to participate in the general development of the tool itself, turning it into a generic reporting system for GrimoireLab data.
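
The step where CSV tables are integrated into a LaTeX document can be pictured with the following sketch, which renders CSV text as a LaTeX ''tabular'' environment and escapes LaTeX special characters. It illustrates the general technique only and is not Manuscripts' actual code; ''csv_to_tabular'' and ''latex_escape'' are hypothetical names.

```python
import csv
import io

# LaTeX special characters and their escaped forms.
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def latex_escape(text: str) -> str:
    """Escape one cell character by character, so escapes are never re-escaped."""
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in text)


def csv_to_tabular(csv_text: str) -> str:
    """Render CSV text (first row = header) as a LaTeX tabular environment."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]  # drop empty lines
    header, body = rows[0], rows[1:]
    lines = [
        "\\begin{tabular}{" + "l" * len(header) + "}",
        "\\hline",
        " & ".join(latex_escape(cell) for cell in header) + r" \\",
        "\\hline",
    ]
    for row in body:
        lines.append(" & ".join(latex_escape(cell) for cell in row) + r" \\")
    lines += ["\\hline", "\\end{tabular}"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(csv_to_tabular("repository,commits\nperceval_git,120\nsortinghat,45\n"))
```

The generated fragment could be written to a file and pulled into the report with ''\input''; escaping matters because repository and project names often contain underscores.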
  
The aims of the project are as follows:
  
  * Writing Python code to query GrimoireLab Elasticsearch databases and obtain the metrics relevant for the report. Possible technologies for this include the Python pandas library.
  * Writing Python code to produce suitable representations of those metrics, such as tables and charts.
  * Adapting current tools to produce reports directly from data sources, by managing the GrimoireLab toolchain. One possible solution is adding the code to Mordred, the tool orchestrating GrimoireLab tools.
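
For the first aim, a metric such as commits per month can be computed inside Elasticsearch with a ''date_histogram'' aggregation and then flattened into rows for a table or chart. The sketch below only builds the request body and parses a response, so it runs without a server. The field name ''grimoire_creation_date'' is an assumption about the GrimoireLab enriched index, ''calendar_interval'' requires Elasticsearch 7.2 or later (older releases used ''interval''), and the function names are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def commits_per_month_query(date_field: str = "grimoire_creation_date") -> Dict[str, Any]:
    """Build a _search request body that counts documents per calendar month.

    The default field name is an assumption about the GrimoireLab enriched index;
    'calendar_interval' needs Elasticsearch 7.2+ (older releases used 'interval').
    """
    return {
        "size": 0,  # only the aggregation is needed, not the matching documents
        "aggs": {
            "per_month": {
                "date_histogram": {"field": date_field, "calendar_interval": "month"}
            }
        },
    }


def buckets_to_rows(response: Dict[str, Any]) -> List[Tuple[str, int]]:
    """Flatten the 'per_month' aggregation of a _search response into (month, count) rows."""
    buckets = response["aggregations"]["per_month"]["buckets"]
    # key_as_string is assumed to be an ISO-8601 timestamp (the default date format);
    # keep only "YYYY-MM".
    return [(bucket["key_as_string"][:7], bucket["doc_count"]) for bucket in buckets]


if __name__ == "__main__":
    # A hand-written response in the shape Elasticsearch returns, for illustration.
    fake_response = {
        "aggregations": {
            "per_month": {
                "buckets": [
                    {"key_as_string": "2018-01-01T00:00:00.000Z", "key": 1514764800000, "doc_count": 42},
                    {"key_as_string": "2018-02-01T00:00:00.000Z", "key": 1517443200000, "doc_count": 17},
                ]
            }
        }
    }
    print(commits_per_month_query())
    print(buckets_to_rows(fake_response))  # [('2018-01', 42), ('2018-02', 17)]
```

The body would be sent to the ''_search'' endpoint of the relevant index; if pandas is used, the rows could be loaded with ''pandas.DataFrame(rows, columns=["month", "commits"])'' before charting.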
  
Other aims, such as producing Jupyter notebooks as a final result or as an intermediate step, are completely within scope.
  
  * //Difficulty:// easy/medium
  * //Requirements:// Python programming. Interest in software analytics. Willingness to understand GrimoireLab internals.
  * //Recommended:// Experience with Python interfaces to databases would be convenient, but can be learned during the project. Experience with LaTeX and/or Python Jupyter Notebooks would help.
  * //Mentors:// Jesus M. Gonzalez-Barahona, Matt Germonprez, Jordi Cabot
  
  
===== Idea #3: Prototype New CHAOSS Metrics =====
  
[ [[https://github.com/OSSHealth/ghdata/issues/82|Micro-tasks and place for questions]] ]
  
Create a library that CHAOSS Community software projects, such as GHData, can use to express similarities between open source software projects at the project level. There are two components: a set of algorithms for integrating similarity measures over an array of project data, and visualizations implemented in our existing framework (possibly extending that framework).
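
A minimal sketch of one possible similarity measure: each project is represented as a vector of metric values, counts are log-transformed so that very large projects do not dominate, and other projects are ranked by cosine similarity. The metric names and numbers are invented for illustration, and cosine similarity is only one of many measures such a library might integrate.

```python
import math
from typing import Dict, List, Tuple

Metrics = Dict[str, float]  # metric name -> non-negative count for one project


def to_vector(values: Metrics, metric_names: List[str]) -> List[float]:
    """Log-transform counts so that very large projects do not dominate the comparison."""
    return [math.log1p(values.get(name, 0.0)) for name in metric_names]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(target: str, projects: Dict[str, Metrics]) -> List[Tuple[str, float]]:
    """Rank every other project by cosine similarity to `target`, highest first."""
    names = sorted({m for values in projects.values() for m in values})
    vectors = {p: to_vector(v, names) for p, v in projects.items()}
    scores = [(p, cosine(vectors[target], vec)) for p, vec in vectors.items() if p != target]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Invented numbers, for illustration only.
    projects = {
        "alpha": {"commits": 1000, "contributors": 50, "issues": 200},
        "beta": {"commits": 900, "contributors": 45, "issues": 180},
        "gamma": {"commits": 10, "contributors": 40, "issues": 0},
    }
    for name, score in most_similar("alpha", projects):
        print(f"{name}: {score:.3f}")
```

Other measures (for example, overlap of contributor sets) could be combined with this one by weighting their scores, which is where "integrating similarity measures" would come in.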
  
  
The aims of the project are as follows:
  - Build new metrics in the Python/Flask/MetricsJS open source project [[http://www.github.com/OSSHealth/ghdata|GHData]]. This will familiarize the participant with the metrics currently defined by the CHAOSS project and introduce the user interaction design goals of:
    - Enabling comparisons between GitHub, Mozilla, and other open source project repositories and projects as a default design mechanism.
    - Considering the different ways of building software to do temporal comparisons.
  - Build machine learning algorithms that identify candidate “toxic interactions” in open source mailing lists and IRC channels, with the aim of making open source a more welcoming environment for diverse populations.
  - Design and evaluate exploratory mechanisms for presenting project data, metrics, and analysis using a complex, hierarchical, and networked set of data structures. For example, there are two main ways a "commit" is defined in open source software: a) the explicit, individual "commit" record, and b) "unique commits". Both metrics can be calculated reasonably easily from source repositories, and CHAOSS project stakeholders are interested in understanding them:
    - By project
    - Project organization
    - Individual
    - Corporate organization
    - Roles in a project (including people evolving from the periphery to the core).
  
Each of these is a significant opportunity for a Google Summer of Code participant to engage, learn, and become part of a project.
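
The difference between commit records and unique commits can be sketched as follows, under an assumption that is ours rather than stated above: the same commit, identified by its hash, may be recorded more than once, for example when it appears in both a repository and its fork. The record fields ''hash'', ''repo'', and ''org'' and the function ''commit_counts'' are hypothetical; grouping by a different field gives the per-project, per-individual, or per-organization views listed above.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def commit_counts(records: Iterable[Mapping[str, str]], by: str) -> Dict[str, Dict[str, int]]:
    """Count explicit commit records and unique commit hashes per group.

    Each record needs a 'hash' field plus the field named by `by`
    (e.g. 'repo' or 'org'); these field names are hypothetical.
    """
    record_counts: Dict[str, int] = defaultdict(int)
    unique_hashes: Dict[str, set] = defaultdict(set)
    for record in records:
        group = record[by]
        record_counts[group] += 1                  # every record counts
        unique_hashes[group].add(record["hash"])   # the same hash counts once
    return {
        group: {"records": record_counts[group], "unique": len(unique_hashes[group])}
        for group in record_counts
    }


if __name__ == "__main__":
    # Commit "a1" appears both upstream and in a fork owned by the same organization.
    records = [
        {"repo": "upstream", "org": "Acme", "hash": "a1"},
        {"repo": "fork", "org": "Acme", "hash": "a1"},
        {"repo": "upstream", "org": "Acme", "hash": "b2"},
    ]
    print(commit_counts(records, "org"))   # {'Acme': {'records': 3, 'unique': 2}}
    print(commit_counts(records, "repo"))
```
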
  * //Requirements:// Python programming. Networking basics, JavaScript basics.
  * //Recommended:// Experience with Python HTTP and XML libraries would be convenient, but can be learned during the project.
  * //Mentors:// Sean Goggins, Jesus M. Gonzalez-Barahona, Josianne Marsan
chaoss/gsoc-ideas.1516723687.txt.gz · Last modified: 2018/01/23 16:08 by GeorgLink