This repository contains all work related to the 'Good First Issue' thesis report and the research conducted by (Jan Willem) David Alderliesten. The thesis report was written in partial fulfillment of the requirements for the Computer Science MSc program at Delft University of Technology.
The research aimed to analyze the rate of adoption and the usefulness of GitHub issues that are marked as suitable for beginners or developers new to an open-source project. Many of these issues carry the label 'good first issue', which gave both the thesis and this repository their name.
The following subsections outline the contents of this repository.
The repositories studied were sampled using the methods outlined in the thesis report. An overview of all sampled repositories, as well as of only those employing 'good first issues', can be found in the 'Sample Set' folder in this repository.
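Because the 'Sample Set' folder lists both all sampled repositories and those that use 'good first issues', one simple way to express how widely the label is used within the sample is the share of sampled repositories that also appear in the second list. The sketch below assumes each overview is available as a list of `owner/name` strings; the function `adoption_share` and the example repository names are hypothetical, and this is not necessarily how the thesis itself measures adoption.

```python
from typing import Iterable


def adoption_share(all_repos: Iterable[str], gfi_repos: Iterable[str]) -> float:
    """Return the fraction of sampled repositories that use 'good first issue' labels."""
    sampled = set(all_repos)
    if not sampled:
        raise ValueError("no sampled repositories given")
    # Ignore any listed GFI repository that is not part of the overall sample.
    using_label = sampled & set(gfi_repos)
    return len(using_label) / len(sampled)


if __name__ == "__main__":
    # Hypothetical repository names, for illustration only.
    everything = ["octo/alpha", "octo/beta", "octo/gamma", "octo/delta"]
    with_gfi = ["octo/beta", "octo/delta"]
    print(f"{adoption_share(everything, with_gfi):.0%}")  # prints 50%
```

Using sets means duplicate entries in either overview are counted only once.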
The code folder contains all code and scripts used for every component of this thesis. Most of the codebase was written in Python using the PyCharm IDE by JetBrains. It relies on numerous libraries that ship with most Python installations, as well as a number of external dependencies. These package dependencies are, in alphabetical order:
- CSV
- Github
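As an illustration of how these two dependencies can work together, the sketch below collects a repository's issues labelled 'good first issue' and writes them to a CSV file. It assumes that "Github" refers to the PyGithub package (imported as `github`); the function names, the column layout in `FIELDS`, and the label name are hypothetical, since projects may spell the label differently, and this is not the thesis's actual collection script.

```python
from __future__ import annotations

import csv
from typing import Iterable, Mapping

# Hypothetical column layout; the CSV files in 'Data' may use other columns.
FIELDS = ["number", "title", "created_at", "closed_at"]


def fetch_good_first_issues(full_name: str, token: str,
                            label_name: str = "good first issue") -> list[dict]:
    """Collect every issue carrying the given label using PyGithub (needs network access)."""
    from github import Github  # PyGithub; imported lazily so write_rows works without it

    repo = Github(token).get_repo(full_name)
    label = repo.get_label(label_name)  # raises if the repository has no such label
    rows = []
    for issue in repo.get_issues(state="all", labels=[label]):
        if issue.pull_request is not None:
            continue  # GitHub's issues endpoint also returns pull requests
        rows.append({
            "number": issue.number,
            "title": issue.title,
            "created_at": issue.created_at.isoformat(),
            "closed_at": issue.closed_at.isoformat() if issue.closed_at else "",
        })
    return rows


def write_rows(path: str, rows: Iterable[Mapping[str, object]]) -> None:
    """Write issue rows to a CSV file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```

A typical call would be `write_rows("gfi.csv", fetch_good_first_issues("owner/name", token))`. Recent PyGithub releases prefer `Github(auth=Auth.Token(token))` over passing the token positionally.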
Please note that the code provided is neither indicative of my abilities as a developer nor meant as a showcase of efficient code. The codebase was developed iteratively across numerous frameworks and mismatched APIs, so it was written with the goal of 'getting the job done' rather than being efficient or optimal.
The 'Data' folder contains the raw data: all sampled good first issues from each repository, and the first commits per user. Each repository has two CSV files containing this information.
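A minimal sketch of loading the two files for one repository, using only the standard `csv` module, is shown below. The file naming scheme (`<stem>_good_first_issues.csv`, `<stem>_first_commits.csv`) and the column names `author` and `date` are hypothetical placeholders for whatever the actual files use.

```python
import csv
from pathlib import Path


def read_csv(path):
    """Read a CSV file with a header row into a list of dicts."""
    with open(path, newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh))


def load_repository(data_dir, stem):
    """Load the two per-repository files (hypothetical naming scheme)."""
    base = Path(data_dir)
    issues = read_csv(base / f"{stem}_good_first_issues.csv")
    commits = read_csv(base / f"{stem}_first_commits.csv")
    return issues, commits


def first_commit_by_user(rows, user_col="author", date_col="date"):
    """Map each user to their earliest commit date.

    Assumes ISO-8601 timestamps in a single timezone, so that plain string
    comparison matches chronological order.
    """
    earliest = {}
    for row in rows:
        user, date = row[user_col], row[date_col]
        if user not in earliest or date < earliest[user]:
            earliest[user] = date
    return earliest
```

Keeping the first-commit data as a user-to-date mapping makes it straightforward to look up when a given contributor first committed.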
The analyses presented in the thesis are given in the 'Analysis' folder. This folder contains the template CSV and XLSX files used for the analysis, along with the analysis of each sampled repository from the 'Data' folder. For each repository, one CSV file contains up to 30 sampled first contributions from developers, while the other contains up to 30 sampled good first issues.
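Drawing "up to 30" samples per repository can be sketched as uniform random sampling without replacement that keeps every row when fewer than 30 exist. This is an assumed technique for illustration; the sampling method actually used is the one described in the thesis report, and the function names here are hypothetical.

```python
import csv
import random


def sample_rows(rows, k=30, seed=None):
    """Return up to k rows drawn uniformly at random without replacement."""
    rows = list(rows)
    if len(rows) <= k:
        return rows  # fewer than k candidates: keep them all
    return random.Random(seed).sample(rows, k)


def sample_csv(in_path, out_path, k=30, seed=None):
    """Write a random sample of at most k data rows from in_path to out_path."""
    with open(in_path, newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh)
        if reader.fieldnames is None:
            raise ValueError(f"{in_path} has no header row")
        fieldnames = reader.fieldnames
        chosen = sample_rows(reader, k, seed)
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(chosen)
    return len(chosen)
```

Passing a fixed `seed` makes the sample reproducible, which helps when the same selection has to be re-created later.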
The folder also contains files combining all samples, to simplify sorting and analysis.
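Files that gather all samples in one place can be produced by concatenating the per-repository CSV files and tagging each row with its source. The sketch below assumes the source file's stem identifies the repository; the function name and the `repository` column are hypothetical.

```python
import csv
from pathlib import Path


def combine_samples(paths, out_path):
    """Concatenate per-repository CSV files into one file.

    Each output row gets a 'repository' column holding the source file's stem;
    columns missing from a file are left empty. Returns the number of rows written.
    """
    fieldnames = ["repository"]
    rows = []
    for path in map(Path, paths):
        with open(path, newline="", encoding="utf-8") as fh:
            reader = csv.DictReader(fh)
            for name in reader.fieldnames or []:
                if name not in fieldnames:
                    fieldnames.append(name)
            for row in reader:
                rows.append({**row, "repository": path.stem})
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        # extrasaction="ignore" drops stray values from rows longer than their header
        writer = csv.DictWriter(fh, fieldnames=fieldnames, restval="", extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

The combined file can then be sorted on any column, for example by repository, in a spreadsheet or with Python's `sorted`.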
This work is licensed under a Creative Commons Attribution 4.0 International License.