GitDataSCP

Intro

This dataset is a subset of the GHTorrent dataset.


Collections

Collection : Users

Collection : Repos

Collection : Issues

Collection : Issue comments

Collection : Commits

Collection : Commit comments

Collection : Pull requests

Collection : Pull request comments

Collection : Forks

Collection : Watchers

Collection : Repo collaborators

Full dataset (selected fields only) : GitDataSCP

Full dataset (complete MongoDB dump) : GitDataSCP MongoDB archive
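
For readers who download the full MongoDB dump, the following is a minimal sketch of how it might be loaded into a local server. It assumes the archive was produced with `mongodump --archive` (this README does not state the format), that the MongoDB Database Tools are installed, and that a server is reachable at the default local URI. The file name `gitdatascp.archive` and the helper names `build_restore_command` and `restore` are hypothetical, chosen only for illustration.

```python
import subprocess

# Assumed default; change this if your MongoDB server runs elsewhere.
DEFAULT_URI = "mongodb://localhost:27017"


def build_restore_command(archive_path, gzip=False, uri=DEFAULT_URI):
    """Return the mongorestore argument list for a single archive file.

    Set gzip=True only if the archive was created with `mongodump --gzip`;
    whether the published archive is compressed is not stated in this README.
    """
    command = ["mongorestore", f"--uri={uri}", f"--archive={archive_path}"]
    if gzip:
        command.append("--gzip")
    return command


def restore(archive_path, gzip=False, uri=DEFAULT_URI):
    """Run mongorestore; raises CalledProcessError if the restore fails."""
    subprocess.run(build_restore_command(archive_path, gzip, uri), check=True)
```

Calling `restore("gitdatascp.archive")` would then run `mongorestore` against the local server. Building the argument list separately makes it easy to print and inspect the command before running it.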

If you use this dataset in your work, please cite the following paper: link.

Bibtex:

@article{SEKER2020,
author = {ŞEKER, Abdulkadir and DİRİ, Banu and ARSLAN, Halil and AMASYALI, Fatih},
doi = {10.17776/csj.728932},
issn = {2587-2680},
journal = {Cumhuriyet Science Journal},
keywords = {Big data,Ghtorrent,GitHub,MongoDB},
month = {sep},
number = {3},
pages = {720--724},
title = {{Summarising big data: public GitHub dataset for software engineering challenges}},
url = {https://dergipark.org.tr/en/doi/10.17776/csj.728932},
volume = {41},
year = {2020}
}