wdunicornpro/GithubCrossRepositoryTeams

This repository contains the dataset and scripts for "Investigating the Cross-Repository Socially Connected Teams in Github".
The dataset is uploaded anonymously so that reviewers can access it during the double-blind peer review process. It may be made public after acceptance.





File structure of this project:
	issuecomment\										-- IssueCommentEvent data from 01/01/2015 to 06/30/2018 organized in years, months, and days
	OSLOM\
		Edges.dat 										-- The edge list of the developer network
		Edges.dat_oslo_files\
			tp 											-- The list of modules generated by OSLOM 
			pajek_file_0.net 							-- The pajek file generated by OSLOM
			pajek_file_0_new_without_singleton.net 		-- The pajek file for visualization (Top 100 largest teams)
		Edges.link										-- The repo context of each edge in the developer network
		Edges_single.dat 								-- The edge list of the developer network (without cross-repo condition)
		Edges_single.dat_oslo_files\
			tp 											-- The list of modules generated by OSLOM			
		Edges_single.link 								-- The repo context of each edge in the developer network (without cross-repo condition)
		network.json									-- The developer network
		network_time.dat 								-- The duration of each edge in the developer network
		repo_statistics.txt								-- Repo level statistics
		team_tags.txt									-- Team level statistics
		team_tags_single.txt							-- Team level statistics (without cross-repo condition)
		teams.txt										-- Team list
		teams_single.txt								-- Team list (without cross-repo condition)
	README						-- This file
	contributors.json			-- Contributors of each repo
	contributors.py 			-- Script for generating contributors.json
	edgelist.py 				-- Script for generating OSLOM\Edges.dat and OSLOM\Edges.link
	edgelist_single.py 			-- Script for generating OSLOM\Edges_single.dat and OSLOM\Edges_single.link
	network.py   				-- Script for generating OSLOM\network.json and OSLOM\network_time.dat
	pajek_repaint.py 			-- Script for generating OSLOM\Edges.dat_oslo_files\pajek_file_0_new_without_singleton.net
	repo_features.json			-- Numeric features of each repo
	repo_features.py 			-- Script for generating repo_features.json
	repo_language.json			-- Programming language of each repo
	repo_language.py 			-- Script for generating repo_language.json
	repo_statistics.py   		-- Script for generating OSLOM\repo_statistics.txt
	repo_topics.json 			-- Topics of each repo
	repo_topics.py 				-- Script for generating repo_topics.json
	repos.json 					-- List of repos
	repos.py 					-- Script for generating repos.json
	team_statistics.py 			-- Script for generating charts
	team_tags.py 				-- Script for generating OSLOM\team_tags.txt and OSLOM\team_tags_single.txt
	teams.py 					-- Script for generating OSLOM\teams.txt and OSLOM\teams_single.txt
	users.json 					-- List of users
	users.py 					-- Script for generating users.json
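
The OSLOM\Edges.dat file listed above is the edge list passed to OSLOM. A minimal Python sketch of loading such a file into an adjacency map follows. It assumes each line holds two whitespace-separated node ids (the usual input for oslom_undir with -uw). The function name read_edge_list and the sample data are illustrative and not part of this project.

```python
import io
from collections import defaultdict


def read_edge_list(lines):
    """Build an undirected adjacency map from 'u v' lines.

    Assumption: node ids are the first two whitespace-separated
    columns; any further columns (e.g. weights) are ignored.
    """
    adjacency = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) < 2 or parts[0].startswith("#"):
            continue  # skip blank, short, or comment lines
        u, v = parts[0], parts[1]
        if u == v:
            continue  # ignore self-loops
        adjacency[u].add(v)
        adjacency[v].add(u)
    return dict(adjacency)


# Hypothetical sample standing in for OSLOM/Edges.dat.
sample = io.StringIO("1 2\n2 3\n\n3 1\n4 5\n")
graph = read_edge_list(sample)
print(sorted(graph["1"]))
```

For the real file, pass an open file handle instead of the sample, for example open("OSLOM/Edges.dat").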







Workflow:
1. All public IssueCommentEvents from 01/01/2015 to 06/30/2018 are stored under the \issuecomment\ directory.
2. Extract all the active repos during the above period by running:
	python repos.py
3. Get all the contributors of these repos through Github API by running:
	python contributors.py
4. Get the user list by running:
	python users.py
5. Get all repo languages, topics, and numeric features through Github API by running:
	python repo_features.py
6. Generate the developer network by running:
	python network.py OSLOM\network.json OSLOM\network_time.dat
7. Generate the edge lists by running:
	python edgelist.py OSLOM\network.json OSLOM\Edges.dat OSLOM\Edges_single.dat
8. Run OSLOM2 (www.oslom.org) on the edge lists:
	./oslom_undir -f Edges.dat -uw -hr 0 -singlet -louvain 1 -t 0.99 -cp 0.01
9. Extract the team lists:
	python teams.py OSLOM\Edges.dat_oslo_files\tp OSLOM\network.json OSLOM\teams.txt
	python teams.py OSLOM\Edges_single.dat_oslo_files\tp OSLOM\network.json OSLOM\teams_single.txt
10. Compute lifetime for repos:
	python repo_time.py repos.txt repo_time.txt
11. Compute properties of each team:
	python team_tags.py OSLOM\Edges.link OSLOM\teams.txt OSLOM\Edges_single.link OSLOM\teams_single.txt repo_features.json OSLOM\network_time.dat contributors.json repo_time.txt OSLOM\team_tags.txt
12. Compute team level statistics and generate charts:
	python team_statistics.py OSLOM\team_tags.txt repo_features.json
13. Compute repo level statistics and generate charts:
	python repo_statistics.py OSLOM\repo_statistics.txt
14. Compute repo feature statistics:
	python repo_feature_statistics.py OSLOM\repo_feature_statistics.txt
15. Generate the visualization of the top 100 largest teams:
	python pajek_repaint.py OSLOM\Edges.dat_oslo_files\pajek_file_0.net
16. Open pajek_file_0_new_without_singleton.net in Gephi (www.gephi.org).
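
The team extraction step above reads the tp file written by OSLOM. A minimal Python sketch of that parsing follows. It assumes the usual tp layout, in which a header line starting with "#module" is followed by one line of space-separated node ids. It is not the code in teams.py: the function name parse_oslom_modules and the sample text are illustrative, and the mapping from node ids back to users through network.json is left out because its format is not described here.

```python
def parse_oslom_modules(lines):
    """Return a list of teams, each a list of node ids (strings).

    Assumption: every '#module ...' header is followed by exactly
    one line listing the members of that module.
    """
    teams = []
    expecting_members = False
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#module"):
            expecting_members = True
            continue
        if expecting_members:
            teams.append(line.split())
            expecting_members = False
    return teams


# Hypothetical sample standing in for OSLOM/Edges.dat_oslo_files/tp.
sample_tp = """#module 0 size: 3 bs: 0.01
1 2 3
#module 1 size: 2 bs: 0.04
4 5
""".splitlines()

teams = parse_oslom_modules(sample_tp)
largest_first = sorted(teams, key=len, reverse=True)
print(len(teams), largest_first[0])
```

Sorting by team size, as in largest_first, is one way to pick out the largest teams for inspection.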