RepoQuester is a command-line tool to assist developers in evaluating projects by providing quantitative scores for ten metrics that span project collaboration, quality and maintenance.
We implemented RepoQuester by forking and modifying the Repo Reaper tool. Reaper requires a queryable offline mirror of the GHTorrent database, which contains information about millions of GitHub repositories. To run Reaper, users must first download this offline mirror (more than 100 GB) onto their local machines. Reaper then queries the SQL tables in the GHTorrent database to obtain meta-information about a repository. To update Reaper's dataset, one has to re-download the latest GHTorrent dump and reanalyze the repositories. Reaper analyzes a project by calculating values for eight metrics, which involves joining various SQL tables in GHTorrent. RepoQuester, however, does not rely on GHTorrent.
Unlike Repo Reaper, RepoQuester can be used on a local machine to analyze any number of projects by providing each project's URL and programming language as input, either on the command line or in a configuration file. The architecture of RepoQuester is shown below.
We present a few use cases for RepoQuester. Consider the following four scenarios in which someone needs to find a good-quality project:
- Arjun, an undergraduate, wants to build a small customized JavaScript framework for his project. He has found a set of GitHub projects to fork and customize and needs to select one of them. To do so, he needs to understand whether a project is well-documented and well-tested and whether its library dependencies are upgraded to the latest versions.
- Chaitra, a software developer, wants to contribute to open-source projects that have an active community around them, are licensed, and use continuous integration services.
- Jay, a graduate, is evaluating a set of Ruby projects on GitHub to use them as libraries. He wants to know whether each library is active or deprecated, well-tested, and less prone to issues.
- Dhruva and their team are building an open-source cryptocurrency project. They want to follow best practices during the project's development, such as writing test files, writing code comments, and implementing continuous integration services; however, they are finding it difficult to assess whether their project has improved over time. Their goal is to keep track of the different project dimensions that can improve their project's usability and maintenance.
As the four scenarios above show, developers often need to evaluate which project on GitHub to interact with, and not all relevant information is readily available. RepoQuester can help automate the process of mining GitHub projects by providing scores for metrics (spanning project collaboration, quality, and maintenance) for the projects being considered.
To improve our tool, we wish to collect developer feedback on RepoQuester's metrics through surveys. Metrics such as "Pull Requests Ratio", "Releases" and "Continuous Integration" could be coupled with statistical analysis and more advanced methods to derive more insights. With RepoQuester at the core, individual developers and researchers can build recommender systems that suggest projects to users for search categories such as library reuse, documentation quality, community support, ownership and licensing of a repository, continuous integration services, release frequency, library re-implementation and so on.
- Download this GitHub repository onto your local machine
git clone https://github.com/Kowndinya2000/Repoquester
- Install python libraries
pip install -r requirements.txt
- Download cloc: https://github.com/AlDanial/cloc/tree/1.88#install-via-package-manager
Example: For Debian or Ubuntu OS: sudo apt install cloc
- Download Ack: https://beyondgrep.com/install/
Example: For Debian or Ubuntu OS: sudo apt install ack-grep
- Open the file repo_urls and add repositories in the form username/repositoryname, one per line.
A sample of 265 repository URLs is already present in the file.
- Open the file tokens.py and provide at least one GitHub Personal Access Token.
The format for providing a token is shown in the file.
- Initialize the database
chmod +x *sh
sed -i -e 's/\r$//' initialize.sh
./initialize.sh
- Run the script to analyze the repositories
./run.sh
- Check the results in the database file
repo_quester.db
- To re-run the analysis without modifying repository information
chmod +x *sh
./clean.sh
./run.sh
- To empty the repository information and results.
chmod +x *sh
./empty.sh
(This also deletes the database file and retains only the reusable tool template.)
Follow the steps 1-4 again.
- To analyze a particular repository.
For example, to analyze repository with repo_id = 2 : run the below two commands
chmod +x script2.sh
./script2.sh
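The repo_urls format from the setup steps (one username/repositoryname entry per line) can be checked before starting an analysis. The following is a minimal sketch, not part of RepoQuester: the helper name check_repo_slugs and the character pattern are assumptions made for illustration, and the pattern only approximates which characters GitHub accepts in owner and repository names.

```python
import re

# Assumed pattern: the owner part may contain letters, digits and hyphens;
# the repository part may additionally contain dots and underscores.
SLUG_PATTERN = re.compile(r"^[A-Za-z0-9-]+/[A-Za-z0-9._-]+$")


def check_repo_slugs(text):
    """Split the contents of a repo_urls-style file into (valid, invalid) entries."""
    valid, invalid = [], []
    for line in text.splitlines():
        entry = line.strip()
        if not entry:
            continue  # ignore blank lines
        if SLUG_PATTERN.match(entry):
            valid.append(entry)
        else:
            invalid.append(entry)
    return valid, invalid


# Hypothetical usage against the file edited in the setup steps:
#   with open("repo_urls", encoding="utf-8") as handle:
#       valid, invalid = check_repo_slugs(handle.read())
#   print(len(valid), "valid entries;", "malformed:", invalid)
```

Full URLs such as https://github.com/owner/name are reported as malformed by this sketch, which matches the username/repositoryname form the file expects.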
Open the file repo_quester.db. The database file can be viewed in DB Browser for SQLite (download link: https://sqlitebrowser.org/).
Table: repoquester_results
- To select the repository "Microsoft/IEDiagnosticsAdapter", use the command below:
SELECT * FROM repoquester_results WHERE repository = 'Microsoft/IEDiagnosticsAdapter';
- To select a set of metrics for the repository "Microsoft/IEDiagnosticsAdapter":
For example, to select the metrics community, continuous_integration and license, use the command below:
SELECT community, continuous_integration, license FROM repoquester_results WHERE repository = 'Microsoft/IEDiagnosticsAdapter';
- The results table can be exported to CSV, JSON and SQL file formats.
In the DB Browser for SQLite application, click on:
File -> Export -> Database to SQL file
File -> Export -> Table(s) as CSV file
File -> Export -> Table(s) to JSON
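The queries and the CSV export described above can also be scripted instead of using DB Browser for SQLite. The sketch below uses Python's standard sqlite3 and csv modules. The function names fetch_metrics and export_results_csv are hypothetical helpers, not part of RepoQuester. The table name repoquester_results and the column names repository, community, continuous_integration and license come from the queries above; everything else about the schema is an assumption.

```python
import csv
import sqlite3


def fetch_metrics(connection, repository, columns):
    """Return the requested metric columns for one repository as a dict, or None."""
    # Column names cannot be bound as SQL parameters, so check them against
    # the table's actual schema before building the query.
    known = {row[1] for row in connection.execute("PRAGMA table_info(repoquester_results)")}
    unknown = [name for name in columns if name not in known]
    if unknown:
        raise ValueError(f"unknown columns: {unknown}")
    column_list = ", ".join(f'"{name}"' for name in columns)
    query = f"SELECT {column_list} FROM repoquester_results WHERE repository = ?"
    row = connection.execute(query, (repository,)).fetchone()
    return dict(zip(columns, row)) if row else None


def export_results_csv(connection, csv_path):
    """Write all rows of repoquester_results to csv_path; return the column count."""
    cursor = connection.execute("SELECT * FROM repoquester_results")
    header = [description[0] for description in cursor.description]
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)   # column names first
        writer.writerows(cursor)  # then one line per repository
    return len(header)


# Hypothetical usage against the database produced by run.sh:
#   connection = sqlite3.connect("repo_quester.db")
#   print(fetch_metrics(connection, "Microsoft/IEDiagnosticsAdapter",
#                       ["community", "continuous_integration", "license"]))
#   export_results_csv(connection, "repoquester_results.csv")
```

Binding the repository name with a ? placeholder avoids quoting problems for names that contain unusual characters.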
Please find a demo video here
For more information about the project and support requests, feel free to contact Kowndinya Boyalakuntla (cs17b032@iittp.ac.in). Please open an issue or pull request if you find any bug or have an idea for enhancement.