See also the related blog post for more information!
- dulwich
- swig
- gpgme
- matplotlib
- python-gpg:
  pip3 install --user gpg
To get them quickly and test that everything works:
guix environment -l guix.scm
for i in *.py; do python3 "$i" --test; done
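If Guix is not available, the Python dependencies might also be installable with pip (untested; swig and gpgme would have to come from the system package manager):
pip3 install --user dulwich matplotlib gpg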
Apache License 2.0. See COPYING.
./retrieve_commits_and_issues.py [--with-files] [--output TODO_FILE.todo] [--previous OLD_TODO_FILE.todo] PATH_TO_GIT_REPO ...
Commit-issue pairs already included in the OLD_TODO_FILE are not added to the TODO_FILE again.
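For example, a hypothetical incremental run that only picks up commit-issue pairs not yet in last week's file (file names are placeholders):
./retrieve_commits_and_issues.py --previous last-week.todo --output this-week.todo PATH_TO_GIT_REPO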
./retrieve_repository_info.py [--output INFO_FILE.repoinfo] PATH_TO_GIT_REPO
./link_commits_to_issues.py [--create-the-links] [--jira-api-server URL] [--netrc-gpg-path jira-netrc.gpg | --jira-user USER --jira-password PASSWORD] --repo-info-file FILE.repoinfo FILE.todo
Prepare the encrypted netrc file:
echo machine jira.HOST.TLD login USER password PASSWORD | gpg2 -er MY_EMAIL@HOST.TLD > jira-netrc.gpg
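Putting it together, a complete linking run might look like this (server URL and file names are placeholders; presumably no links are written to Jira unless --create-the-links is passed):
./retrieve_repository_info.py --output project.repoinfo PATH_TO_GIT_REPO
./retrieve_commits_and_issues.py --output project.todo PATH_TO_GIT_REPO
./link_commits_to_issues.py --jira-api-server https://jira.HOST.TLD --netrc-gpg-path jira-netrc.gpg --repo-info-file project.repoinfo project.todo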
Each entry in the TODO_FILE has the format:
<commit> <issue> <isodate> <message with linebreaks replaced by "---">
There can be multiple entries per commit: one per issue referenced.
The entries are ordered by commit time, newest commits first (they are the most important ones to get right).
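A hypothetical entry (all values invented) could look like this:
f00dfacedeadbeef1234567890abcdef12345678 PROJ-1234 2022-03-14T09:26:53+01:00 PROJ-1234: fix crash in importer---add regression test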
./retrieve_commits_and_issues.py --with-files --output issues-and-files.log ./
./correlate_files_per_issue.py issues-and-files.log --count-files-per-issue | sort > files-affected-by-time-with-issue.dat
./plot.py files-affected-by-time-with-issue.dat
# ...
# get all jira bugs:
# ./find_all_bugs.py --jira-api-server https://jira.HOST.TLD > all-bugs.log
# stats
./retrieve_commits_and_issues.py --with-files --output issues-and-files.log ./
./correlate_files_per_issue.py issues-and-files.log --count-files-per-issue -i all-bugs.log | sort > files-affected-by-time-with-issue-only-bugs.dat
./plot.py files-affected-by-time-with-issue-only-bugs.dat
./retrieve_commits_and_issues.py --with-files-and-sizes --output issues-and-files.log ./
./correlate_files_per_issue.py issues-and-files.log --sum-filesizes-per-issue | sort > sum-filesize-by-time-with-issue.dat
./plot.py sum-filesize-by-time-with-issue.dat
./retrieve_commits_and_issues.py --with-files-and-sizes --output issues-and-files.log ./
./correlate_files_per_issue.py --file-connections issues-and-files.log --debug --output-edgelist all-issues-edgelist-max300.csv --output-nodelist all-issues-nodelist-max300.csv
Analyze the CSVs with graph software like Gephi.
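Before importing, a quick look at the first lines can confirm that the files contain what you expect (optional):
head all-issues-nodelist-max300.csv all-issues-edgelist-max300.csv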
Example with MODULE_FOO: expect a runtime of a few hours on a codebase of about one million lines.
This needs ripgrep in addition to the other dependencies.
./retrieve_commits_and_issues.py --with-files-and-sizes --output issues-and-files.log ./
./correlate_files_per_issue.py --file-connections issues-and-files.log --debug --output-edgelist all-issues-edgelist-max300.csv --output-nodelist all-issues-nodelist-max300.csv
# Select the nodes that belong to MODULE_FOO:
grep MODULE_FOO all-issues-nodelist-max300.csv > all-issues-nodelist-max300-foo.csv
# Extract their node IDs (first column):
cut -d " " -f 1 all-issues-nodelist-max300-foo.csv > foo-nodeids-raw.txt
# Keep only the edges that mention one of these node IDs as a whole word:
time grep -wf foo-nodeids-raw.txt all-issues-edgelist-max300.csv | tee all-issues-edgelist-max300-with-foo.csv
# Turn the IDs into patterns anchored at the line start (source column):
sed "s/^/^/" foo-nodeids-raw.txt > foo-nodeids-first.txt
# Turn the IDs into patterns surrounded by spaces (target column):
sed "s/^/ /" foo-nodeids-raw.txt | sed "s/$/ /" > foo-nodeids-second.txt
# Keep edges whose target is a foo node:
time rg -f foo-nodeids-second.txt all-issues-edgelist-max300-with-foo.csv | tee all-issues-edgelist-max300-to-foo.csv
# Of those, keep edges whose source is also a foo node:
time rg -f foo-nodeids-first.txt all-issues-edgelist-max300-to-foo.csv | tee all-issues-edgelist-max300-from-foo.csv
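As an optional plausibility check, each filtering step should shrink the edge list:
wc -l all-issues-edgelist-max300.csv all-issues-edgelist-max300-with-foo.csv all-issues-edgelist-max300-to-foo.csv all-issues-edgelist-max300-from-foo.csv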
Now import all-issues-nodelist-max300-foo.csv and all-issues-edgelist-max300-from-foo.csv into Gephi.
To analyze only a subset, just edit the logfile from retrieve_commits_and_issues.py and select the lines you want. It is ordered by time, newest issues first.
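For example, to restrict a run to the newest entries (the cut-off of 1000 lines is arbitrary, and head may truncate the oldest selected issue):
head -n 1000 issues-and-files.log > issues-and-files-recent.log
./correlate_files_per_issue.py issues-and-files-recent.log --count-files-per-issue | sort > files-recent.dat
./plot.py files-recent.dat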