Relink git commits in jira for the migration from SVN to git and gitlab.

DisyInformationssysteme/git-to-jira-links

relink git issues to jira

See also the related blog post for more information!

requirements

  • dulwich (Docs)
  • swig
  • gpgme
  • matplotlib
  • python-gpg: pip3 install --user gpg

To get them quickly and test that everything works:

guix environment -l guix.scm
for i in *.py; do python3 "$i" --test; done

License

Apache License 2.0. See COPYING.

usage

Retrieve commits and issue-IDs from Git repo

./retrieve_commits_and_issues.py [--with-files] [--output TODO_FILE.todo] [--previous OLD_TODO_FILE.todo] PATH_TO_GIT_REPO ...

Commit-issue pairs that are already present in OLD_TODO_FILE are not written to TODO_FILE.
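
The effect of --previous can be pictured as a set difference on (commit, issue) pairs. The following is a minimal sketch, not the script's actual code. It assumes that each todo line starts with the commit ID and the issue ID, separated by whitespace, as in the todo file format described in this README. The helpers `pair_of` and `drop_known_pairs` and the sample IDs are hypothetical.

```python
def pair_of(line):
    """Return the (commit, issue) pair that starts a todo line."""
    commit, issue = line.split(None, 2)[:2]
    return commit, issue


def drop_known_pairs(new_lines, old_lines):
    """Keep only the lines whose (commit, issue) pair is not in old_lines."""
    known = {pair_of(line) for line in old_lines if line.strip()}
    return [line for line in new_lines
            if line.strip() and pair_of(line) not in known]


# Hypothetical sample data: one pair was already handled in the old todo file.
old = ["abc123 PROJ-1 2020-01-02T10:00:00+01:00 fix typo"]
new = [
    "def456 PROJ-2 2020-01-03T09:30:00+01:00 add feature",
    "abc123 PROJ-1 2020-01-02T10:00:00+01:00 fix typo",
    "abc123 PROJ-7 2020-01-02T10:00:00+01:00 fix typo",
]
remaining = drop_known_pairs(new, old)
```

Note that the same commit can remain in the result when it references an issue that was not paired with it before.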

Store repository info

./retrieve_repository_info.py [--output INFO_FILE.repoinfo] PATH_TO_GIT_REPO

Link commits to Jira

./link_commits_to_issues.py [--create-the-links] [--jira-api-server URL] [--netrc-gpg-path jira-netrc.gpg | --jira-user USER --jira-password PASSWORD] --repo-info-file FILE.repoinfo FILE.todo
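
The README does not state how the links are stored in Jira. One plausible mechanism is Jira's remote-link REST endpoint (`POST /rest/api/2/issue/{issueIdOrKey}/remotelink`). The sketch below only builds such a request and does not send it. That link_commits_to_issues.py works this way is an assumption. The function name `build_remote_link_request`, the GitLab commit URL and the issue key are hypothetical, and authentication is left out.

```python
import json
import urllib.request


def build_remote_link_request(api_server, issue, commit_url, title):
    """Build (but do not send) a POST request that adds a remote link to an issue."""
    endpoint = f"{api_server.rstrip('/')}/rest/api/2/issue/{issue}/remotelink"
    payload = {"object": {"url": commit_url, "title": title}}
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_remote_link_request(
    "https://jira.HOST.TLD",
    "PROJ-1",  # hypothetical issue key
    "https://gitlab.HOST.TLD/GROUP/REPO/-/commit/abc123",  # hypothetical URL
    "Commit abc123",
)
```

Building the request separately from sending it mirrors the dry-run behaviour suggested by the optional --create-the-links flag: the request can be inspected first and sent only when wanted.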

credentials via netrc

Prepare a GPG-encrypted netrc file:

echo machine jira.HOST.TLD login USER password PASSWORD | gpg2 -er MY_EMAIL@HOST.TLD > jira-netrc.gpg
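
Once decrypted (for example with `gpg2 -d jira-netrc.gpg`), the file content is a plain netrc entry. The following minimal sketch shows how such text can be turned into a login and password. It is not the script's own parser: `parse_netrc_entry` is a hypothetical helper, and it only handles `machine`, `login` and `password` key/value pairs, not the `default` keyword or macros.

```python
import shlex


def parse_netrc_entry(text, machine):
    """Return (login, password) for machine from decrypted netrc text, or None."""
    tokens = shlex.split(text)
    entries = {}
    current = None
    # netrc entries are whitespace-separated key/value pairs.
    for key, value in zip(tokens[::2], tokens[1::2]):
        if key == "machine":
            current = entries.setdefault(value, {})
        elif current is not None:
            current[key] = value
    found = entries.get(machine)
    if not found or "login" not in found or "password" not in found:
        return None
    return found["login"], found["password"]


decrypted = "machine jira.HOST.TLD login USER password PASSWORD\n"
credentials = parse_netrc_entry(decrypted, "jira.HOST.TLD")
```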

todo file format

<commit> <issue> <isodate> <message with linebreaks replaced by "---" >

There can be multiple entries per commit: one per issue referenced.

Entries are sorted by commit_time, newest commits first, because the newest commits are the most important ones to get right.
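
A minimal sketch of reading one line in this format, assuming the four fields are separated by single spaces and the message is everything after the third space. `TodoEntry` and `parse_todo_line` are hypothetical names, not part of the scripts. A message that originally contained a literal "---" cannot be told apart from an encoded line break.

```python
from typing import NamedTuple


class TodoEntry(NamedTuple):
    commit: str
    issue: str
    isodate: str
    message: str


def parse_todo_line(line):
    """Split a todo line into its four fields and restore message line breaks."""
    parts = line.rstrip("\n").split(" ", 3)
    parts += [""] * (4 - len(parts))  # tolerate an empty message
    commit, issue, isodate, message = parts
    return TodoEntry(commit, issue, isodate, message.replace("---", "\n"))


entry = parse_todo_line(
    "abc123 PROJ-1 2020-01-02T10:00:00+01:00 fix typo---second line\n"
)
```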

History analysis

Files affected per issue

./retrieve_commits_and_issues.py --with-files --output issues-and-files.log ./
./correlate_files_per_issue.py issues-and-files.log --count-files-per-issue | sort > files-affected-by-time-with-issue.dat
./plot.py files-affected-by-time-with-issue.dat
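
Conceptually, counting files per issue means collecting the files touched by all commits that reference the same issue. The sketch below works on in-memory records and is independent of the log file layout, which this README does not specify. Counting each distinct file once per issue is an assumption about what --count-files-per-issue does, and `count_files_per_issue` is a hypothetical helper.

```python
from collections import defaultdict


def count_files_per_issue(records):
    """Count the distinct files touched per issue.

    records is an iterable of (issue, paths) tuples, one per commit-issue pair.
    """
    files = defaultdict(set)
    for issue, paths in records:
        files[issue].update(paths)
    return {issue: len(paths) for issue, paths in files.items()}


# Hypothetical sample: two commits reference PROJ-1, one references PROJ-2.
records = [
    ("PROJ-1", ["a.py", "b.py"]),
    ("PROJ-2", ["a.py"]),
    ("PROJ-1", ["b.py", "c.py"]),
]
counts = count_files_per_issue(records)
```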

Only bugs

# ...
# get all jira bugs:
# ./find_all_bugs.py --jira-api-server https://jira.HOST.TLD > all-bugs.log
# stats
./retrieve_commits_and_issues.py --with-files --output issues-and-files.log ./
./correlate_files_per_issue.py issues-and-files.log --count-files-per-issue  -i all-bugs.log | sort > files-affected-by-time-with-issue-only-bugs.dat
./plot.py files-affected-by-time-with-issue-only-bugs.dat

Aggregated file size of changed files per issue

./retrieve_commits_and_issues.py --with-files-and-sizes --output issues-and-files.log ./
./correlate_files_per_issue.py issues-and-files.log --sum-filesizes-per-issue | sort > sum-filesize-by-time-with-issue.dat
./plot.py sum-filesize-by-time-with-issue.dat

Create nodelists and edgelists

./retrieve_commits_and_issues.py --with-files-and-sizes --output issues-and-files.log ./
./correlate_files_per_issue.py --file-connections issues-and-files.log --debug --output-edgelist all-issues-edgelist-max300.csv --output-nodelist  all-issues-nodelist-max300.csv

Analyze the CSVs with graph software like Gephi.

Subselect a graph for a specific module

This uses MODULE_FOO as an example. Expect a runtime of a few hours on a codebase of about 1 million lines.

This needs ripgrep in addition to the other dependencies.

./retrieve_commits_and_issues.py --with-files-and-sizes --output issues-and-files.log ./
./correlate_files_per_issue.py --file-connections issues-and-files.log --debug --output-edgelist all-issues-edgelist-max300.csv --output-nodelist  all-issues-nodelist-max300.csv
grep MODULE_FOO all-issues-nodelist-max300.csv > all-issues-nodelist-max300-foo.csv
cut -d " " -f 1 all-issues-nodelist-max300-foo.csv > foo-nodeids-raw.txt
time grep -wf foo-nodeids-raw.txt all-issues-edgelist-max300.csv | tee all-issues-edgelist-max300-with-foo.csv
sed 's/^/^/' foo-nodeids-raw.txt > foo-nodeids-first.txt
sed 's/^/ /; s/$/ /' foo-nodeids-raw.txt > foo-nodeids-second.txt
time rg -f foo-nodeids-second.txt all-issues-edgelist-max300-with-foo.csv | tee all-issues-edgelist-max300-to-foo.csv
time rg -f foo-nodeids-first.txt all-issues-edgelist-max300-to-foo.csv | tee all-issues-edgelist-max300-from-foo.csv

Now import all-issues-nodelist-max300-foo.csv and all-issues-edgelist-max300-from-foo.csv into Gephi.
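
The same selection can be sketched in Python, which may be easier to adapt than the grep/rg pipeline. This is a minimal sketch under assumptions taken from the commands above: both CSVs are space-separated, the node ID is the first column of the nodelist, and each edgelist line starts with a source ID and a target ID. `read_node_ids` and `edges_within` are hypothetical helpers, and the sample rows are made up.

```python
def read_node_ids(nodelist_lines):
    """Collect the node IDs (first space-separated column) of the module's nodes."""
    return {line.split(" ", 1)[0] for line in nodelist_lines if line.strip()}


def edges_within(edgelist_lines, node_ids):
    """Keep the edges whose source and target are both in node_ids."""
    kept = []
    for line in edgelist_lines:
        fields = line.split()
        if len(fields) >= 2 and fields[0] in node_ids and fields[1] in node_ids:
            kept.append(line)
    return kept


# Hypothetical sample rows: "ID path" for nodes, "source target weight" for edges.
foo_nodes = ["12 src/MODULE_FOO/a.py", "34 src/MODULE_FOO/b.py"]
all_edges = ["12 34 3", "12 99 1", "123 34 2", "34 12 5"]
foo_edges = edges_within(all_edges, read_node_ids(foo_nodes))
```

Unlike a pattern such as `^12`, which also matches a line that starts with `123`, the set lookup compares whole IDs. This matters only if one ID can be a prefix of another.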

Select a specific timespan

Edit the log file written by retrieve_commits_and_issues.py and keep only the lines you want. The file is ordered by time, newest issue first.
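
For larger files, the selection can be scripted. The sketch below assumes that the lines follow the todo file format, with an ISO date starting with YYYY-MM-DD as the third space-separated field. Logs written with --with-files or --with-files-and-sizes may be laid out differently, so check the file first. `select_timespan` is a hypothetical helper.

```python
def select_timespan(lines, first_day, last_day):
    """Keep lines whose ISO date (third field) lies within [first_day, last_day].

    first_day and last_day are "YYYY-MM-DD" strings; ISO dates in this form
    sort correctly when compared as plain strings.
    """
    kept = []
    for line in lines:
        fields = line.split(" ", 3)
        if len(fields) < 3:
            continue  # not a line in the assumed format
        day = fields[2][:10]  # "YYYY-MM-DD" prefix of the ISO date
        if first_day <= day <= last_day:
            kept.append(line)
    return kept


# Hypothetical sample lines, newest first as in the log file.
log_lines = [
    "def456 PROJ-2 2020-03-01T09:30:00+01:00 add feature",
    "abc123 PROJ-1 2020-01-02T10:00:00+01:00 fix typo",
    "aaa111 PROJ-0 2019-12-31T23:59:00+01:00 initial import",
]
selected = select_timespan(log_lines, "2020-01-01", "2020-01-31")
```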
