This repo contains a copy/fork of git://gitorious.org/git_fast_filter/mainline.git with the addition of a set of .py scripts used to perform the migration.
See FAQ.md for questions concerning the migration.
-
Clone the mirror repo to get all branches locally, and use the read-only URL so that nothing can be pushed back to the mirror by accident.
$ git clone git://github.com/jbosstools/jbosstools-svn-mirror.git
$ cd jbosstools-svn-mirror && git pull --all
$ git branch -m trunk master
-
Back up the local repo for later reuse
$ tar -czvf jbosstools-svn-mirror.tar.gz jbosstools-svn-mirror
To get a clean copy:
$ tar -xvf jbosstools-svn-mirror.tar.gz
-
Enter the new copy folder, and clean out all but the content we care about
$ cd jbosstools-svn-mirror
$ git pull
$ git branch -a -r | grep -E "jbpm-jpdl-4.0.0.beta2|https|origin/jbosstools-4.0.0.Alpha1|tags/jbosstools-3.0.x|3.2.helios|3.3.indigo|3.1.0.M3|vpe-spring|xulrunner-1.9.2.16|hibernatetools|dead|smooks|tycho_exp|modular_build" | xargs git branch -r -D
$ git branch -a -r | grep -vE "GA" | grep -v ".x" | grep -vi "final" | grep -v "jbosstools-4" | grep -v jbpm-jpdl | grep -v trunk | xargs git branch -r -D
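Because `git branch -r -D` cannot be undone easily, it can help to preview what the second filter would delete before running it. The sketch below wraps the same grep chain in a function (`branches_to_delete` is a name made up for this example) and feeds it a few hypothetical branch names instead of real `git branch` output.

```shell
# Reads remote branch names on stdin and prints the ones that the second
# command above would hand to `git branch -r -D`.
# The grep chain is copied unchanged from that command.
branches_to_delete() {
  grep -vE "GA" | grep -v ".x" | grep -vi "final" \
    | grep -v "jbosstools-4" | grep -v jbpm-jpdl | grep -v trunk
}

# Hypothetical sample names, for illustration only.
printf '%s\n' \
  origin/jbosstools-3.2.0.GA \
  origin/jbosstools-3.3.x \
  origin/jbosstools-3.1.0.Final \
  origin/jbosstools-4.0.0.Alpha2 \
  origin/jbosstools-3.2.0.M1 \
  origin/trunk \
  | branches_to_delete
```

Against the real repo the preview would be `git branch -a -r | branches_to_delete`. Note that in `grep -v ".x"` the dot is a regex wildcard, so any name containing an "x" after its first character is kept, not only the ".x" maintenance branches.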
-
Check out all branches locally so they are not just tracked remotely:
$ git branch -a | sed -e "s|remotes/origin/||g" | xargs -I {} git branch {} remotes/origin/{}
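To check the name rewriting before any branch is created, the same idea can be run as a dry run that only prints the `git branch` commands. This is a sketch, not the command used in the migration: `local_branch_commands` is a made-up name, the input lines are hypothetical, and skipping the `HEAD -> ...` line is an addition of this example.

```shell
# Reads `git branch -a` style lines on stdin and prints one
# `git branch <name> remotes/origin/<name>` command per remote branch.
# Lines for local branches and the symbolic HEAD entry are skipped.
local_branch_commands() {
  grep -v ' -> ' \
    | sed -n -e 's|^ *remotes/origin/\(.*\)$|git branch \1 remotes/origin/\1|p'
}

# Hypothetical sample input, for illustration only.
printf '%s\n' \
  '* master' \
  '  remotes/origin/HEAD -> origin/master' \
  '  remotes/origin/jbosstools-3.2.x' \
  '  remotes/origin/tags/jbosstools-3.2.0.GA' \
  | local_branch_commands
```

Once the printed commands look right, piping them into `sh` would run them.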
-
Convert svn tag branches into real tags
$ git branch | grep tags | sed -e "s|tags/||g" | xargs -n 1 -I {} git tag -m "svn branch tag" {} tags/{}
$ git branch | grep tags | xargs -n 1 git branch -d
-
Remove reference to the origin remote
$ git remote rm origin
-
Set up git_fast_filter, GitHub credentials etc.
$ cd ..
$ source setup.sh
-
Run the split/filter of repositories (this requires a lot of disk space)
$ ./filter-repos.sh
Each repo is done by running filter_repo.py like this:
$ python filter_repo.py jbosstools-svn-mirror jbosstools-base "^common.*|^tests.*|^runtime.*|^usage.*"
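The third argument is a regular expression over paths. As a rough illustration of which paths such a pattern selects, the sketch below applies it with `grep -E` to a few made-up paths. It assumes that filter_repo.py matches the expression against repository-relative paths from their start; `select_paths` is a name invented for this example.

```shell
# Prints the paths from stdin that match the extended regex given as $1.
select_paths() {
  grep -E -- "$1"
}

# Hypothetical paths, for illustration only.
printf '%s\n' \
  common/plugins/pom.xml \
  tests/pom.xml \
  as/plugins/pom.xml \
  runtime/features/feature.xml \
  documentation/common/readme.txt \
  | select_paths "^common.*|^tests.*|^runtime.*|^usage.*"
```

Because every alternative is anchored with `^`, a path such as documentation/common/readme.txt is not selected even though it contains "common".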
After filtering, each repo has master checked out and is garbage collected, empty commits are removed, each repo with just one subdir is filter-branched so that the subdir becomes the root, and finally a second garbage collection is run.
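The clean-up steps could look roughly like the following. This is a sketch under assumptions, not the content of filter-repos.sh: the exact git flags are a guess at one reasonable way to do each step, `post_filter_plan` is a made-up name, and the function only prints the commands so that they can be reviewed first.

```shell
# Prints (does not run) the clean-up commands for one split repo.
# $1 = repo directory
# $2 = optional: the single subdirectory that should become the root
post_filter_plan() {
  echo "cd $1"
  echo "git checkout master"
  # First garbage collection.
  echo "git gc --aggressive --prune=now"
  # Drop commits that became empty after filtering.
  echo "git filter-branch -f --prune-empty -- --all"
  # Only for repos that consist of a single subdirectory.
  if [ -n "${2:-}" ]; then
    echo "git filter-branch -f --subdirectory-filter $2 -- --all"
  fi
  # Second garbage collection.
  echo "git gc --aggressive --prune=now"
}

post_filter_plan jbosstools-base
# Hypothetical repo and subdirectory names.
post_filter_plan jbosstools-example example
```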
-
Delete big files (maybe filter them out earlier?)
TBD. The following lists the ten largest blobs in the pack together with their paths:
$ git verify-pack -v .git/objects/pack/pack-*.idx | grep blob | sort -k3nr | head | while read s x b x; do git rev-list --all --objects | grep "$s" | awk '{print "'"$b"'",$0;}'; done
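The first half of that pipeline is easier to follow when split out. `git verify-pack -v` prints one line per object with the columns SHA, type, size, size in pack and offset, so the largest blobs are the "blob" lines sorted by the third column. The sketch below does just that part on made-up input; `largest_blobs` is a name invented here.

```shell
# Reads `git verify-pack -v` output on stdin and prints "<size> <sha>"
# for the N largest blobs (N = $1, default 10), largest first.
largest_blobs() {
  awk '$2 == "blob" { print $3, $1 }' | sort -k1,1nr | head -n "${1:-10}"
}

# Made-up sample lines: sha type size size-in-pack offset.
printf '%s\n' \
  'aaa111 blob 120 80 12' \
  'bbb222 commit 300 200 92' \
  'ccc333 blob 90000 4000 292' \
  'ddd444 blob 4500 900 4292' \
  | largest_blobs 2
```

In a real repo the input would come from `git verify-pack -v .git/objects/pack/pack-*.idx`, and each SHA can then be mapped to a path with `git rev-list --all --objects | grep <sha>`.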
-
Create GitHub repos
$ find jbosstools-* -maxdepth 0 | xargs -n 1 -I {} createrepo.sh {}
To delete the scratch repos again:
$ find jbosstools-* -maxdepth 0 | xargs -n 1 -I {} curl -u "maxandersen:$GITHUBPWD" https://api.github.com/$GITHUB_ROOT/scratch-{} -X DELETE
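createrepo.sh is not reproduced here, but the two GitHub REST calls involved can be sketched as follows. The endpoints (`POST /orgs/<org>/repos` and `DELETE /repos/<owner>/<repo>`) are GitHub's documented ones. Everything else is an assumption for illustration: the function names, the `GITHUB_USER` and `GITHUB_TOKEN` variables, and the use of a personal access token instead of the account password. The functions only print the curl commands.

```shell
# Prints (does not run) a curl command that creates repo $2 in organisation $1.
create_repo_command() {
  printf 'curl -u "$GITHUB_USER:$GITHUB_TOKEN" -X POST -d %s https://api.github.com/orgs/%s/repos\n' \
    "'{\"name\": \"$2\"}'" "$1"
}

# Prints (does not run) a curl command that deletes repo $2 owned by $1.
delete_repo_command() {
  printf 'curl -u "$GITHUB_USER:$GITHUB_TOKEN" -X DELETE https://api.github.com/repos/%s/%s\n' \
    "$1" "$2"
}

# Example organisation and repo names.
create_repo_command jbosstools scratch-jbosstools-base
delete_repo_command jbosstools scratch-jbosstools-base
```

Printing first makes it easy to check the repo names before anything is created or destroyed. The printed lines can be piped into `sh` once they look right.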
Resources used:
- GitHub API - The GitHub REST API allowed me to set up and destroy multiple repositories very easily. Not having to click through the web UI saved me a lot of time.
- git_fast_filter - Git Fast Filter is several orders of magnitude faster than using git filter-branch. Highly recommended for splitting up a git repository.
- Atlassian SVN to Git Migration - Page describing how Atlassian migrated by using an svn mirror, sync and git svn fetch.
- Using tmpfs with filter_branch - If you have to use filter-branch, use it together with a memory-mapped filesystem for speed reasons.
- ramdisk for OSX - Scripts to create a memory-mapped filesystem on OSX.
- Clean out empty commits - Empty commits often occur when a commit has no files left in it because of the filter, or when its changes only relate to svn props. They make the history messy.
- Detach subdirectory into separate git repository -