A tool to migrate the content of a MoinMoin wiki to a Git-backed wiki engine like Gollum, Realms, Waliki or similar.
git clone --recursive https://github.com/mgaitan/moin2git.git
[sudo] pip install -r requirements.txt
If you also want to convert each page to reStructuredText format (see --convert-to-rst), you will need to install MoinMoin:
[sudo] pip install moin
tin@morochita:~$ python moin2git.py --help
moin2git.py
A tool to migrate the content of a MoinMoin wiki to a Git based system
like Waliki, Gollum or similar.
Usage:
    moin2git.py migrate <data_dir> <git_repo> [--convert-to-rst] [--users-file <users_file>]
    moin2git.py users <data_dir>
    moin2git.py attachments <data_dir> <dest_dir>
Arguments:
    data_dir            Path where your MoinMoin content is
    git_repo            Path to the target repo (created if it doesn't exist)
    dest_dir            Path to copy attachments (created if it doesn't exist)
Options:
    --convert-to-rst    After migrate, convert to reStructuredText
    --users-file        Use users_file to map wiki user to git commit author
If you need to convert the markup to rst, you will need a working MoinMoin instance. For a quick and dirty configuration, put your data in a directory named wiki and copy wikiconfig.py to the same level:
wikiconfig.py
wiki/
├── data/
Then copy moin2git/moin2rst/text_x-rst.py to wiki/data/plugins/formatters/
You may also need to copy the entire contents of /usr/share/moin into wiki as well.
MoinMoin is a wiki engine powered by Python that stores its content (including pages, history of changes and users) as flat files under the directory /data.
An overview of the structure of this tree:
data/
├── cache
│   │ ...
│
├── pages
│   │
│   ├── AdoptaUnNewbie
│   │   ├── cache
│   │   │   ├── hitcounts
│   │   │   ├── pagelinks
│   │   │   └── text_html
│   │   ├── current
│   │   ├── edit-lock
│   │   ├── edit-log
│   │   └── revisions
│   │       ├── 00000001
│   │       ├── 00000002
│   │
│   ├── AlejandroJCura
│   │   ├── cache
│   │   │   ├── pagelinks
│   │   │   └── text_html
│   │   ├── current
│   │   ├── edit-lock
│   │   ├── edit-log
│   │   └── revisions
│   │       ├── 00000001
│   │       ├── 00000002
│   │       └── 00000003
│   │
│   ├── AlejandroJCura(2f)ClassDec(c3b3)
│   │   ├── cache
│   │   │   ├── pagelinks
│   │   │   └── text_html
│   │   ├── current
│   │   ├── edit-lock
│   │   ├── edit-log
│   │   └── revisions
│   │       ├── 00000001
│   │       ├── 00000002
│   │       └── 00000003
│   ...
│   └── YynubJakyfe
│       ├── edit-lock
│       └── edit-log
│
└── user
    ├── 1137591729.59.35593
    ├── 1137611536.06.62624
    ├── 1138297101.79.62731
    ├── 1138912320.61.21990
    ├── 1138912840.93.11353
    ...
- Each wiki page (no matter how deep its URL is) is stored in a directory /data/pages/<URL>. In our example, the URL /AlejandroJCura/ClassDec%C3%B3 (see the note on escaping below) is stored as data/pages/AlejandroJCura(2f)ClassDec(c3b3)
- The content itself is in the directory /revisions, which holds the history of a page. Each file in this directory is a full version of the page (not a diff). The file /data/pages/<URL>/current works as a pointer to the current revision (in general the most recent one, but a page can be "restored" to an older revision). For example:
  tin@morochita:~/lab/moin$ cat data/pages/AlejandroJCura/current
  00000003
  The edit-log file describes who, when and (if there is a log message) why:
  tin@morochita:~/lab/moin$ cat data/pages/AlejandroJCura/edit-log
  1141363609000000 00000001 SAVENEW AlejandroJCura 201.235.8.161 161-8-235-201.fibertel.com.ar 1140672427.37.17771 Una pagina para mi?
  1155690306000000 00000002 SAVE AlejandroJCura 201.231.181.174 174-181-231-201.fibertel.com.ar 1140672427.37.17771
  1218483772000000 00000003 SAVE AlejandroJCura 201.250.38.50 201-250-38-50.speedy.com.ar 1140672427.37.17771
  The data logged is (in this order, separated by tabs): EDITION_TIMESTAMP, REVISION, ACTION, PAGE, IP, HOST, USER_ID, ATTACHMENTS, LOG_MESSAGE
  The USER_ID points to a file under the directory /data/user that contains a lot of information about the user. For example:
  (preciosa)tin@morochita:~/lab/moin$ cat data/user/1140549890.71.33402
  remember_me=1
  theme_name=pyar
  editor_default=text
  show_page_trail=1
  disabled=0
  quicklinks[]=Noticias
  css_url=
  edit_rows=20
  show_nonexist_qm=0
  show_fancy_diff=1
  tz_offset=-10800
  subscribed_pages[]=
  aliasname=
  remember_last_visit=0
  enc_password={SHA}5kXNi+HjaTCGItkg6yTPNRtSDGE=
  email=mautuc@yahoo(....)
  show_topbottom=0
  editor_ui=freechoice
  datetime_fmt=
  want_trivial=0
  last_saved=1219176737.74
  wikiname_add_spaces=0
  name=MauricioFerrari
  language=
  show_toolbar=1
  edit_on_doubleclick=0
  date_fmt=
  mailto_author=0
  bookmarks{}=
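The tab-separated edit-log format described above lends itself to a small parser. The following is only a sketch in Python 3, not the code moin2git.py actually uses: the names EditLogEntry and parse_edit_log_line are made up for illustration, and reading the timestamp as microseconds since the Unix epoch is an assumption based on the sample values.

```python
from collections import namedtuple
from datetime import datetime, timezone

# Field order as listed above; the Python names are illustrative only.
FIELDS = [
    "timestamp", "revision", "action", "page", "ip",
    "host", "user_id", "attachments", "log_message",
]
EditLogEntry = namedtuple("EditLogEntry", FIELDS)


def parse_edit_log_line(line):
    """Split one tab-separated edit-log line into named fields."""
    # Strip only the newline: trailing tabs mark empty trailing fields.
    parts = line.rstrip("\n").split("\t")
    # Pad short lines so that missing trailing fields become "".
    parts += [""] * (len(FIELDS) - len(parts))
    return EditLogEntry(*parts[:len(FIELDS)])


# First line of the sample edit-log shown above, rebuilt with tabs.
sample = "\t".join([
    "1141363609000000", "00000001", "SAVENEW", "AlejandroJCura",
    "201.235.8.161", "161-8-235-201.fibertel.com.ar",
    "1140672427.37.17771", "", "Una pagina para mi?",
])
entry = parse_edit_log_line(sample)

# Assumption: the timestamp is expressed in microseconds.
when = datetime.fromtimestamp(int(entry.timestamp) // 1_000_000, tz=timezone.utc)
print(entry.revision, entry.action, entry.user_id, when.isoformat())
```

The revision field names the file under revisions/ and user_id names the file under data/user, so one parsed line is enough to locate both the page content and its author.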
moin2git.py uses git (via the wonderful sh) to handle the history, so we don't need multiple files to track the different revisions of a page.
For instance, in the root of our target directory (the git repo) we should get a file AlejandroJCura with:
- 3 revisions (commits), from revisions/00000001 to revisions/00000003
- the author name/nickname and email (if available), parsed from the user file of each revision. To know who made each version and when, moin2git.py parses the edit-log file of each page.
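Getting an author out of a user file mostly means reading key=value lines. Here is a minimal sketch in Python 3, assuming one key=value pair per line as in the example user file; parse_user_file and git_author are hypothetical helper names, the email address is a placeholder, and none of this is taken from moin2git.py itself.

```python
def parse_user_file(text):
    """Turn the key=value lines of a MoinMoin user file into a dict."""
    profile = {}
    for line in text.splitlines():
        line = line.strip()
        # Defensive: skip blank lines and anything that looks like a comment.
        if not line or line.startswith("#"):
            continue
        # Split on the first "=" only, since values may contain "=" too.
        key, sep, value = line.partition("=")
        if sep:
            profile[key] = value
    return profile


def git_author(profile):
    """Build a 'Name <email>' string, or just the name without an email."""
    name = profile.get("name", "")
    email = profile.get("email", "")
    return "{} <{}>".format(name, email) if email else name


# A few keys from the example user file; the email is a placeholder.
sample = "\n".join([
    "theme_name=pyar",
    "tz_offset=-10800",
    "quicklinks[]=Noticias",
    "aliasname=",
    "email=user@example.com",
    "name=MauricioFerrari",
])
profile = parse_user_file(sample)
print(git_author(profile))
```

Keeping the parser separate from the author formatting makes it easy to decide elsewhere what to do with users that have no email.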
We should also get a file AlejandroJCura/ClassDecó (see the note on escaping below) where, in this case, AlejandroJCura/ is a directory.
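The step from the directory name AlejandroJCura(2f)ClassDec(c3b3) to the path AlejandroJCura/ClassDecó can be sketched as follows. This is an illustration in Python 3 that assumes each parenthesised group holds pairs of hex digits encoding UTF-8 bytes, as in the two examples here; unquote_page_name is a made-up name, not a function from moin2git.py or MoinMoin.

```python
import re

# One escaped group: parentheses around one or more pairs of hex digits.
_ESCAPED = re.compile(r"\(((?:[0-9a-fA-F]{2})+)\)")


def unquote_page_name(dirname):
    """Decode groups such as (2f) or (c3b3) in a page directory name."""
    raw = bytearray()
    pos = 0
    for match in _ESCAPED.finditer(dirname):
        # Copy the literal text before the group, then the decoded bytes.
        raw += dirname[pos:match.start()].encode("utf-8")
        raw += bytes.fromhex(match.group(1))
        pos = match.end()
    raw += dirname[pos:].encode("utf-8")
    # Decode once at the end so multi-byte characters are handled whole.
    return raw.decode("utf-8")


path = unquote_page_name("AlejandroJCura(2f)ClassDec(c3b3)")
print(path)
```

Any "/" in the decoded name separates directories in the target repository, which is why AlejandroJCura/ ends up as a directory.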
The option --users-file accepts a file that will be used to map wiki users to git commit authors.
The output of the command moin2git.py users <data_dir> can be used as input. For each user, the required fields are name and email.
Note that we have to parse the ugly escaping: (2f) is / and means that the part to its left is a directory; (c3b3) means %C3%B3, i.e. ó.