A tool for working on glossaries (dictionary databases) using Python, including editing glossaries and converting them between many formats. The support matrix is:
Format | Extension | Read | Write |
---|---|---|---|
ABBYY Lingvo DSL | | | |
AppleDict Source | | | |
Babylon | | | |
Babylon Source | | | |
DictionaryForMIDs | | | |
DICTD dictionary server | | | |
FreeDict | | | |
Gettext Source | | | |
SQLite | | | |
Octopus MDic | | | |
Octopus MDic Source | | | |
Omnidic | | | |
PMD | | | |
Sdictionary Binary | | | |
Sdictionary Source | | | |
SQL | | | |
StarDict | | | |
Tabfile | | | |
TreeDict | | | |
XDXF | | | |
xFarDic | | | |
BeautifulSoup4 (with html5lib as backend) is required to sanitize HTML contents:
sudo easy_install beautifulsoup4 html5lib
To build and install the dictionary for Dictionary.app you also need:
- GNU make, part of Command Line Tools for Xcode.
- Dictionary Development Kit, part of Auxiliary Tools for Xcode. Extract it to /Developer/Extras/Dictionary Development Kit
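Before converting, it may help to confirm these prerequisites are in place. The sketch below is a minimal check, assuming the import names `bs4` (for BeautifulSoup4) and `html5lib`, and the Dictionary Development Kit location given above; `check_prerequisites` is a hypothetical helper, not part of PyGlossary, and it only checks presence, not versions.

```python
import importlib.util
import os
import shutil

# Location given above for the Dictionary Development Kit
DDK_DIR = "/Developer/Extras/Dictionary Development Kit"


def check_prerequisites(ddk_dir: str = DDK_DIR) -> dict:
    """Map each prerequisite name to whether it appears to be available.

    Hypothetical helper: checks presence only, not versions.
    """
    return {
        # BeautifulSoup4 is imported under the name "bs4"
        "beautifulsoup4": importlib.util.find_spec("bs4") is not None,
        "html5lib": importlib.util.find_spec("html5lib") is not None,
        # GNU make from the Command Line Tools has to be on PATH
        "make": shutil.which("make") is not None,
        "Dictionary Development Kit": os.path.isdir(ddk_dir),
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```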
Let's assume the Babylon dict is at ~/Documents/Duden_Synonym/Duden_Synonym.BGL. Run the following commands:
cd ~/Documents/Duden_Synonym/
python ~/Software/pyglossary/pyglossary.pyw --read-options=resPath=OtherResources --write-format=AppleDict Duden_Synonym.BGL Duden_Synonym.xml
make
make install
Launch Dictionary.app and test.
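The same convert / make / make install sequence could be scripted from Python. This is a sketch under assumptions: pyglossary.pyw lives at ~/Software/pyglossary/ as in the commands above, and `babylon_to_appledict_steps` and `run_steps` are hypothetical helpers that simply replay those commands; they are not a PyGlossary API.

```python
import subprocess
from pathlib import Path
from typing import List

# Assumed install location, matching the commands above
PYGLOSSARY = Path.home() / "Software" / "pyglossary" / "pyglossary.pyw"


def babylon_to_appledict_steps(bgl_name: str, xml_name: str,
                               res_path: str = "OtherResources") -> List[List[str]]:
    """Build argv lists for the convert, make and make install steps."""
    convert = [
        "python", str(PYGLOSSARY),  # same interpreter name as above
        f"--read-options=resPath={res_path}",
        "--write-format=AppleDict",
        bgl_name, xml_name,
    ]
    return [convert, ["make"], ["make", "install"]]


def run_steps(steps: List[List[str]], workdir: Path) -> None:
    """Run each step inside workdir, stopping at the first non-zero exit."""
    for argv in steps:
        # check=True raises subprocess.CalledProcessError on failure
        subprocess.run(argv, cwd=workdir, check=True)


# Example (not executed here):
# run_steps(babylon_to_appledict_steps("Duden_Synonym.BGL", "Duden_Synonym.xml"),
#           Path.home() / "Documents" / "Duden_Synonym")
```

Passing argument lists rather than one shell string means no shell quoting is involved, which matters once file names contain spaces.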
Let's assume the MDict dict is at ~/Documents/Duden-Oxford/Duden-Oxford DEED ver.20110408.mdx.
Use GetDict to extract the MDict dictionary (.mdx). Choose "UTF-8 TXT" as the output format and Duden-Oxford DEED ver.20110408.mtxt as the output file name. Then run the following commands:
cd ~/Documents/Duden-Oxford/
python ~/Software/pyglossary/pyglossary.pyw "Duden-Oxford DEED ver.20110408.mtxt" "Duden-Oxford DEED ver.20110408.xml"
make
make install
Launch Dictionary.app and test.
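Because these file names contain spaces, every argument has to be quoted when typed into a shell. A small sketch of deriving the .mtxt and .xml names from the .mdx name and printing a safely quoted command; `getdict_names` and `shell_command` are hypothetical helpers, and the pyglossary.pyw location is assumed to match the commands above.

```python
import shlex
from pathlib import Path
from typing import Tuple

# Assumed install location, matching the commands above. It is expanded
# here because shell-quoting a literal "~" would stop the shell expanding it.
PYGLOSSARY = Path.home() / "Software" / "pyglossary" / "pyglossary.pyw"


def getdict_names(mdx_path: str) -> Tuple[str, str]:
    """Derive the .mtxt (GetDict output) and .xml (AppleDict source) names."""
    mdx = Path(mdx_path)
    # with_suffix replaces only the final ".mdx", so "ver.20110408" is kept
    return mdx.with_suffix(".mtxt").name, mdx.with_suffix(".xml").name


def shell_command(mtxt: str, xml: str) -> str:
    """Render the conversion command with every argument shell-quoted."""
    argv = ["python", str(PYGLOSSARY), mtxt, xml]
    return " ".join(shlex.quote(arg) for arg in argv)
```

For example, `shell_command(*getdict_names("Duden-Oxford DEED ver.20110408.mdx"))` produces a command whose last two arguments are wrapped in single quotes.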
Let's assume the MDict dict is at ~/Downloads/oald8/oald8.mdx, along with the image/audio resources file oald8.mdd.
Run the following commands:
cd ~/Downloads/oald8/
python ~/Software/pyglossary/pyglossary.pyw --read-options=resPath=OtherResources --write-format=AppleDict oald8.mdx oald8.xml
This extracts the dictionary into oald8.xml and the data resources into the folder OtherResources. Hyperlinks should use relative paths, so remove the leading slash from src attributes:
sed -i "" 's:src="/:src=":g' oald8.xml
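The `sed -i ""` form above is the BSD/macOS syntax. Where it is not available, the same substitution can be done in Python. A minimal sketch, assuming oald8.xml is UTF-8 encoded; `make_src_relative` and `fix_file` are hypothetical names.

```python
from pathlib import Path


def make_src_relative(xml_text: str) -> str:
    """Drop the leading slash after src=", as the sed command does."""
    # Plain string replacement; every occurrence is replaced, like sed's g flag
    return xml_text.replace('src="/', 'src="')


def fix_file(path: Path, encoding: str = "utf-8") -> None:
    """Rewrite the file in place (assumes the given encoding, UTF-8 by default)."""
    text = path.read_text(encoding=encoding)
    path.write_text(make_src_relative(text), encoding=encoding)
```

Usage: `fix_file(Path("oald8.xml"))`.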
Convert the audio files from SPX format to WAV format. You need the speex package from MacPorts:
find OtherResources -name "*.spx" -execdir sh -c 'speexdec "$1" "${1%.*}.wav"' sh {} \;
sed -i "" 's|sound://\([/_a-zA-Z0-9]*\)\.spx|\1.wav|g' oald8.xml
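The decode loop and the link rewrite can also be expressed in Python. This sketch assumes `speexdec` accepts an input .spx and an output .wav path, as in the find command; `rewrite_sound_links` and `speexdec_commands` are hypothetical helpers, and the regex keeps the same character class as the sed expression, with the dot before "spx" escaped.

```python
import re
from pathlib import Path
from typing import List

# Same character class as the sed command; the dot before "spx" is escaped
SOUND_LINK = re.compile(r"sound://([/_a-zA-Z0-9]*)\.spx")


def rewrite_sound_links(xml_text: str) -> str:
    """Replace sound://name.spx with name.wav, mirroring the sed command."""
    return SOUND_LINK.sub(r"\1.wav", xml_text)


def speexdec_commands(res_dir: Path) -> List[List[str]]:
    """Build one speexdec argv per .spx file found under res_dir."""
    return [
        ["speexdec", str(spx), str(spx.with_suffix(".wav"))]
        for spx in sorted(res_dir.rglob("*.spx"))
    ]


# Example (requires speexdec on PATH and "import subprocess"; not executed here):
# for argv in speexdec_commands(Path("OtherResources")):
#     subprocess.run(argv, check=True)
```

Like the sed expression, this pattern does not match file names containing hyphens or dots, so such links are left unchanged.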
Be warned that the decoded WAV audio can take up about 5 times more disk space!
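To judge whether there is room before decoding, a rough estimate can be computed from the total size of the .spx files. A sketch only: the factor of 5 comes from the warning above, the real ratio depends on the audio itself, and `spx_sizes` and `estimate_wav_bytes` are hypothetical helpers.

```python
from pathlib import Path
from typing import Iterable

# "About 5 times" as stated above; the real ratio depends on the audio itself
WAV_EXPANSION = 5


def spx_sizes(res_dir: Path) -> Iterable[int]:
    """Yield the size in bytes of every .spx file under res_dir."""
    return (p.stat().st_size for p in res_dir.rglob("*.spx"))


def estimate_wav_bytes(sizes: Iterable[int], factor: float = WAV_EXPANSION) -> int:
    """Rough disk space needed after decoding, from the SPX file sizes."""
    return int(sum(sizes) * factor)
```

Usage: `estimate_wav_bytes(spx_sizes(Path("OtherResources"))) / 2**20` gives the estimate in MiB.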
Compile and install:
make
make install
Launch Dictionary.app and test.