Literal Japanese Translation

Note: this system was created as a Bachelor's project and should not be viewed as a production-ready library.

About

Literal Japanese is a Japanese-English translator that translates word by word and thus preserves Japanese grammar. It is intended as a tool for Japanese learners who want to learn about Japanese sentence structure. Examples of the translations this program produces can be seen in the sentences_dev and sentences_test documents in the data folder. More in-depth information can be found in the project report: https://github.com/nikolajkb/LiteralJapaneseMirror/blob/master/data/Literal%20Japanese%20report.pdf

Installation

The system was developed for Python 3.6.8 and may not work with other versions.

Run pip install . to install the dependencies.
Install the SudachiPy dictionary using:
pip install https://object-storage.tyo2.conoha.io/v1/nc_2520839e1f9641b08211a5c85243124a/sudachi/SudachiDict_core-20191224.tar.gz
Run NltkDownload.py in the script folder to download the NLTK packages.
Make sure that Visual C++ is installed (should only be necessary on Windows):
https://support.microsoft.com/en-us/help/2977003/the-latest-supported-visual-c-downloads

Usage

Run LiteralJapanese.py.
Use the -h argument to display the available commands.
Example command: LiteralJapanese.py --test "..\data\sentences_dev.txt" -v
On Windows, the program must be run as administrator.

The first run will take more time, as the program will generate necessary files.

Translating

Translations can be done interactively in the command line:
LiteralJapanese.py --interactive
A translation can also be saved to a specified file:
--translate "勉強すればするほど分かる。" --output "translation.txt"
This will write the translated tokens one per line: first the Japanese token, then its English translation, separated by a tab (\t) character.
A file containing one or more sentences can also be specified:
--batch-translate --input "input.txt" --output "output.txt"

(Output is appended to the output file if it already exists.)
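
The tab-separated output described above can be read back with a few lines of Python. This is a minimal sketch, assuming each line holds exactly one Japanese token, a tab, and its English translation; the function name parse_token_lines and the sample contents are made up for illustration and are not part of the project.

```python
from typing import List, Tuple


def parse_token_lines(text: str) -> List[Tuple[str, str]]:
    """Split "japanese<TAB>english" lines into (japanese, english) pairs."""
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split on the first tab only; a line without a tab gets an empty gloss.
        japanese, _, english = line.partition("\t")
        pairs.append((japanese, english))
    return pairs


# Hypothetical file contents, invented for illustration only.
sample = "猫\tcat\n食べる\teat\n"
for japanese, english in parse_token_lines(sample):
    print(f"{japanese} -> {english}")
```

For a real output file, the text would come from open("translation.txt", encoding="utf-8").read() instead of the sample string.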

Using Python

Translations can be done programmatically by calling the translate(text) function located in LiteralJapanese.py. It returns a list of Translation objects, each with three attributes:

japanese (the Japanese token)
english (the English translation of the token)
token (a token object with info on POS etc., refer to Tokenizer.py)
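
As a rough illustration of working with these objects, the sketch below turns a list of them into a single gloss line. So that it runs without the project installed, it uses a stand-in namedtuple (FakeTranslation) in place of the real Translation class; the helper to_gloss, the import line in the comment, and the sample glosses are assumptions made for this example.

```python
from collections import namedtuple

# Stand-in for the real Translation class, used only so this sketch runs
# on its own. It mirrors the three documented attributes.
FakeTranslation = namedtuple("FakeTranslation", ["japanese", "english", "token"])


def to_gloss(translations):
    """Join each Japanese token with its English gloss as 'jp[en]'."""
    return " ".join(f"{t.japanese}[{t.english}]" for t in translations)


# With the project importable, the list would presumably come from:
#   from LiteralJapanese import translate   # import path is an assumption
#   translations = translate("勉強すればするほど分かる。")
# The glosses below are invented and not actual system output.
translations = [
    FakeTranslation("猫", "cat", None),
    FakeTranslation("が", "SUBJ", None),
]
print(to_gloss(translations))
```

Because to_gloss only reads the japanese and english attributes, it should work unchanged on the real objects as long as those attribute names match the list above.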

Testing

To test the system, you need to provide a test file. Two test files are provided in the data folder.
Example command:
LiteralJapanese.py --test "..\data\sentences_dev.txt" -v
You can also test only the tokenization using the --tt command.
Using the -v argument prints the gold and system translations.

With the -p argument, the system counts a translation as correct if it is a synonym of the gold translation. This requires two additional files.

Google news vectors
https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM/edit?usp=sharing (from https://code.google.com/archive/p/word2vec/)
must be located in data/GoogleNewsVectors/GoogleNews-vectors-negative300.bin
Paraphrase database
http://paraphrase.org/#/download (small version)
must be located in data/PPDB/ppdb-2.0-s-all
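
Before running with -p, it can help to confirm that both files are where the program expects them. The following is a small sketch, assuming the two paths above are relative to the repository root; the function missing_resources is a hypothetical helper, not something the project provides.

```python
from pathlib import Path

# Relative locations taken from the list above.
REQUIRED = [
    Path("data/GoogleNewsVectors/GoogleNews-vectors-negative300.bin"),
    Path("data/PPDB/ppdb-2.0-s-all"),
]


def missing_resources(root="."):
    """Return the required paths that do not exist under root."""
    root = Path(root)
    return [p for p in REQUIRED if not (root / p).exists()]


# Assumes the current directory is the repository root.
for path in missing_resources():
    print(f"missing: {path}")
```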

Test file format

A test file consists of sentences. Each sentence has four elements:

A numeric id for the sentence, prefixed by #
The Japanese sentence, prefixed by #jp
A natural English translation of the sentence, prefixed by #en
A tokenized version of the Japanese sentence. One token per line, with an English translation of the token on the same line separated by a tab character.

Sentences are separated by one empty line.
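
The format described above can be parsed along the following lines. This is a sketch under assumptions: the exact spacing after the #, #jp and #en prefixes is guessed (surrounding whitespace is stripped), and the function parse_test_file and the sample sentence are invented for illustration rather than taken from the project or its data files.

```python
from typing import Dict, List


def parse_test_file(text: str) -> List[Dict]:
    """Parse blank-line-separated sentences into dictionaries."""
    sentences = []
    # Normalize Windows line endings so blocks split on empty lines.
    for block in text.replace("\r\n", "\n").strip().split("\n\n"):
        if not block.strip():
            continue
        entry = {"id": None, "jp": None, "en": None, "tokens": []}
        for line in block.splitlines():
            # Check the longer prefixes first, since all three start with '#'.
            if line.startswith("#jp"):
                entry["jp"] = line[3:].strip()
            elif line.startswith("#en"):
                entry["en"] = line[3:].strip()
            elif line.startswith("#"):
                entry["id"] = int(line[1:].strip())
            elif line.strip():
                japanese, _, english = line.partition("\t")
                entry["tokens"].append((japanese, english))
        sentences.append(entry)
    return sentences


# Hypothetical sentence, not copied from the provided test files.
sample = (
    "#1\n"
    "#jp 猫が好き。\n"
    "#en I like cats.\n"
    "猫\tcat\n"
    "が\tSUBJ\n"
    "好き\tliked\n"
    "。\t.\n"
)
print(parse_test_file(sample)[0]["tokens"])
```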
