A script that visualizes alignments between two sentences.

kaharjan/alignment-viewer

alignment-viewer

Introduction

This script visualizes alignments between two sentences.

Machine translation and paraphrasing systems often use word alignments. Alignments can be produced automatically by tools such as GIZA++.

  • Wondering if alignment can be useful for your applications (e.g. natural language processing systems)?
  • Developing your own alignment algorithms, but have trouble debugging or visualizing the output?
  • Writing about a new alignment algorithm, but drawing the diagrams by hand takes too long?

This script renders the alignment structure between two sentences as a picture, so you can see at a glance which words are aligned to which.

Input

  1. Two sentences with their tokens separated by spaces (e.g. one in English, "I like machine translation .", and one in Chinese, "我 喜欢 机器翻译 。")
  2. The alignment between them (e.g. the first English word is aligned with the first Chinese word, etc.)
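To make the input concrete, here is a minimal sketch (not part of draw_alignment.py) that lists the aligned word pairs. It assumes, as in the library example under "Using the Package as an External Library", that sentences are split on whitespace and that an alignment is a list of 0-based (source index, target index) tuples. The helper name aligned_word_pairs is hypothetical.

```python
# -*- coding: utf-8 -*-
# Sketch only: assumes whitespace-tokenized sentences and 0-based
# (source_index, target_index) alignment tuples.


def aligned_word_pairs(src_sentence, trg_sentence, alignments):
    """Return the (source word, target word) pair for every alignment link."""
    src_words = src_sentence.split()
    trg_words = trg_sentence.split()
    return [(src_words[i], trg_words[j]) for i, j in alignments]


src = u"I like machine translation ."
trg = u"我 喜欢 机器翻译 。"
# "machine" and "translation" are both linked to the single token "机器翻译".
links = [(0, 0), (1, 1), (2, 2), (3, 2), (4, 3)]
for s, t in aligned_word_pairs(src, trg, links):
    print(s + u" -> " + t)
```

Note that one target word may receive several links (a many-to-one alignment), which is exactly the kind of structure the picture makes easy to spot.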

Output

  1. A pretty PNG image visualizing the two sentences and the alignments between them

[Figure: The alignment structure between the English sentence "I like machine translation ." and the Chinese sentence "我 喜欢 机器翻译 。"]

Installation

This script is written in Python 2. Once Python 2 and the dependencies below are installed, there is nothing else to install: simply download the draw_alignment.py file and you are good to go.

The two dependencies are:

  1. Google's command-line flags library (python-gflags)
  2. Python Imaging Library (PIL)

Usage

Analyzing Alignments Produced by GIZA++

GIZA++ is an excellent tool that produces word-level alignments for a list of sentences and their translations. We use it all the time in our research, but its alignment output is just a list of numbers, which is hard to interpret.

We typically feed GIZA++ two files: one containing sentences in one language, and one containing their translations in another language, one sentence per line. For example:

test/Eng.txt

the government should not limit the amount spent on the aged because this problem is becoming more and more prevalent in singapore .
an ageing population or what has been coined the " silver tsunami" is a phenomena faced by developed countries around the globe and singapore is no exception .
...

test/Chs.txt

, 因为 这个 问题 变 得 越来越 普遍 , 在 新加坡 , 政府 不 应该 限制 对 老年 花费 的 金额 。
“ 银发 海啸 ” 的 现象 所 面临 的 发达国家 在 全球 各地 和 新加坡 的 人口 老龄化 已经 创造 也 不 例外 。
...

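GIZA++ then produces the alignments between the sentences in the two files. Each line of the alignment file covers one sentence pair and lists its links as space-separated i-j pairs, where i is the 0-based index of an English word and j the 0-based index of the Chinese word it is aligned with (e.g. 2-14 in the first line links "should" to "应该").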

test/Align.txt

2-14 12-2 13-3 21-10 8-16 3-13 20-9 15-4 1-12 5-19 6-20 11-1 10-17 0-11 4-15 19-7 22-21 7-18 17-8 15-6 
21-12 17-9 7-2 9-3 20-10 13-4 9-0 12-4 11-1 0-15 5-18 2-16 1-17 26-22 24-15 23-14 22-13 16-6 6-18 16-8 15-7 14-5 27-23 25-20 
...
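A minimal sketch of turning one line of test/Align.txt into the list of (source index, target index) tuples that DrawDirAlignToFile takes as its alignments argument. The function name parse_alignment_line is hypothetical and not part of draw_alignment.py.

```python
def parse_alignment_line(line):
    """Parse one line of space-separated "i-j" links into (i, j) int tuples.

    Assumes i indexes the source (English) sentence and j the target
    (Chinese) sentence, both 0-based, as in test/Align.txt.
    """
    pairs = []
    for token in line.split():  # split() also drops the trailing space
        src_idx, trg_idx = token.split("-")
        pairs.append((int(src_idx), int(trg_idx)))
    return pairs


links = parse_alignment_line("2-14 12-2 13-3 21-10 ")
print(links)  # [(2, 14), (12, 2), (13, 3), (21, 10)]
```

The links are kept in file order; GIZA++ output is not sorted, and a source word may appear more than once (e.g. 15-4 and 15-6 in the first line).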

draw_alignment.py visualizes these alignments. The following command draws the alignment of the second sentence pair into output.png (--sentence_id counts from 0):

python draw_alignment.py --src_sentences=test/Eng.txt --trg_sentences=test/Chs.txt --align_file=test/Align.txt --sentence_id=1 --output_image=output.png
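If you would rather stay in Python, the selection the command-line flags perform can be sketched as below. This is a hypothetical helper, not code from draw_alignment.py; it assumes the three files are line-aligned UTF-8 text and that --sentence_id is a 0-based line number.

```python
import io


def load_sentence_pair(src_path, trg_path, align_path, sentence_id):
    """Return (src_sentence, trg_sentence, alignments) for one 0-based line.

    Hypothetical helper mirroring --src_sentences, --trg_sentences,
    --align_file and --sentence_id; assumes line-aligned UTF-8 files.
    """
    def read_line(path):
        with io.open(path, encoding="utf-8") as f:
            for index, line in enumerate(f):
                if index == sentence_id:
                    return line.strip()
        raise IndexError("%s has no line %d" % (path, sentence_id))

    src_sentence = read_line(src_path)
    trg_sentence = read_line(trg_path)
    # Each "i-j" token becomes an (i, j) tuple of ints.
    alignments = [tuple(int(k) for k in link.split("-"))
                  for link in read_line(align_path).split()]
    return src_sentence, trg_sentence, alignments
```

Under those assumptions, `src, trg, links = load_sentence_pair("test/Eng.txt", "test/Chs.txt", "test/Align.txt", 1)` followed by `draw_alignment.DrawDirAlignToFile(src, trg, links, "output.png")` should draw the same sentence pair as the command above.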

In this case, output.png contains the following image:

[Figure: The GIZA++ output visualized as an image.]

Using the Package as an External Library

Want to debug your new alignment algorithm, but have trouble analyzing its output? You can call the DrawDirAlignToFile function to produce a picture of it.

Syntax: DrawDirAlignToFile(src_sentence, trg_sentence, alignments, output_imagefile)
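Output from a new alignment algorithm can easily contain off-by-one indices. A minimal pre-check before calling DrawDirAlignToFile might look like this; validate_alignments is a hypothetical helper, and it assumes whitespace tokenization and 0-based indices as in the example below.

```python
def validate_alignments(src_sentence, trg_sentence, alignments):
    """Raise ValueError if any (i, j) link points outside its sentence.

    Hypothetical pre-check, not part of draw_alignment.py; assumes
    whitespace tokenization and 0-based indices.
    """
    n_src = len(src_sentence.split())
    n_trg = len(trg_sentence.split())
    for i, j in alignments:
        if not (0 <= i < n_src and 0 <= j < n_trg):
            raise ValueError(
                "link (%d, %d) out of range for %d source / %d target words"
                % (i, j, n_src, n_trg))
```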

The following code generates the "I like machine translation ." example.

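# -*- coding: utf-8 -*-  (required by Python 2 for the non-ASCII literal below)
import draw_alignment
# (English index, Chinese index) pairs; "machine" and "translation" both map to "机器翻译".
draw_alignment.DrawDirAlignToFile("I like machine translation .", u"我 喜欢 机器翻译 。", [(0, 0), (1, 1), (2, 2), (3, 2), (4, 3)], "test.png")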
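For a quick look in the terminal before generating a PNG, a plain-text alignment matrix can also help when debugging. This is a hypothetical sketch, not a feature of draw_alignment.py: one row per source word, one column per target word (in target order), with "#" marking a link. Target words are left out of the header because CJK characters do not line up reliably in monospace output.

```python
# -*- coding: utf-8 -*-
def alignment_matrix(src_sentence, trg_sentence, alignments):
    """Return text rows: one per source word, '#' where it links to a target word."""
    src_words = src_sentence.split()
    trg_words = trg_sentence.split()
    links = set(alignments)
    width = max(len(w) for w in src_words)  # pad source words to one column
    rows = []
    for i, word in enumerate(src_words):
        cells = "".join("#" if (i, j) in links else "."
                        for j in range(len(trg_words)))
        rows.append(word.ljust(width) + " " + cells)
    return rows


for row in alignment_matrix("I like machine translation .", u"我 喜欢 机器翻译 。",
                            [(0, 0), (1, 1), (2, 2), (3, 2), (4, 3)]):
    print(row)
```

With the example above, the rows for "machine" and "translation" both end in "..#.", showing the many-to-one link to the third Chinese token.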
