TGen

A statistical natural language generator for spoken dialogue systems

TGen is a statistical natural language generator that supports two different algorithms:

  1. A statistical sentence planner based on A*-style search, with a candidate plan generator and a perceptron ranker
  2. A sequence-to-sequence (seq2seq) recurrent neural network architecture based on the TensorFlow toolkit
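The ranking step of the first algorithm can be illustrated with a minimal perceptron sketch. This is not TGen's actual code; the feature names and candidate plans below are purely hypothetical, and real features would be extracted from candidate sentence plans and the input dialogue act.

```python
# Minimal sketch of a perceptron ranker over candidate sentence plans.
# Features are sparse dicts mapping feature names to values (all hypothetical).

def score(weights, features):
    """Dot product of a weight vector and a sparse feature dict."""
    return sum(weights.get(f, 0.0) * v for f, v in features.items())

def perceptron_update(weights, gold_feats, predicted_feats, lr=1.0):
    """Shift weights toward the gold plan and away from a wrongly
    top-ranked candidate (the standard perceptron ranking update)."""
    for f, v in gold_feats.items():
        weights[f] = weights.get(f, 0.0) + lr * v
    for f, v in predicted_feats.items():
        weights[f] = weights.get(f, 0.0) - lr * v

# Two candidate plans for one dialogue act, as sparse feature dicts:
gold = {'has_slot:food': 1.0, 'tree_size': 3.0}
wrong = {'tree_size': 5.0, 'repeated_slot': 1.0}

weights = {}
perceptron_update(weights, gold, wrong)
assert score(weights, gold) > score(weights, wrong)
```

After one update, the ranker already prefers the gold plan for this pair; in practice, updates are averaged over many training examples.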

Both algorithms can be trained from pairs of source meaning representations (dialogue acts) and target sentences. The newer seq2seq approach is preferable: it yields higher performance in terms of both speed and quality.
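As a rough illustration of what such a training pair looks like, the sketch below parses a textual dialogue act into an act type and slot-value pairs. The exact DA syntax is an assumption for illustration and may differ from the formats TGen actually reads.

```python
import re

def parse_da(da_string):
    """Split a dialogue act such as 'inform(name=Bar,food=Chinese)'
    into its act type and a dict of slot-value pairs.
    The format here is illustrative, not TGen's canonical one."""
    m = re.match(r'(\w+)\((.*)\)$', da_string)
    act_type, slot_str = m.group(1), m.group(2)
    slots = {}
    if slot_str:
        for pair in slot_str.split(','):
            slot, _, value = pair.partition('=')
            slots[slot] = value
    return act_type, slots

# One side of a training pair; the target side would be a sentence
# such as "Golden Dragon serves Chinese food."
act, slots = parse_da('inform(name=Golden Dragon,food=Chinese)')
# act == 'inform', slots == {'name': 'Golden Dragon', 'food': 'Chinese'}
```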

Both algorithms support generating sentence plans (deep syntax trees), which are subsequently converted to text using the existing surface realizer from the Treex NLP toolkit. The seq2seq algorithm also supports direct string generation.
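A sentence plan in this sense is a tree of content words with their syntactic roles. The toy sketch below shows the general shape of such a tree; the node attributes and formeme labels are simplified for illustration and do not reflect Treex's actual API.

```python
class DeepTreeNode:
    """Simplified deep-syntax (t-tree) node: a lemma plus a formeme
    describing its syntactic role, with ordered children.
    Attribute names are illustrative, not Treex's real interface."""

    def __init__(self, lemma, formeme, children=None):
        self.lemma = lemma
        self.formeme = formeme
        self.children = children or []

    def lemmas(self):
        """Collect lemmas in depth-first order (a crude linearization;
        the real surface realizer handles word order and inflection)."""
        result = [self.lemma]
        for child in self.children:
            result.extend(child.lemmas())
        return result

# "X-name serves Chinese food" as a toy sentence plan:
plan = DeepTreeNode('serve', 'v:fin', [
    DeepTreeNode('X-name', 'n:subj'),
    DeepTreeNode('food', 'n:obj', [DeepTreeNode('Chinese', 'adj:attr')]),
])
assert plan.lemmas() == ['serve', 'X-name', 'food', 'Chinese']
```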

For more details on the algorithms, please refer to our papers:

  • For A*-search based generation, see our ACL 2015 paper.
  • For seq2seq generation, see our ACL 2016 paper.
  • For an improved version of seq2seq generation that takes the previous user utterance into account to generate a more contextually appropriate response, see our SIGDIAL 2016 paper.

Notice

  • TGen is highly experimental and only tested on a few datasets. Use at your own risk.
  • To get the version used in our ACL 2015 paper (A*-search only), see this release.
  • To get the version used in our ACL 2016 and SIGDIAL 2016 papers (seq2seq approach for generating sentence plans or strings, optionally using previous context), see this release.

Dependencies

TGen is written in Python (version 2.7). For TGen to work properly, you need to have several modules installed.

"Standard" (can be installed easily, with pip):

Manual installation:

Of the manually installed modules, the first two can be avoided by copying just a few libraries; these will be integrated here in the future.

Additionally, some obsolete code depends on Theano, but the imports are optional and the code will probably be removed in the future.

Parallel training on a cluster uses SGE's qsub.

Citing TGen

If you use or refer to the seq2seq generation in TGen, please cite this paper:

  • Ondřej Dušek and Filip Jurčíček (2016): Sequence-to-Sequence Generation for Spoken Dialogue via Deep Syntax Trees and Strings. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Berlin, Germany.

If you use or refer to the improved context-aware seq2seq generation, please cite this paper:

  • Ondřej Dušek and Filip Jurčíček (2016): A Context-aware Natural Language Generator for Dialogue Systems. In Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue, Los Angeles, CA, USA.

If you use or refer to the A*-search generation in TGen, please cite this paper:

  • Ondřej Dušek and Filip Jurčíček (2015): Training a Natural Language Generator From Unaligned Data. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, pages 451–461, Beijing, China.

License

Author: Ondřej Dušek

Copyright © 2014-2016 Institute of Formal and Applied Linguistics, Charles University in Prague.

Licensed under the Apache License, Version 2.0.

Acknowledgements

Work on this project was funded by the Ministry of Education, Youth and Sports of the Czech Republic under the grant agreement LK11221 and core research funding, SVV projects 260 104 and 260 333, and GAUK grant 2058214 of Charles University in Prague. It used language resources stored and distributed by the LINDAT/CLARIN project of the Ministry of Education, Youth and Sports of the Czech Republic (projects LM201001 and LM2015071).
