Modelling and Optimizing on Syntactic N-Grams for Statistical Machine Translation

Rico Sennrich

Abstract


The role of language models in SMT is to promote fluent translation
output, but traditional n-gram language models are unable to capture
fluency phenomena between distant words, such as some types of morphological
agreement, subcategorisation, and syntactic collocations with
string-level gaps. Syntactic language models have the potential to fill
this modelling gap. We propose a language model for dependency
structures that is relational rather than configurational and thus
particularly suited for languages with a (relatively) free word order.
It is trainable with neural networks, and not only improves over
standard n-gram language models, but also outperforms related syntactic
language models. We empirically demonstrate its effectiveness in terms
of perplexity and as a feature function in string-to-tree SMT from
English to German and Russian. We also show that using a syntactic
evaluation metric to tune the log-linear parameters of an SMT system
further increases translation quality when coupled with a syntactic
language model.
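
To make the "relational" modelling idea more concrete, the sketch below scores a dependency tree by conditioning each word on its head word and the dependency relation connecting them, using relative-frequency estimates with add-one smoothing. This is a minimal illustration under simplified assumptions, not the conditioning context or neural parameterisation of the proposed model; the class name RelationalDependencyLM and the toy data are hypothetical.

    # Illustrative sketch of a relational dependency language model:
    # each word is conditioned on its head word and the dependency relation
    # linking them (a simplification of the conditioning context in the paper).
    from collections import defaultdict
    import math

    class RelationalDependencyLM:
        def __init__(self):
            self.context_counts = defaultdict(int)  # counts of (head, relation)
            self.event_counts = defaultdict(int)    # counts of (head, relation, word)
            self.vocab = set()

        def train(self, trees):
            """trees: iterable of lists of (word, head_word, relation) triples."""
            for tree in trees:
                for word, head, rel in tree:
                    self.context_counts[(head, rel)] += 1
                    self.event_counts[(head, rel, word)] += 1
                    self.vocab.add(word)

        def prob(self, word, head, rel):
            # Add-one smoothing over the vocabulary avoids zero probabilities.
            num = self.event_counts[(head, rel, word)] + 1
            den = self.context_counts[(head, rel)] + len(self.vocab)
            return num / den

        def log_prob(self, tree):
            # The tree probability factorises over words given head and relation.
            return sum(math.log(self.prob(w, h, r)) for w, h, r in tree)

    # Toy usage: a ROOT pseudo-token heads the main verb of each sentence.
    train_trees = [
        [("sah", "ROOT", "root"), ("Hund", "sah", "subj"), ("der", "Hund", "det")],
        [("sah", "ROOT", "root"), ("Katze", "sah", "obj"), ("die", "Katze", "det")],
    ]
    lm = RelationalDependencyLM()
    lm.train(train_trees)
    test_tree = [("sah", "ROOT", "root"), ("Hund", "sah", "subj")]
    print(lm.log_prob(test_tree))

Because the model conditions on relations rather than on surface positions, reordering the dependents of a head does not change the score, which is what makes such a formulation attractive for relatively free word order languages.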
