Parsing entire discourses as very long strings: Capturing topic continuity in grounded language learning

Minh-Thang Luong; Michael C. Frank; Mark Johnson

Vol. 1 (2013)

TACL approved

Published 2013-07-31

Minh-Thang Luong
Stanford University

Michael C. Frank
Stanford University

Mark Johnson
Macquarie University

Abstract

Grounded language learning, the task of mapping from natural language to a representation of meaning, has attracted increasing interest in recent years. In most work on this topic, however, utterances in a conversation are treated independently, and information about discourse structure is largely ignored. In the context of language acquisition, this independence assumption discards cues that are important to the learner, e.g., the fact that consecutive utterances are likely to share the same referent (Frank et al., 2013). The current paper describes an approach to the problem of simultaneously modeling grounded language at the sentence and discourse levels. We combine ideas from parsing and grammar induction to produce a parser that can handle long input strings with thousands of tokens, creating parse trees that represent full discourses. By casting grounded language learning as a grammatical inference task, we use our parser to extend the work of Johnson et al. (2012), investigating the importance of discourse continuity in children’s language acquisition and its interaction with social cues. Our model boosts performance in a language acquisition task and yields good discourse segmentations compared with human annotators.
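As a rough illustration of the topic-continuity cue mentioned in the abstract (consecutive utterances tend to share a referent), the Python sketch below measures how often adjacent utterances have the same referent and groups a transcript into runs of constant referent. This is not the authors' grammar-based model. It assumes each utterance has already been annotated with a single referent (`None` when it has none), and the function names `continuity_rate` and `topic_segments` and the toy transcript are hypothetical, invented for illustration.

```python
from __future__ import annotations

from itertools import groupby
from typing import Optional, Sequence

# Hypothetical representation: one annotated referent per utterance,
# or None when the utterance does not refer to any object.
Referent = Optional[str]


def continuity_rate(referents: Sequence[Referent]) -> float:
    """Fraction of adjacent utterance pairs that share the same (non-None) referent."""
    pairs = list(zip(referents, referents[1:]))
    if not pairs:
        return 0.0
    same = sum(1 for a, b in pairs if a is not None and a == b)
    return same / len(pairs)


def topic_segments(referents: Sequence[Referent]) -> list[tuple[Referent, int]]:
    """Group consecutive utterances with identical referents into (referent, run length) pairs."""
    return [(ref, len(list(run))) for ref, run in groupby(referents)]


if __name__ == "__main__":
    # Invented toy transcript, not data from the paper.
    toy = ["dog", "dog", None, "ball", "ball", "ball", "dog"]
    print(continuity_rate(toy))  # 3 of 6 adjacent pairs match
    print(topic_segments(toy))
```

A high continuity rate on real annotated transcripts would indicate that a learner could profit from assuming a referent persists across utterances, which is the kind of cue an utterance-independent model cannot exploit.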

Presented at EMNLP 2013.