A Bayesian Model of Grounded Color Semantics

Abstract

Natural language meanings allow speakers to encode important real-world distinctions, but corpora of grounded language use also reveal that speakers categorize the world in different ways and describe situations with different terminology. To learn meanings from data, we therefore need to link underlying representations of meaning to models of speaker judgment and speaker choice. This paper describes a new approach to this problem: we model variability through uncertainty in categorization boundaries and distributions over preferred vocabulary. We apply the approach to a large data set of color descriptions, where statistical evaluation documents its accuracy. The results are available as a Lexicon of Uncertain Color Standards (LUX), which supports future efforts in grounded language understanding and generation by probabilistically mapping 829 English color descriptions to potentially context-sensitive regions in HSV color space.

PDF (presented at NAACL 2015)

Author Biography

Brian McMahan

Department of Computer Science

Matthew Stone

Department of Computer Science