Segmentation for Efficient Supervised Language Annotation with an Explicit Cost-Utility Tradeoff

Matthias Sperber; Mirjam Simantzik; Graham Neubig; Satoshi Nakamura; Alex Waibel

Vol. 2 (2014)

TACL approved

Published 2014-04-30

Matthias Sperber
Karlsruhe Institute of Technology

Mirjam Simantzik
Mobile Technologies GmbH

Graham Neubig
Nara Institute of Science and Technology

Satoshi Nakamura
Nara Institute of Science and Technology

Alex Waibel
Karlsruhe Institute of Technology

Abstract

In this paper, we study the problem of manually correcting automatic annotations of natural language in as efficient a manner as possible. We introduce a method for automatically segmenting a corpus into chunks such that many uncertain labels are grouped into the same chunk, while human supervision can be omitted altogether for other segments. A tradeoff must be found for segment sizes. Choosing short segments allows us to reduce the number of highly confident labels that are supervised by the annotator, which is useful because these labels are often already correct and supervising correct labels is a waste of effort. In contrast, long segments reduce the cognitive effort due to context switches. Our method helps find the segmentation that optimizes supervision efficiency by defining user models to predict the cost and utility of supervising each segment and solving a constrained optimization problem balancing these contradictory objectives. A user study demonstrates noticeable gains over pre-segmented, confidence-ordered baselines on two natural language processing tasks: speech transcription and word segmentation.
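The tradeoff described in the abstract can be illustrated with a minimal sketch. It is not the authors' formulation: the paper predicts cost and utility with user models, whereas this toy version assumes a hypothetical linear cost model (a fixed per-segment overhead standing in for context switches, plus a per-token cost) and takes utility to be the summed uncertainty of the supervised tokens. The function name `select_segments` and all parameters are illustrative assumptions. Under those assumptions, a dynamic program chooses non-overlapping segments that maximize utility within a cost budget.

```python
from itertools import accumulate


def select_segments(uncertainty, budget, overhead=2, per_token=1, max_len=10):
    """Choose disjoint segments [start, end) that maximize total utility
    subject to a total cost budget.

    Assumed toy models (not taken from the paper):
      cost(segment)    = overhead + per_token * length   (integer units)
      utility(segment) = sum of token uncertainties in the segment
    """
    n = len(uncertainty)
    prefix = [0.0] + list(accumulate(uncertainty))  # prefix sums of utility

    # best[i][b]: max utility using tokens[:i] with total cost <= b
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    # choice[i][b]: start index of a segment ending at i, or None if token
    # i-1 is left unsupervised in the optimal solution
    choice = [[None] * (budget + 1) for _ in range(n + 1)]

    for i in range(1, n + 1):
        for b in range(budget + 1):
            best[i][b] = best[i - 1][b]  # option 1: skip token i-1
            for k in range(max(0, i - max_len), i):
                cost = overhead + per_token * (i - k)
                if cost > b:
                    continue
                # option 2: supervise tokens[k:i] as one segment
                cand = best[k][b - cost] + prefix[i] - prefix[k]
                if cand > best[i][b]:
                    best[i][b] = cand
                    choice[i][b] = k

    # Walk back through the table to recover the chosen segments.
    segments = []
    i, b = n, budget
    while i > 0:
        k = choice[i][b]
        if k is None:
            i -= 1
        else:
            segments.append((k, i))
            b -= overhead + per_token * (i - k)
            i = k
    segments.reverse()
    return segments, best[n][budget]
```

For example, with uncertainties `[0.1, 0.9, 0.8, 0.1, 0.1, 0.7]` and `budget=7`, this sketch selects the single segment `(1, 6)`: because each segment carries a fixed overhead, supervising two low-uncertainty tokens in between is cheaper than opening a second segment. Lowering `overhead` shifts the optimum toward several short segments.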

PDF (Presented at ACL 2014)

Author Biography

Matthias Sperber

Research assistant at the Institute for Anthropomatics

Graham Neubig

Assistant professor at the Augmented Human Communication Laboratory

Satoshi Nakamura

Professor at the Augmented Human Communication Laboratory

Alex Waibel

Professor at the Institute for Anthropomatics