FEATURE-DRIVEN QUESTION ANSWERING WITH NATURAL LANGUAGE ALIGNMENT

Yao, Xuchen

Date

2014-07-21

Publisher

Johns Hopkins University

Abstract

Question Answering (QA) is the task of automatically generating answers to natural language questions posed by humans. It is one of the primary research areas in natural language human-computer interaction. This dissertation focuses on English fact-seeking (factoid) QA, for instance: "When was Johns Hopkins founded?" (January 22, 1876). The key challenge in QA is generating and recognizing signals that indicate answer patterns.

In this dissertation I propose feature-driven QA, a machine learning framework that automatically produces rich features from linguistic annotations of answer fragments and encodes them in compact log-linear models. These features are further enhanced by tightly coupling the question and answer snippets via monolingual alignment. In this work, monolingual alignment helps question answering in two ways: aligning semantically similar words in QA sentence pairs (with the ability to recognize paraphrases and entailment) and aligning natural language words with knowledge base relations (via web-scale data mining). With the help of modern search engines, databases and machine learning tools, the proposed method can efficiently search through billions of facts on the web and optimize over millions of linguistic signals in the feature space.

QA is often modeled as a pipeline of the form: question (input) -> information retrieval (“search”) -> answer extraction (from either text or a knowledge base) -> answer (output). This dissertation applies the feature-driven approach throughout the QA pipeline: the search front end with structured information retrieval, and the answer extraction back end over both an unstructured data source (free text) and a structured data source (a knowledge base). Error propagation in the natural language processing (NLP) pipeline is contained and minimized.
The final system achieves state-of-the-art performance on several NLP tasks, including answer sentence ranking and answer extraction on one QA dataset, monolingual alignment on two annotated datasets, and question answering from Freebase with web queries. This dissertation shows that a feature-driven framework can serve as the statistical backbone of modern question answering systems.
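To make the idea of encoding answer-fragment features in a log-linear model concrete, here is a minimal Python sketch. It scores sparse features for each candidate answer, normalizes the scores into a distribution p(a | q), and trains the weights by plain stochastic gradient ascent on the conditional log-likelihood. This is an illustration under stated assumptions, not the dissertation's implementation. The feature names, the toy candidates, the learning rate and the training loop are all hypothetical.

```python
import math
from typing import Dict, List

# Hypothetical sparse features: each name conjoins a question cue with an
# annotation of the candidate answer fragment (e.g. its named-entity type).
Features = Dict[str, float]
Weights = Dict[str, float]


def score(weights: Weights, feats: Features) -> float:
    """Linear score w . f(q, a) over sparse features."""
    return sum(weights.get(name, 0.0) * value for name, value in feats.items())


def log_linear_probs(weights: Weights, candidates: List[Features]) -> List[float]:
    """p(a | q) = exp(w . f(q, a)) / sum over a' of exp(w . f(q, a'))."""
    scores = [score(weights, f) for f in candidates]
    top = max(scores)  # subtract the max before exp for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def sgd_step(weights: Weights, candidates: List[Features], gold: int, lr: float = 0.1) -> None:
    """One gradient-ascent step on log p(gold | q): observed minus expected features."""
    probs = log_linear_probs(weights, candidates)  # computed once, before any update
    for i, feats in enumerate(candidates):
        coeff = (1.0 if i == gold else 0.0) - probs[i]
        for name, value in feats.items():
            weights[name] = weights.get(name, 0.0) + lr * coeff * value


# Toy candidates for "When was Johns Hopkins founded?" (features are made up).
candidates = [
    {"qword=when|ner=DATE": 1.0, "aligned_to_question_context": 1.0},  # "January 22, 1876"
    {"qword=when|ner=LOCATION": 1.0},                                  # "Baltimore"
    {"qword=when|ner=PERSON": 1.0, "appears_in_question": 1.0},        # "Johns Hopkins"
]
weights: Weights = {}
for _ in range(20):
    sgd_step(weights, candidates, gold=0)

probs = log_linear_probs(weights, candidates)
best = max(range(len(candidates)), key=lambda i: probs[i])
print(best, [round(p, 3) for p in probs])
```

Conjoining the question word with the answer's entity type lets a single weight capture a preference such as "when questions favor dates". A fuller system would presumably generate many such features automatically from linguistic annotations and regularize the weights rather than hand-listing them as here.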

Keywords

question answering, monolingual alignment, natural language processing, artificial intelligence

URI

http://jhir.library.jhu.edu/handle/1774.2/38010

Collections

ETD -- Doctoral Dissertations