Multi-Source Text Generation and Beyond using Reinforcement Learning

Cho, Woon Sang

Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/99999/fk46h60w95

Title:	Multi-Source Text Generation and Beyond using Reinforcement Learning
Authors:	Cho, Woon Sang
Advisors:	Wang, Mengdi
Contributors:	Operations Research and Financial Engineering Department
Subjects:	Artificial intelligence
Issue Date:	2021
Publisher:	Princeton, NJ : Princeton University
Abstract:	Generating text that resembles human-written natural text has long been a research challenge. Given some initial text as context, many current text generation systems produce a continuation that is often only loosely connected to the preceding text, resulting in a lack of local connectivity between adjacent sentences, let alone coherence as a whole. Few works have attempted to explicitly improve text generation systems from the perspectives of coherence and cohesion. Therefore, a mechanism to reinforce the soundness and seamless connection of the combined text, that is, the initial human-written context and the system-generated text put together, is desirable. In this thesis, we propose two neural discriminators that provide coherence and cohesion reward signals to a neural language model. Next, we address another challenge motivated by the following observation: ambiguous user queries in search engines result in the retrieval of documents that often span multiple topics. One potential solution is for the search engine to generate multiple refined or clarifying queries for the user who initially entered the ambiguous query, such that each refined query relates to a subset of the documents spanning the same topic. A preliminary step towards this goal is to generate a question that captures the common concepts of multiple documents. To this end, we propose a new task of generating a common question from multiple documents and present a simple variant of an existing multi-source encoder-decoder framework, the Multi-Source Question Generator (MSQG). However, this simple class of models uses only the targeted ("positive") multi-document set, and may generate generic questions that cover a larger scope than that delineated by the document set.
To address this challenge, we introduce a contrastive learning strategy in which, given "positive" and "negative" sets of documents, we generate a question that is closely related to the "positive" set but far from the "negative" set. We also propose an effective auxiliary objective, Set-induced Contrastive Regularization (SCR), to develop a Multi-Source Coordinated Question Generator (MSCQG).
URI:	http://arks.princeton.edu/ark:/99999/fk46h60w95
Alternate format:	The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: catalog.princeton.edu
Type of Material:	Academic dissertations (Ph.D.)
Language:	en
Appears in Collections:	Operations Research and Financial Engineering

Files in This Item:

File	Size	Format
Cho_princeton_0181D_13625.pdf	6.04 MB	Adobe PDF
