Please use this identifier to cite or link to this item:
http://arks.princeton.edu/ark:/99999/fk46h60w95
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Wang, Mengdi | |
dc.contributor.author | Cho, Woon Sang | |
dc.contributor.other | Operations Research and Financial Engineering Department | |
dc.date.accessioned | 2021-06-10T17:14:17Z | - |
dc.date.available | 2021-06-10T17:14:17Z | - |
dc.date.issued | 2021 | |
dc.identifier.uri | http://arks.princeton.edu/ark:/99999/fk46h60w95 | - |
dc.description.abstract | Generating text that resembles human-written natural text has long been a research challenge. Given some initial text as a context from which to generate what should come next, many current text generation systems produce a continuation that often exhibits only a loose connection to the preceding text, resulting in a lack of local connectivity between adjacent sentences, let alone coherence as a whole. Few works have attempted to explicitly improve text generation systems from the perspectives of coherence and cohesion. A mechanism is therefore desirable that reinforces the soundness and seamless connection of the combined text, that is, the initial human-written context and the system-generated text put together. In this thesis, we propose two neural discriminators that provide coherence and cohesion reward signals to a neural language model. Next, we address another interesting challenge motivated by the following observation: ambiguous user queries in search engines result in the retrieval of documents that often span multiple topics. One potential solution is for the search engine to generate multiple refined or clarification queries for the user who initially entered the ambiguous query, such that each refined query relates to a subset of the documents sharing the same topic. A preliminary step towards this goal is to generate a question that captures concepts common to multiple documents. To this end, we propose a new task of generating a common question from multiple documents and present a simple variant of an existing multi-source encoder-decoder framework, the Multi-Source Question Generator (MSQG). However, this simple class of models uses only the targeted ("positive") multi-document set and may generate generic questions that cover a larger scope than delineated by the document set. To address this challenge, we introduce a contrastive learning strategy: given "positive" and "negative" sets of documents, we generate a question that is closely related to the "positive" set but far from the "negative" set. We also propose an effective auxiliary objective, Set-induced Contrastive Regularization (SCR), to develop a Multi-Source Coordinated Question Generator (MSCQG). (Illustrative sketches of these two mechanisms appear after the metadata table below.) | |
dc.language.iso | en | |
dc.publisher | Princeton, NJ : Princeton University | |
dc.relation.isformatof | The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: http://catalog.princeton.edu | |
dc.subject.classification | Artificial intelligence | |
dc.title | Multi-Source Text Generation and Beyond using Reinforcement Learning | |
dc.type | Academic dissertations (Ph.D.) | |
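
The abstract above describes discriminators that supply coherence and cohesion reward signals to a neural language model. As a rough illustration only (the thesis's actual models and training procedure are not reproduced in this record), the sketch below shows the general pattern in PyTorch: a toy discriminator scores a sampled continuation together with its context, and that score serves as a REINFORCE reward for the generator. All class names, sizes, and the single-reward setup are our assumptions.

```python
# Hedged sketch, not the thesis code: a discriminator's coherence score for
# (context + generated continuation) is used as a REINFORCE reward.
import torch
import torch.nn as nn

VOCAB, EMB, HID = 1000, 64, 128  # illustrative sizes

class TinyLM(nn.Module):
    """Toy autoregressive generator standing in for the neural language model."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)
        self.rnn = nn.GRU(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, VOCAB)

    def forward(self, tokens, h=None):
        x, h = self.rnn(self.emb(tokens), h)
        return self.out(x), h

class CoherenceDiscriminator(nn.Module):
    """Toy discriminator: maps (context + continuation) tokens to a [0, 1] score."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)
        self.rnn = nn.GRU(EMB, HID, batch_first=True)
        self.score = nn.Linear(HID, 1)

    def forward(self, tokens):
        _, h = self.rnn(self.emb(tokens))
        return torch.sigmoid(self.score(h[-1])).squeeze(-1)

lm, disc = TinyLM(), CoherenceDiscriminator()
opt = torch.optim.Adam(lm.parameters(), lr=1e-4)

context = torch.randint(0, VOCAB, (2, 8))   # stand-in for a human-written prefix
logits, h = lm(context)                     # encode the prefix
log_probs, generated = [], []
for _ in range(10):                         # sample a 10-token continuation
    dist = torch.distributions.Categorical(logits=logits[:, -1])
    tok = dist.sample()                     # shape: (batch,)
    log_probs.append(dist.log_prob(tok))
    generated.append(tok.unsqueeze(1))
    logits, h = lm(tok.unsqueeze(1), h)

full = torch.cat([context] + generated, dim=1)  # context + generated text together
reward = disc(full).detach()                    # discriminator score as reward
loss = -(torch.stack(log_probs).sum(0) * reward).mean()  # REINFORCE objective
opt.zero_grad()
loss.backward()
opt.step()
```

In the thesis's setting there are two such discriminators, one for coherence and one for cohesion, each contributing its own reward signal; they are collapsed into a single reward here for brevity.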
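Likewise, Set-induced Contrastive Regularization (SCR) is only named in the abstract, and its exact formulation is not given in this record. The sketch below captures the stated contrastive intuition under our own assumptions: the generated question's representation should be more similar to the "positive" document set than to the "negative" one. The margin form, the mean-pooled set embeddings, and every name are illustrative, not the thesis's actual objective.

```python
# Hedged sketch of the contrastive idea behind MSCQG/SCR (assumed form, not the
# thesis's loss): pull the question representation toward the "positive" set
# embedding and push it away from the "negative" one via a cosine margin loss.
import torch
import torch.nn.functional as F

def set_embedding(doc_vecs):
    """Mean-pool per-document vectors into one set-level vector."""
    return doc_vecs.mean(dim=0)

def contrastive_regularizer(q_vec, pos_docs, neg_docs, margin=0.2):
    """Margin loss: sim(question, positive set) should exceed sim(question, negative set)."""
    pos = F.cosine_similarity(q_vec, set_embedding(pos_docs), dim=0)
    neg = F.cosine_similarity(q_vec, set_embedding(neg_docs), dim=0)
    return F.relu(margin + neg - pos)

# Toy usage with random vectors standing in for real document/question encoders.
torch.manual_seed(0)
q = torch.randn(128, requires_grad=True)   # generated-question representation
pos = torch.randn(4, 128)                  # four "positive" documents, same topic
neg = torch.randn(4, 128)                  # four "negative" documents
aux_loss = contrastive_regularizer(q, pos, neg)
aux_loss.backward()                        # in practice, added to the generation loss
print(float(aux_loss))
```

As an auxiliary objective, a term like this would be added, with some weight, to the main encoder-decoder generation loss during MSCQG training.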
Appears in Collections: | Operations Research and Financial Engineering |
Files in This Item:
File | Size | Format | |
---|---|---|---|
Cho_princeton_0181D_13625.pdf | 6.04 MB | Adobe PDF | View/Download |
Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.