Uncovering, Understanding, and Predicting Links

Chang, Jonathan

Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp010r9673747

Title:	Uncovering, Understanding, and Predicting Links
Authors:	Chang, Jonathan
Advisors:	Blei, David M
Contributors:	Electrical Engineering Department
Subjects:	Engineering; Computer science; Statistics
Issue Date:	2011
Publisher:	Princeton, NJ : Princeton University
Abstract:	Network data, such as citation networks of documents, hyperlinked networks of web pages, and social networks of friends, are pervasive in applied statistics and machine learning. The statistical analysis of network data can provide both useful predictive models and descriptive statistics. Predictive models can point social network members towards new friends, scientific papers towards relevant citations, and web pages towards other related pages. Descriptive statistics can uncover the hidden community structure underlying a network data set. In this work we develop new models of network data that account for both links and attributes. We also develop the inferential and predictive tools around these models to make them widely applicable to large, real-world data sets. One such model, the Relational Topic Model, can predict links using only a new node's attributes. Thus, we can suggest citations of newly written papers, predict the likely hyperlinks of a web page in development, or suggest friendships in a social network based only on a new user's profile of interests. Moreover, given a new node and its links, the model provides a predictive distribution of node attributes. This mechanism can be used to predict keywords from citations or a user's interests from his or her social connections. While explicit network data (network data in which the connections between people, places, genes, corporations, etc. are explicitly encoded) are already ubiquitous, most of these can only annotate connections in a limited fashion. Although relationships between entities are rich, it is impractical to manually devise complete characterizations of these relationships for every pair of entities on large, real-world corpora. To resolve this, we present a probabilistic topic model to analyze text corpora and infer descriptions of their entities and of relationships between those entities. We show qualitatively and quantitatively that our model can construct and annotate graphs of relationships and make useful predictions.
URI:	http://arks.princeton.edu/ark:/88435/dsp010r9673747
Alternate format:	The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog.
Type of Material:	Academic dissertations (Ph.D.)
Language:	en
Appears in Collections:	Electrical Engineering

Files in This Item:

File	Description	Size	Format
Chang_princeton_0181D_10035.pdf		4.92 MB	Adobe PDF
