Uncovering, Understanding, and Predicting Links

Chang, Jonathan

Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp010r9673747

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Blei, David M	en_US
dc.contributor.author	Chang, Jonathan	en_US
dc.contributor.other	Electrical Engineering Department	en_US
dc.date.accessioned	2011-11-18T14:38:57Z	-
dc.date.available	2011-11-18T14:38:57Z	-
dc.date.issued	2011	en_US
dc.identifier.uri	http://arks.princeton.edu/ark:/88435/dsp010r9673747	-
dc.description.abstract	Network data, such as citation networks of documents, hyperlinked networks of web pages, and social networks of friends, are pervasive in applied statistics and machine learning. The statistical analysis of network data can provide both useful predictive models and descriptive statistics. Predictive models can point social network members towards new friends, scientific papers towards relevant citations, and web pages towards other related pages. Descriptive statistics can uncover the hidden community structure underlying a network data set. In this work we develop new models of network data that account for both links and attributes. We also develop the inferential and predictive tools around these models to make them widely applicable to large, real-world data sets. One such model, the Relational Topic Model, can predict links using only a new node's attributes. Thus, we can suggest citations for newly written papers, predict the likely hyperlinks of a web page in development, or suggest friendships in a social network based only on a new user's profile of interests. Moreover, given a new node and its links, the model provides a predictive distribution of node attributes. This mechanism can be used to predict keywords from citations or a user's interests from his or her social connections. While explicit network data (network data in which the connections between people, places, genes, corporations, etc. are explicitly encoded) are already ubiquitous, most of these can only annotate connections in a limited fashion. Although relationships between entities are rich, it is impractical to manually devise complete characterizations of these relationships for every pair of entities in large, real-world corpora. To resolve this, we present a probabilistic topic model that analyzes text corpora and infers descriptions of their entities and of the relationships between those entities. We show qualitatively and quantitatively that our model can construct and annotate graphs of relationships and make useful predictions.	en_US
dc.language.iso	en	en_US
dc.publisher	Princeton, NJ : Princeton University	en_US
dc.relation.isformatof	The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: http://catalog.princeton.edu	en_US
dc.subject.classification	Engineering	en_US
dc.subject.classification	Computer science	en_US
dc.subject.classification	Statistics	en_US
dc.title	Uncovering, Understanding, and Predicting Links	en_US
dc.type	Academic dissertations (Ph.D.)	en_US
pu.projectgrantnumber	690-2143	en_US
Appears in Collections:	Electrical Engineering

Files in This Item:

File	Description	Size	Format
Chang_princeton_0181D_10035.pdf		4.92 MB	Adobe PDF
