Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp010r9673747
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | Blei, David M | en_US
dc.contributor.author | Chang, Jonathan | en_US
dc.contributor.other | Electrical Engineering Department | en_US
dc.date.accessioned | 2011-11-18T14:38:57Z | -
dc.date.available | 2011-11-18T14:38:57Z | -
dc.date.issued | 2011 | en_US
dc.identifier.uri | http://arks.princeton.edu/ark:/88435/dsp010r9673747 | -
dc.description.abstract | Network data, such as citation networks of documents, hyperlinked networks of web pages, and social networks of friends, are pervasive in applied statistics and machine learning. The statistical analysis of network data can provide both useful predictive models and descriptive statistics. Predictive models can point social network members towards new friends, scientific papers towards relevant citations, and web pages towards other related pages. Descriptive statistics can uncover the hidden community structure underlying a network data set. In this work we develop new models of network data that account for both links and attributes. We also develop the inferential and predictive tools around these models to make them widely applicable to large, real-world data sets. One such model, the Relational Topic Model, can predict links using only a new node's attributes. Thus, we can suggest citations of newly written papers, predict the likely hyperlinks of a web page in development, or suggest friendships in a social network based only on a new user's profile of interests. Moreover, given a new node and its links, the model provides a predictive distribution of node attributes. This mechanism can be used to predict keywords from citations or a user's interests from his or her social connections. While explicit network data (network data in which the connections between people, places, genes, corporations, etc. are explicitly encoded) are already ubiquitous, most of these can only annotate connections in a limited fashion. Although relationships between entities are rich, it is impractical to manually devise complete characterizations of these relationships for every pair of entities on large, real-world corpora. To resolve this, we present a probabilistic topic model to analyze text corpora and infer descriptions of their entities and of relationships between those entities. We show qualitatively and quantitatively that our model can construct and annotate graphs of relationships and make useful predictions. | en_US
dc.language.iso | en | en_US
dc.publisher | Princeton, NJ : Princeton University | en_US
dc.relation.isformatof | The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog (http://catalog.princeton.edu). | en_US
dc.subject.classification | Engineering | en_US
dc.subject.classification | Computer science | en_US
dc.subject.classification | Statistics | en_US
dc.title | Uncovering, Understanding, and Predicting Links | en_US
dc.type | Academic dissertations (Ph.D.) | en_US
pu.projectgrantnumber | 690-2143 | en_US
Appears in Collections: Electrical Engineering

Files in This Item:
File | Description | Size | Format
Chang_princeton_0181D_10035.pdf | | 4.92 MB | Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.