Please use this identifier to cite or link to this item:
http://arks.princeton.edu/ark:/88435/dsp013484zk24k
Title: Towards First-Person Context Awareness: Discovery of User Routine from Egocentric Video using Topic Models and Structure from Motion
Authors: Jabri, Allan
Advisors: Xiao, Jianxiong
Department: Computer Science
Class Year: 2015
Abstract: One of the ultimate goals of our pursuit of AI is to create intelligent machines that help us live our lives. To help us, these agents must gather a sense of our context. Already, personal computing technologies like Google Now use egocentric (first-person) data - email, calendar, and other personal routine information - as actionable context. Recently, wearables have made it easy to capture many types of egocentric data, including visual data. It is easy to imagine the potential impact of a context-aware intelligent assistant - 'aware' not only of textual data but of immediate visual information - for applications ranging from assisted daily living to annotated augmented reality and self-organized lifelogs. We imagine a future in which wearable computing is ubiquitous and, as a result, lifelogs and similar visual data are abundant. Understanding user routine from 'big' egocentric data then presents itself as an important machine learning problem. Our key observation is that egocentric data is 'overfit' to the person wearing the device. Because human behavior tends to be periodic - hence the notion of 'routine' - lifelog data must be a series of manifestations of periodically recurring scenes. Using techniques inspired by work in scene understanding, ubiquitous computing, and 3D scene modeling, we propose two complementary approaches for discovering routine structure in egocentric image data. We take a scene understanding approach, interpreting routine as periodic visits to meaningful scenes. For a robust representation of routine visual scenes, we formulate routine visual context as probabilistic combinations of scene features discovered from a visual lifelog corpus using topic modeling. Concurrently, we discover the 3D spatial structure of routine scenes by incrementally building structure-from-motion (SfM) models from images of the same spatial context. As a proof of concept, we implement our framework on Google Glass with an infrastructure that we call SUNglass.
Extent: 62 pages
URI: http://arks.princeton.edu/ark:/88435/dsp013484zk24k
Type of Material: Princeton University Senior Theses
Language: en_US
Appears in Collections: Computer Science, 1988-2020
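
The two complementary approaches named in the abstract can be made concrete with short sketches. The thesis itself is not reproduced on this record page, so what follows is an illustration under stated assumptions, not the author's implementation. The first sketch assumes each lifelog image has already been quantized into a bag-of-visual-words histogram (e.g. from k-means over local descriptors) and uses scikit-learn's LatentDirichletAllocation to discover scene "topics"; names such as visual_word_histograms and n_scene_topics are hypothetical.

```python
# Hypothetical sketch: discovering routine scene "topics" from a visual
# lifelog corpus with LDA, assuming images are already quantized into
# bag-of-visual-words histograms.
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)

# Stand-in corpus: one row per lifelog image, one column per visual word.
# In practice these counts would come from a vocabulary learned over
# local descriptors extracted from the lifelog.
n_images, vocab_size = 500, 1000
visual_word_histograms = rng.poisson(1.0, size=(n_images, vocab_size))

# Each LDA topic is a distribution over visual words; each image becomes
# a probabilistic mixture of scene topics - one way to realize the
# "probabilistic combinations of scene features" in the abstract.
n_scene_topics = 20
lda = LatentDirichletAllocation(n_components=n_scene_topics, random_state=0)
scene_mixtures = lda.fit_transform(visual_word_histograms)

# Images recurring in the user's routine should concentrate on the same
# dominant topics over time.
dominant_topic = scene_mixtures.argmax(axis=1)
```

On the SfM side, a full incremental pipeline is beyond a sketch, but its core two-view step - matching features between two images of the same spatial context and recovering relative camera pose - can be illustrated with OpenCV (again an assumed toolchain; file paths and the intrinsics matrix K are placeholders):

```python
# Hypothetical sketch of the two-view building block of incremental SfM.
import cv2
import numpy as np

img1 = cv2.imread("frame_a.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("frame_b.jpg", cv2.IMREAD_GRAYSCALE)
K = np.array([[700.0, 0, 320], [0, 700.0, 240], [0, 0, 1]])  # assumed intrinsics

# Detect and describe features, then match with brute-force Hamming
# matching (appropriate for binary ORB descriptors).
orb = cv2.ORB_create(nfeatures=2000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = matcher.match(des1, des2)

pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

# Estimate the essential matrix with RANSAC, then recover the relative
# rotation R and translation t. An incremental SfM system would
# triangulate points from (R, t) and keep registering new images of the
# same scene against the growing model.
E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
_, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
```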
Files in This Item:
File | Size | Format | Access
---|---|---|---
PUTheses2015-Jabri_Allan.pdf | 27.78 MB | Adobe PDF | Request a copy
Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.