Please use this identifier to cite or link to this item:
http://arks.princeton.edu/ark:/88435/dsp01b2773z574
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Funkhouser, Thomas | - |
dc.contributor.author | Zeng, Andy | - |
dc.contributor.other | Computer Science Department | - |
dc.date.accessioned | 2019-12-03T05:08:35Z | - |
dc.date.available | 2019-12-03T05:08:35Z | - |
dc.date.issued | 2019 | - |
dc.identifier.uri | http://arks.princeton.edu/ark:/88435/dsp01b2773z574 | - |
dc.description.abstract | A human’s remarkable ability to manipulate unfamiliar objects with little prior knowledge of them is a constant inspiration for robotics research. Despite sustained interest from the research community and its clear practical value, robust manipulation of novel objects in cluttered environments remains a largely unsolved problem. Classic solutions (e.g. involving 6D object pose estimation) typically require prior knowledge of the objects (e.g. class categories or 3D CAD models), which may not be available outside of highly constrained settings. More recent deep learning methods using end-to-end convolutional networks (e.g. raw pixels to motor torques) have the potential to model complex skills that generalize, but they remain highly data-inefficient, and robot data (e.g. trial and error) is expensive. In this thesis, we consider an approach to learning manipulation called visual affordances. The idea is to use classic controllers to design motion primitives, then use convolutional networks to map from visual observations (e.g. images) to the perceived affordances (e.g. confidence scores or action-values) of those primitives for every pixel of the input. By leveraging dense equivariant state and action representations, this formulation can be used to acquire complex vision-based manipulation skills (e.g. pushing, grasping, throwing) that generalize to novel objects on real robot platforms, while using orders of magnitude less data. While visual affordances may not be directly compatible with classic planning frameworks that involve explicit forward simulation or propagation, we show that it is possible to work around this limitation by extending the formulation with model-free reinforcement learning to sequence primitive picking motions into more complex manipulation policies. We also study how the formulation can be combined with residual physics (learning to predict residual values on top of control parameter estimates from an initial analytical controller) to enable learning end-to-end visuomotor policies that leverage the benefits of analytical models while still maintaining the capacity (via data-driven residuals) to account for real-world dynamics that are not explicitly modeled. Finally, we conclude by discussing the limitations of learning visual affordances, which suggest directions for future work. | - |
dc.language.iso | en | - |
dc.publisher | Princeton, NJ : Princeton University | - |
dc.relation.isformatof | The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: http://catalog.princeton.edu | - |
dc.subject | Artificial Intelligence | - |
dc.subject | Computer Vision | - |
dc.subject | Deep Learning | - |
dc.subject | Machine Learning | - |
dc.subject | Robotics | - |
dc.subject | Visual Affordances | - |
dc.subject.classification | Robotics | - |
dc.subject.classification | Artificial intelligence | - |
dc.subject.classification | Computer science | - |
dc.title | Learning Visual Affordances for Robotic Manipulation | - |
dc.type | Academic dissertations (Ph.D.) | - |
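
The abstract above describes mapping visual observations to per-pixel affordances of motion primitives and acting at the highest-scoring pixel. The following is a minimal, hypothetical sketch of that idea, not code from the thesis; the network architecture, input shape, and all names are assumptions for illustration only.

```python
# Hypothetical sketch of per-pixel visual affordances: a fully convolutional
# network maps an RGB-D observation to a dense map of affordance scores
# (e.g. grasping confidence), and the primitive is executed at the best pixel.
import torch
import torch.nn as nn

class AffordanceNet(nn.Module):
    """Fully convolutional network: image in, one affordance score per pixel out."""
    def __init__(self, in_channels=4):  # assumed RGB + depth heightmap
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 1, kernel_size=1),  # dense score map, same H x W as input
        )

    def forward(self, x):
        return self.net(x)  # shape: (batch, 1, H, W)

if __name__ == "__main__":
    model = AffordanceNet()
    heightmap = torch.rand(1, 4, 224, 224)            # dummy RGB-D observation
    scores = model(heightmap)                          # dense affordance map
    flat_idx = scores.view(1, -1).argmax(dim=1).item() # highest-scoring pixel
    v, u = divmod(flat_idx, scores.shape[-1])          # (row, col) of chosen pixel
    print(f"execute grasp primitive at pixel (u={u}, v={v})")
```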
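The abstract also mentions residual physics: an analytical controller supplies an initial estimate of a control parameter, and a learned model predicts a residual correction on top of it. A minimal illustration under assumed details (a drag-free ballistics estimate of a throwing release speed, placeholder visual features, and invented names), not the thesis implementation:

```python
# Hypothetical residual-physics sketch: commanded value = analytical estimate
# + learned data-driven residual.
import math
import torch
import torch.nn as nn

GRAVITY = 9.81  # m/s^2

def analytical_release_speed(distance, release_angle_rad=math.pi / 4):
    """Ideal projectile speed to land `distance` meters away (no drag, flat ground)."""
    return math.sqrt(distance * GRAVITY / math.sin(2 * release_angle_rad))

class ResidualNet(nn.Module):
    """Predicts a small speed correction from a (placeholder) visual feature vector."""
    def __init__(self, feature_dim=64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(feature_dim, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, features):
        return self.mlp(features).squeeze(-1)  # residual speed in m/s

if __name__ == "__main__":
    target_distance = 1.5                             # meters
    v_analytical = analytical_release_speed(target_distance)
    features = torch.rand(1, 64)                      # placeholder object features
    delta_v = ResidualNet()(features).item()          # data-driven residual
    v_command = v_analytical + delta_v                # final commanded release speed
    print(f"analytical {v_analytical:.2f} m/s + residual {delta_v:+.2f} m/s "
          f"= {v_command:.2f} m/s")
```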
Appears in Collections: Computer Science
Files in This Item:
File | Description | Size | Format
---|---|---|---
Zeng_princeton_0181D_13206.pdf | | 9.73 MB | Adobe PDF
Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.