Yiannis Aloimonos, University of Maryland/USA: One-shot Visual Learning of Human-Environment Interactions
Abstract: Human manipulation actions involve hands, tools and objects (opening the refrigerator, pouring milk into a bowl, cutting a piece of bread with a knife, closing a door). During such an action, there are four basic events: 1) a new object can become part of the activity, 2) one object can transform into another object, 3) multiple objects can combine into one object, and 4) one object can separate into multiple objects. If we are able to visually monitor those events, then we can create a tree, the activity tree, which amounts to a representation of the action that a robot can then use in order to replicate that action. In other words, if a robot possesses a visual sense, it can learn how to perform an activity from one example, simply by learning the corresponding activity tree. This is what is discussed here. First, I explain what the activity tree is and how it can be obtained from visual input (of a human performing an action) through the appropriate segmentation of the signal. It turns out that contacts play a major role. Then I show that the tree is sufficient for executing the action under consideration. Finally, I apply the theory to learning from one example how to interact with a number of household appliances, such as opening the refrigerator and fetching an object from inside, or opening the microwave in order to place an object inside.
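The four basic events can be made concrete with a small data structure. The following Python sketch is purely illustrative and is not the speaker's implementation: the class and method names (`ActivityTree`, `appear`, `transform`, `combine`, `separate`, `history`) are hypothetical, and the events are entered by hand rather than detected from video.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    label: str
    event: str  # "appear", "transform", "combine" or "separate"
    sources: List["Node"] = field(default_factory=list)


class ActivityTree:
    """Hypothetical container that records the four basic events."""

    def __init__(self) -> None:
        # Objects currently taking part in the activity, keyed by label.
        self.active: Dict[str, Node] = {}

    def appear(self, label: str) -> Node:
        # Event 1: a new object becomes part of the activity.
        node = Node(label, "appear")
        self.active[label] = node
        return node

    def transform(self, old: str, new: str) -> Node:
        # Event 2: one object transforms into another object.
        node = Node(new, "transform", [self.active.pop(old)])
        self.active[new] = node
        return node

    def combine(self, parts: List[str], new: str) -> Node:
        # Event 3: multiple objects combine into one object.
        node = Node(new, "combine", [self.active.pop(p) for p in parts])
        self.active[new] = node
        return node

    def separate(self, old: str, parts: List[str]) -> List[Node]:
        # Event 4: one object separates into multiple objects.
        source = self.active.pop(old)
        nodes = [Node(p, "separate", [source]) for p in parts]
        for node in nodes:
            self.active[node.label] = node
        return nodes


def history(node: Node) -> str:
    """Render how an object came about as a nested expression."""
    if not node.sources:
        return node.label
    inner = ", ".join(history(s) for s in node.sources)
    return f"{node.label} <- {node.event}({inner})"


# Cutting a piece of bread with a knife, with events entered by hand.
tree = ActivityTree()
tree.appear("bread")
tree.appear("knife")
tree.combine(["knife", "bread"], "knife+bread")  # knife contacts bread
tree.separate("knife+bread", ["knife", "slice", "loaf"])
print(history(tree.active["slice"]))
```

In this sketch each node points back to the objects it came from, so the history of any object still in the scene can be read off by following `sources`. Treating a contact as a `combine` event is an assumption made here for illustration.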
Heni Ben Amor, Arizona State University/USA: Towards Learning Semantic Policies for Human-Robot Collaboration
Abstract: Collaborative robots that assist humans in their daily activities are a grand vision of AI and robotics. However, programming such interaction skills by hand is notoriously hard. At the same time, most existing representations for specifying joint human-robot behavior are shallow and rarely incorporate semantic and structural aspects. In this talk, I will present recent advances in learning human-robot collaboration skills from human-human demonstrations. I will introduce the 'Interaction Primitives' concept and will show how it can be extended to incorporate semantic and contextual information. In addition, a novel approach to communicating robot intentions using augmented reality and mixed-reality cues will be discussed. The overall goal of this task is to identify how multi-modal and semantic information can be leveraged to extract semantic policies for bi-directional human-robot co-adaptation.
Short Biography: Heni Ben Amor is an Assistant Professor at Arizona State University, where he leads the ASU Interactive Robotics Laboratory. Prior to that, he was a Research Scientist at the Institute for Robotics and Intelligent Machines at Georgia Tech in Atlanta. Heni studied Computer Science at the University of Koblenz-Landau (GER) and earned a Ph.D. in robotics from the Technical University Freiberg and the University of Osaka in 2010, where he worked with Hiroshi Ishiguro and Minoru Asada. Before moving to the US, Heni was a postdoctoral scholar at the Technical University Darmstadt. Heni's research topics focus on artificial intelligence, machine learning, human-robot interaction, robot vision, and automatic motor skill acquisition. He received the highly competitive Daimler-and-Benz Fellowship, as well as several best paper awards at major robotics and AI conferences. He is also on the program committee of various AI and robotics conferences such as AAAI, IJCAI, IROS, and ICRA.
Tamim Asfour, Karlsruhe Institute of Technology/Germany: On Combining Human Demonstration and Natural Language for Semantic Action Representations
Abstract: Action understanding is an important cognitive capability required for endowing robots with compact and generalizable representations, which can be learned from human observations and previous experiences. In this talk, I first present our work on the semantic segmentation of human demonstration, the extraction of planning operators from human demonstration, and the execution of learned actions at different temporal scales, e.g. in a longer time period, without altering the characteristic features of actions, such as the speed. Second, I introduce the KIT Language-Motion Dataset and recent results on linking human motion and natural language for the generation of semantic representations of human activities as well as for the generation of “robot” activities based on natural language input. Finally, I will show how action representations can be stored in a deep episodic memory which encodes action experiences into a latent vector space and, based on this latent encoding, categorizes actions, reconstructs original action frames, and also predicts future action frames. I will present experimental results verifying the robustness of the model on two different datasets and showing how such an episodic memory facilitates robot action execution.
Short biography: Tamim Asfour is a full Professor at the Institute for Anthropomatics and Robotics, Karlsruhe Institute of Technology (KIT), where he holds the chair of Humanoid Robotics Systems and is head of the High Performance Humanoid Technologies Lab (H2T). His current research interest is high performance 24/7 humanoid robotics. Specifically, his research focuses on engineering humanoid robot systems that are able to perform grasping and dexterous manipulation tasks and learn from human observation and sensorimotor experience, as well as on the mechano-informatics of humanoids as the synergetic integration of mechatronics, informatics and artificial intelligence methods into integrated complete humanoid robot systems. He is the developer of the ARMAR humanoid robot family and has been leader of the humanoid research group at KIT since 2003. Tamim Asfour received his diploma degree in Electrical Engineering in 1994 and his PhD in Computer Science in 2003 from the University of Karlsruhe (TH).
Michael Beetz, Universität Bremen/Germany: "Robotic Agents that Know What They are Doing"
Abstract: Explicit representations of robot knowledge are necessary to achieve competent robotic agents capable of performing variations of tasks. While state-of-the-art knowledge representations exist and are necessary, they are insufficient, as they are designed to abstract away from how actions are executed. We argue that representations that extend to subsymbolic motion and perception level are needed to fill in the gap. We propose a knowledge infrastructure that supports these representations and provides a web-based service for humans and robots to easily access and exchange the knowledge.
Short Biography: Michael Beetz is a professor of Computer Science at the Faculty for Mathematics & Informatics of the University of Bremen and head of the Institute for Artificial Intelligence (IAI). IAI investigates AI-based control methods for robotic agents, with a focus on human-scale everyday manipulation tasks. With his openEASE, a web-based knowledge service providing robot and human activity data, Michael Beetz aims at improving interoperability in robotics and lowering the barriers for robot programming. For this reason, the IAI group provides most of its results as open-source software, primarily within the ROS software ecosystem. He is the spokesperson of the DFG Collaborative Research Center EASE, which stands for Everyday Activity Science and Engineering and aims at understanding generative information processing models underlying the mastery of everyday manipulation tasks in complete, integrated systems.
Gordon Cheng, Technical University of Munich/Germany: "Semantic representations to enable robots to observe, segment and recognize human activities"
Abstract: Allowing robots to recognize activities through different sensors and to reuse their previous experiences is a prominent way to program robots. This requires a recognition method that transfers to different domains independently of the input sources used. In this talk, we present a method that performs continuous segmentation of the motions of users’ hands and simultaneously classifies known actions while learning new ones on demand. Thus, a meaningful semantic description is obtained in terms of human motions and object properties. The observed data are analyzed in real time by a reasoning system capable of detecting and learning human activities from different sensors in different domains. It first applies a continuous motion segmentation scheme to the human’s movements, then learns and classifies the actions performed in each resulting segment via ontology-based reasoning on the user’s motion parameters and the objects involved. Finally, the classification results are used to generate a task-level representation of the possible actions which can be performed in the environment. This provides a high-level state transition graph which can be used for task planning on a physical robot.
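The three stages described above (segmentation, per-segment classification from motion and object properties, and a state transition graph) can be sketched in a few lines of Python. This is a toy illustration under strong assumptions, not the method presented in the talk: the `Frame` fields, the hand-written rules in `classify`, and the action names are all hypothetical stand-ins for the ontology-based reasoning, and nothing here learns new actions on demand.

```python
from collections import defaultdict
from itertools import groupby
from typing import Dict, List, NamedTuple, Optional, Set


class Frame(NamedTuple):
    # Hypothetical per-frame observations of one hand.
    hand_moving: bool
    object_in_hand: Optional[str]
    object_acted_on: Optional[str]


def classify(frame: Frame) -> str:
    """Toy rules standing in for ontology-based reasoning."""
    if frame.object_in_hand and frame.object_acted_on:
        return "use_tool"
    if frame.object_in_hand:
        return "carry" if frame.hand_moving else "hold"
    if frame.hand_moving:
        return "reach" if frame.object_acted_on else "move"
    return "idle"


def segment(frames: List[Frame]) -> List[str]:
    """Collapse runs of identically classified frames into segments."""
    return [label for label, _ in groupby(classify(f) for f in frames)]


def transition_graph(actions: List[str]) -> Dict[str, Set[str]]:
    """Record which action was observed to follow which."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for current, following in zip(actions, actions[1:]):
        graph[current].add(following)
    return dict(graph)


# Hand-made observation sequence: reach for a knife, cut bread, put it down.
frames = [
    Frame(False, None, None),
    Frame(True, None, "knife"),
    Frame(True, None, "knife"),
    Frame(True, "knife", None),
    Frame(True, "knife", "bread"),
    Frame(True, "knife", "bread"),
    Frame(True, "knife", None),
    Frame(False, None, None),
]
actions = segment(frames)
graph = transition_graph(actions)
print(actions)
print(graph)
```

A planner could then search such a graph for a sequence of actions leading from the current state to a goal. Segmenting by changes in the classified label is a simplification chosen for this sketch.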
Short Biography: Gordon Cheng is the professor and chair of Cognitive Systems, and founding director of the Institute for Cognitive Systems, Technische Universität München, Munich, Germany. He was the head of the Department of Humanoid Robotics and Computational Neuroscience, ATR Computational Neuroscience Laboratories, Kyoto, Japan, from 2002 to 2008. He was the group leader for the JST International Cooperative Research Project, Computational Brain, from 2004 to 2008. He was designated a project leader from 2007 to 2008 for the National Institute of Information and Communications Technology of Japan. He has held visiting professorships worldwide in multidisciplinary fields comprising mechatronics in France, neuroengineering in Brazil, and computer science in the USA. He held fellowships from the Center of Excellence and the Science and Technology Agency of Japan. Both of these fellowships were taken at the Humanoid Interaction Laboratory, Intelligent Systems Division at the Electrotechnical Laboratory, Japan. He received the PhD degree in systems engineering from the Department of Systems Engineering, The Australian National University, in 2001, and the bachelor's and master's degrees in computer science from the University of Wollongong, Wollongong, Australia, in 1991 and 1993, respectively. He was the managing director of the company G.T.I. Computing in Australia. His current research interests include humanoid robotics, cognitive systems, brain machine interfaces, biomimetics of human vision, human-robot interaction, active vision, and mobile robot navigation. He is the co-inventor of approximately 15 patents and has co-authored approximately 250 technical publications, proceedings, editorials, and book chapters.
Gregory D. Hager, Mandell Bellmore Professor of Computer Science, Johns Hopkins University: "Mentoring Robots: Showing, Telling, and Critiquing"
Abstract: Over the past decade, we’ve seen a tidal shift from a traditional view of robots as a means of automation toward robots as a means of enhancement – enhancement of productivity, enhancement of life quality, or enhancement of ability. These are all areas where, for the complementary human roles, teaching, training, and mentorship play an important role – in a small business we train factory floor workers; in a care facility a care-giver learns a patient’s deficits; and in an operating room, a surgeon trains residents and fellows. In this talk, I will explore the idea of training robots in a broad sense. I will illustrate some current ideas and trends with concrete examples from our own work developing collaborative robots for manufacturing, healthcare, and automated driving. In particular, I’ll describe our recent work on task and motion planning which uses learning-based methods to ground symbolic representations and learn policies that implement the corresponding actions. I’ll end with some thoughts on how recent data-driven approaches to learning might be integrated with structured models of tasks, and some of the research challenges this poses for the community.
Short Biography: Gregory D. Hager is the Mandell Bellmore Professor of Computer Science at Johns Hopkins University and Founding Director of the Malone Center for Engineering in Healthcare. Professor Hager’s research interests include collaborative and vision-based robotics, time-series analysis of image data, and medical applications of image analysis and robotics. He has published over 300 articles and books in these areas. He is a fellow of the IEEE for his contributions to Vision-Based Robotics and a Fellow of the MICCAI Society for his contributions to imaging and his work on the analysis of surgical technical skill. In 2014, he was awarded a Hans Fischer Fellowship in the Institute of Advanced Study of the Technical University of Munich where he also holds an appointment in Computer Science. Professor Hager is the founding CEO of Clear Guide Medical, and a co-founder of Ready Robotics.
Tetsunari Inamura, National Institute of Informatics/Japan: "Cloud based VR platform for collecting and sharing semantic interaction behavior"
Abstract: Symbolic and semantic knowledge related to daily-life environments is quite important for autonomous robots which support human activities. One of the practical approaches to managing the symbolic and semantic information of human activity is to define an ontology for a target domain and application. However, this approach has two high-cost difficulties: 1) humans have to perform the activities many times in actual fields to transfer embodied knowledge to robots, and 2) object and tool information, which is used in the performance, has to be prepared and defined manually. In this talk, I propose a cloud-based immersive virtual reality platform which enables virtual human-robot interaction to collect the semantic and embodied knowledge of human activities in a variety of situations, without spending a lot of time and manpower. Through a demonstration experiment at the RoboCup competition, I will show the feasibility and potential of the system for developing intelligent human-robot interaction systems.
Short Biography: Tetsunari Inamura is an Associate Professor in the Principles of Informatics Research Division in the National Institute of Informatics, and an Associate Professor in the Department of Informatics, School of Multidisciplinary Sciences, the Graduate University for Advanced Studies (SOKENDAI). He received a BE in 1995 from the Department of Mechano-Informatics at the University of Tokyo, and an MS and PhD from the Department of Informatics Engineering, School of Engineering, at the University of Tokyo in 1997 and 2000. His research interests include imitation learning and symbol emergence on humanoid robots, development of interactive humanoid robots, stochastic information processing, and the use of physics simulators for computational intelligence. His awards include a Funai academic prize in 2013, the RoboCup Award from the Japanese Society for Artificial Intelligence in 2013, and the Young researcher encouraging award from the Robotics Society of Japan in 2008.
Manuela M. Veloso, Carnegie Mellon University/USA, Speaker on behalf of Manuela is Danny Zhu: "A Multi-layered Visualization Language for Video Augmentation"
Abstract: There are many tasks that humans perform that involve observing video streams, as well as tracking objects or quantities related to the events depicted in the video, that can be made more transparent by the addition of appropriate drawings to a video, e.g., tracking the behavior of autonomous robots or following the motion of players across a soccer field. We describe a specification of a general means of describing groups of time-varying discrete visualizations, as well as a demonstration of overlaying those visualizations onto videos in an augmented reality manner so as to situate them in a real-world context, when such a context is available and meaningful. Creating such videos can be especially useful in the case of autonomous agents operating in the real world; we demonstrate our visualization procedures on two example robotic domains. We take the complex algorithms controlling the robots’ actions in the real world and create videos that are much more informative than the original plain videos.
Short Biography: Manuela M. Veloso is the Herbert A. Simon University Professor in the School of Computer Science at Carnegie Mellon University. She is the Head of the Machine Learning Department, with joint appointments in the Computer Science Department, in the Robotics Institute, and in the Electrical and Computer Engineering Department. She researches in Artificial Intelligence with a focus on robotics, machine learning, and multiagent systems. She founded and directs the CORAL research laboratory, for the study of autonomous agents that Collaborate, Observe, Reason, Act, and Learn, www.cs.cmu.edu/~coral. Professor Veloso is an ACM Fellow, IEEE Fellow, AAAS Fellow, AAAI Fellow, Einstein Chair Professor, the co-founder and past President of RoboCup, and past President of AAAI. Professor Veloso and her students research with a variety of autonomous robots, including mobile service robots and soccer robots. See www.cs.cmu.edu/~mmv for further information, including publications.
Danny Zhu received his A.B. degree from Harvard University in 2011 and is currently a Ph.D. student in the Computer Science Department at Carnegie Mellon University. His research covers augmented reality visualizations for autonomous agents, as well as various topics relating to the RoboCup Small Size League.