Project Lab Human Activity Understanding

Module Number: EIneu

Duration: 1 Semester

Semester Occurrence: Summer semester

Language: English

Number of ECTS: 6

Staff

Professor in charge: Eckehard Steinbach

Amount of Work

Contact hours: 75

Self-study hours: 105

Total: 180

Description of Achievement and Assessment Methods

The module grade consists of the following components:
- [20%] Implementation of introductory practical tasks in the field of Human Activity Understanding in Python, C++ - data acquisition and processing, recognition of people and objects in the scene, obtaining semantic understanding of ongoing activity (2 Data-acquistion campagains, 8 programming tasks). Thus the students demonstrate that they have gained deep knowledge of the lab equipment, and of sensor data acquisition methods. Furthermore, they are able to use AI models, algorithms, and program-code to gain an automatic understanding of human activity.

- [60%] Hands-on project work - creating project plans and presenting them (8-10 min. presentation), regularly discussing work progress and next steps with the supervisor (2 meetings), technical problem solving, and using appropriate tools for efficient teamwork (4 lab sessions). In this way, students demonstrate that they can define a project topic systematically and in a structured way, break it down into individual work packages, evaluate alternative technical approaches to solving the problem, and work together in a team in a goal-oriented manner using appropriate digital tools (GitLab, wiki, etc.).

- [20%] Approximately 10-minute presentation of results including a demo, followed by an approximately 10-minute discussion. The learning objectives are to practice presentation techniques in a project context, to summarize the team's work in an interesting and understandable way, and to conclude with an interactive demo.

Prerequisites (recommended)

- Scientific method (research, analysis, documentation and presentation techniques).
- Basics of computer science, Python and C++
- Working knowledge of Linux systems

Intended Learning Outcomes

Upon successful completion of this module, students are able to understand the challenges in Human Activity Understanding and design processes for automatic sensor-based recognition of ongoing human activity.

Students are able to collect and utilize synthetic data as well as multi-camera sequential data from egocentric and stationary setups, to annotate it and extract relevant semantic information, and to work with representations for spatial and temporal data.

Students are able to use AI models and algorithms to extract the information available in a scene, and to recognize and predict human activity based on the extracted information.

Finally, they are able to analyze and evaluate the results of the various algorithms involved as well as the solutions they have designed.

Content

Sensor data collection and annotation
- Multi-sensor and multi-view data collection and processing, including color/depth/IMU
- Synthetic data generation for Human Actions
- Accelerated ground truth annotation using interactive instance segmentation and tracking
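
As a rough illustration of the first topic block, the sketch below shows how color frames from two cameras might be grabbed with shared host timestamps for later alignment. It is a minimal sketch only: the device indices, resolution defaults, and software-level timestamping are assumptions for illustration, not the lab's actual multi-sensor setup or synchronization scheme.

```python
"""Minimal sketch: multi-camera color capture with shared host timestamps.
Device indices and the capture loop length are illustrative placeholders."""
import time
import cv2

CAMERA_IDS = [0, 1]                          # hypothetical device indices
caps = [cv2.VideoCapture(i) for i in CAMERA_IDS]

frames_log = []                              # (timestamp, camera_id, frame) tuples
try:
    for _ in range(100):                     # capture a short sequence
        t = time.time()                      # shared timestamp for rough software sync
        for cam_id, cap in zip(CAMERA_IDS, caps):
            ok, frame = cap.read()
            if ok:
                frames_log.append((t, cam_id, frame))
finally:
    for cap in caps:
        cap.release()

print(f"captured {len(frames_log)} frames from {len(CAMERA_IDS)} cameras")
```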

Semantic inference building blocks
- Object detection
- Human and Object pose estimation
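
A hedged sketch of how these two building blocks can be exercised with off-the-shelf components: a pretrained Faster R-CNN for object detection and a pretrained Keypoint R-CNN for 2D human pose, both from torchvision. The specific models, the input file name, and the confidence threshold are illustrative assumptions rather than the models used in the lab.

```python
"""Minimal sketch: object detection and 2D human pose on a single frame,
using pretrained torchvision models (illustrative choice, not the lab's)."""
import torch
from torchvision.io import read_image
from torchvision.models.detection import (
    fasterrcnn_resnet50_fpn, keypointrcnn_resnet50_fpn,
)

detector = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()
pose_net = keypointrcnn_resnet50_fpn(weights="DEFAULT").eval()

# hypothetical frame exported from the data collection setup
image = read_image("frame_0001.png").float() / 255.0

with torch.no_grad():
    det = detector([image])[0]        # dict with boxes, labels, scores
    pose = pose_net([image])[0]       # dict with boxes, keypoints, scores (persons)

keep = det["scores"] > 0.8            # simple confidence threshold
print("detected object labels:", det["labels"][keep].tolist())
print("persons with keypoints:", len(pose["keypoints"]))
```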

Graph representation of spatial and temporal data
- 3D scene graphs
- Semantic graphs
- Spatio-Temporal graphs
- Knowledge Bases (Ontologies)
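
To make the graph representations above concrete, the following sketch builds a toy spatio-temporal scene graph: per-frame nodes for entities, spatial relation edges within a frame, and temporal edges linking the same entity across frames. The entities and relation labels are invented for illustration and do not correspond to the lab's ontology.

```python
"""Minimal sketch: a toy spatio-temporal scene graph with networkx.
Entities and relations are made up for illustration."""
import networkx as nx

G = nx.MultiDiGraph()

frames = {
    0: {"entities": ["person", "cup", "table"],
        "relations": [("person", "cup", "reaches_for"), ("cup", "table", "on")]},
    1: {"entities": ["person", "cup", "table"],
        "relations": [("person", "cup", "holds"), ("cup", "table", "above")]},
}

for t, frame in frames.items():
    for e in frame["entities"]:
        G.add_node((t, e), entity=e, frame=t)           # one node per entity per frame
    for subj, obj, rel in frame["relations"]:
        G.add_edge((t, subj), (t, obj), relation=rel)   # spatial/semantic edge

for t in list(frames)[:-1]:
    for e in frames[t]["entities"]:
        if e in frames[t + 1]["entities"]:
            G.add_edge((t, e), (t + 1, e), relation="temporal")  # temporal edge

print(G.number_of_nodes(), "nodes,", G.number_of_edges(), "edges")
```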

Sequential deep learning models for Human Activity Recognition and Anticipation
- Recurrent Neural Networks
- Graph Networks
- Transformers
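
As a minimal sketch of the sequential-model family listed above, the code below defines a small GRU classifier over pose-keypoint sequences. The keypoint count, hidden size, and number of activity classes are illustrative assumptions, not values prescribed by the course.

```python
"""Minimal sketch: recurrent activity classifier over pose sequences.
Dimensions and class count are illustrative placeholders."""
import torch
from torch import nn

class ActivityGRU(nn.Module):
    def __init__(self, num_keypoints=17, hidden=128, num_classes=10):
        super().__init__()
        self.gru = nn.GRU(input_size=num_keypoints * 2,   # (x, y) per keypoint
                          hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, poses):                 # poses: (batch, time, keypoints * 2)
        _, h_n = self.gru(poses)              # final hidden state summarizes the clip
        return self.head(h_n[-1])             # activity logits

model = ActivityGRU()
dummy = torch.randn(4, 30, 34)                # 4 clips, 30 frames, 17 * 2 coordinates
logits = model(dummy)
print(logits.shape)                           # torch.Size([4, 10])
```

A graph network or Transformer encoder could be substituted for the GRU with the same input/output interface; the recurrent variant is shown only because it is the most compact to sketch.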

Teaching and Learning Methods

- Supervised weekly lab sessions with several introductory lectures by research assistants at the beginning of the course, and supervised practical implementation based on the provided skeleton code.
- Individual methods and solutions introduced by the student
- Lectures on theoretical basics of project planning, technical management, and tools for collaboration (SCRUM, GitLab, Wiki, etc.)
- Final project: individual and group work with independent planning, execution and documentation
- Seminar: Presentation of intermediate and final results and discussion (reflection, feedback).

Media

The following media forms will be used:
- Presentations
- Lecture notes and review articles from the technical literature
- Tutorials and software documentation
- Development Environment (virtual machines on server)
- Simulation environment
- Data collection setup

Reading List

Current thematically relevant publications from the scientific literature as well as tutorials.

Some examples of publications are:

- Chao, Yu-Wei, Wei Yang, Yu Xiang, Pavlo Molchanov, Ankur Handa, Jonathan Tremblay, Yashraj S. Narang, et al. “DexYCB: A Benchmark for Capturing Hand Grasping of Objects.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021.

- Puig, Xavier, Kevin Ra, Marko Boben, Jiaman Li, Tingwu Wang, Sanja Fidler, and Antonio Torralba. “VirtualHome: Simulating Household Activities Via Programs.” Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2018, 8494–8502.

- Cheng, Ho Kei, Yu-Wing Tai, and Chi-Keung Tang. “Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion.” ArXiv:2103.07941 [Cs], March 21, 2021. arxiv.org/abs/2103.07941.

- Bochkovskiy, Alexey, Chien-Yao Wang, and Hong-Yuan Mark Liao. “YOLOv4: Optimal Speed and Accuracy of Object Detection.” ArXiv:2004.10934 [Cs, Eess], April 22, 2020.

- Xiu, Yuliang, Jiefeng Li, Haoyu Wang, Yinghong Fang, and Cewu Lu. “Pose Flow: Efficient Online Pose Tracking.” British Machine Vision Conference (BMVC), 2018.

- Jain, Ashesh, Amir R. Zamir, Silvio Savarese, and Ashutosh Saxena. “Structural-RNN: Deep Learning on Spatio-Temporal Graphs.” Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2016, 5308–17.