Student Theses and Projects

  • Almost all projects at our chair can also be carried out remotely, and in the vast majority of cases work on a project is not restricted.

The Chair of Human-Machine Communication (MMK) continuously offers topics for student projects (Bachelor's and Master's theses, research internships (Forschungspraxis), IDP).

If you have found a suitable topic for your project, please contact the responsible research assistant. If no suitable project is advertised, you can also contact an assistant directly to obtain a topic.

Ingenieurpraxis (industrial internship): The goal of the Ingenieurpraxis is to gain insight into industrial workflows. We therefore do not offer Ingenieurpraxis positions at the chair, but we are happy to supervise you if you find a position at a company.

Likewise, we do not offer internship positions at the chair! Due to the volume of incoming inquiries, they will not be answered.

Topics for Student Projects

Area: Virtual Reality

Research Internship: Research Survey & 3D Modeling

Topic Research Survey & 3D Modeling
Type Research Internship (FP)
Advisor Maximilian Rettinger, M.Sc.
E-Mail: maximilian.rettinger@tum.de
Area Virtual Reality
Description At the Chair of Human-Machine Communication you can apply for a research internship (Forschungspraxis, FP) starting immediately. You will investigate the current learning outcomes of medical training courses by means of an online survey and create a 3D model for a VR application.
Tasks
  • Creation and evaluation of an online survey
  • Modeling of a device, e.g. using Blender or 3ds Max
Requirements
  • Precise and careful work
  • Knowledge of 3D modeling
  • Motivated and reliable work
Application If you are interested, please send an e-mail to the address given above. It should include: your motivation, previous experience, possible starting date, CV, and grade report or Transcript of Records.

Empirical Research in Virtual Reality

Topic Empirical Research in Virtual Reality
Type Research Internship (FP), Interdisciplinary Project (IDP)
Advisor Maximilian Rettinger, M.Sc.
E-Mail: maximilian.rettinger@tum.de
Area Virtual Reality
Description At the Chair of Human-Machine Communication you can apply for an interdisciplinary project (IDP) or a research internship (Forschungspraxis, FP) in the field of virtual reality, starting immediately. Several topics are available, which you can discuss with the advisor.
It is also possible to work on certain topics as a team with your fellow students.
Tasks
  • Topic-related literature research
  • Implementation of a scenario
  • Planning, conducting and evaluating a user study
Requirements
  • Interest in new technologies and empirical research
  • Structured and reliable work
  • Basic knowledge of object-oriented programming
Application If you are interested in a topic from this area, please send an e-mail to the address given above. It should include: your motivation, previous experience, possible starting date, CV, and grade report or Transcript of Records.

Area: Speech Recognition

Joint training of speech enhancement and speech recognition

Topic Joint training of speech enhancement and speech recognition
Type Research Internship (FP), Master's Thesis (MA)
Advisor Lujun Li, M.Eng.
E-Mail: lujun.li@tum.de
Description Motivation:
Recently, end-to-end neural networks have made significant breakthroughs in the field of speech recognition, challenging the dominance of DNN-HMM hybrid architectures. However, in realistic environments the speech input to ASR systems is generally corrupted by various background noises and reverberation, leading to dramatic performance degradation. To alleviate this issue, the mainstream approach is to use a well-designed speech enhancement module as the front end of the ASR system. However, enhancement modules can introduce speech distortions and mismatches with the training data, which sometimes degrade ASR performance. Integrating the speech enhancement and end-to-end recognition networks via joint training is therefore a promising research field.
Task:
The main task is to improve an already working joint training pipeline with state-of-the-art feature extraction methods, speech enhancement algorithms and speech recognition algorithms. Details of the architecture can be found in [1]. For further reading, [2] provides a detailed explanation of the integration of speech enhancement and speech recognition.

References:
  1. Liu, Bin & Nie, Shuai & Liang, Shan & Liu, Wen-Ju & Yu, Meng & Chen, Lianwu & Peng, Shouye & Li, Changliang. (2019). Jointly Adversarial Enhancement Training for Robust End-to-End Speech Recognition. 491-495. 10.21437/Interspeech.2019-1242.
  2. M. Ravanelli, P. Brakel, M. Omologo and Y. Bengio, "Batch-normalized joint training for DNN-based distant speech recognition," 2016 IEEE Spoken Language Technology Workshop (SLT), San Diego, CA, 2016, pp. 28-34, doi: 10.1109/SLT.2016.7846241.
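To make the joint objective concrete, here is a minimal sketch of a combined enhancement + ASR loss in plain Python. All names and numbers are illustrative placeholders; the actual pipeline in [1] uses adversarial training with real neural front- and back-ends.

```python
import math

# Minimal sketch of a jointly trained enhancement + ASR objective.
# Everything here is a toy stand-in for the real neural components.

def enhancement_loss(enhanced, clean):
    """Mean-squared error between enhanced and clean feature vectors."""
    return sum((e - c) ** 2 for e, c in zip(enhanced, clean)) / len(clean)

def asr_loss(log_probs, target_index):
    """Negative log-likelihood of the target token."""
    return -log_probs[target_index]

def joint_loss(enhanced, clean, log_probs, target_index, alpha=0.5):
    """Weighted sum of both losses: in joint training, gradients from the
    ASR loss also flow back into the enhancement front end, which is what
    counters the distortion/mismatch problem described above."""
    return (alpha * enhancement_loss(enhanced, clean)
            + (1 - alpha) * asr_loss(log_probs, target_index))

# Toy numbers only:
enhanced = [0.9, 1.1, 2.0]
clean = [1.0, 1.0, 2.0]
log_probs = [math.log(0.7), math.log(0.2), math.log(0.1)]
loss = joint_loss(enhanced, clean, log_probs, 0)
```

The weight `alpha` controls the trade-off between enhancement quality and recognition accuracy; tuning or scheduling it is one of the design decisions such a pipeline involves.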

Requirements
  • Excellent coding skills, preferably in Python.
  • Experience with deep learning frameworks, preferably Torch/PyTorch & TensorFlow.
  • Background knowledge in speech signal processing or natural language processing is a bonus.
  • Motivation to work on deep learning.
Application If you are interested in working in the promising field of artificial intelligence, and more specifically speech signal processing, we welcome your application via the e-mail address above. Please specify the topic in the e-mail subject, e.g. "Master's thesis/research internship application for topic 'XYZ'", and describe your previous project experience and ideal starting date. Please also attach your current CV and transcript.

A New Method to Generate Hidden Markov Model Topology

Topic A New Method to Generate Hidden Markov Model Topology
Type Master's Thesis (MA), Research Internship (FP)
Advisor Lujun Li, M.Sc.
Tel.: +49 (0)89 289-28543
E-Mail: lujun.li@tum.de
Description Motivation: For decades, acoustic models in speech recognition systems have been built on Hidden Markov Models (HMMs), e.g. the Gaussian Mixture Model-HMM and Deep Neural Network-HMM systems, and have achieved a series of impressive results. At present, the most widely employed HMM topology is the 3-state left-to-right architecture. However, there is no solid evidence for its suitability and superiority, reflecting a deficiency in the research into HMM topology. We have proposed an innovative technique to customize an individual HMM topology for each phoneme, which achieved strong results in a monophone system. The topic of this thesis is to apply it in a triphone system.
Task: The main task is to transfer an already working deep architecture from a monophone system to a triphone system. Overall, the following steps will be taken during the thesis:
1. State-of-the-art research
2. Understanding the already working deep architecture
3. Implementing the algorithm in a triphone system
4. Evaluation of the architecture on the TED-LIUM v2 corpus
5. Demonstration of the working system
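To illustrate what "topology" means here, the sketch below builds left-to-right HMM transition matrices with a configurable number of states per phoneme, which is the standard 3-state scheme when the count is fixed at 3 and a per-phoneme customization otherwise. The probabilities and phoneme labels are placeholders, not trained values or the proposed method itself.

```python
# Sketch of left-to-right HMM transition matrices with a per-phoneme
# number of states (the conventional choice is a fixed 3 states).

def left_to_right_transitions(num_states, self_loop=0.6):
    """Each state either stays (self-loop) or advances to the next state;
    column num_states represents the exit from the model. Each row is a
    probability distribution, so rows sum to 1."""
    trans = [[0.0] * (num_states + 1) for _ in range(num_states)]
    for i in range(num_states):
        trans[i][i] = self_loop          # stay in the current state
        trans[i][i + 1] = 1.0 - self_loop  # advance (or exit from the last state)
    return trans

# Customized topology: e.g. a short plosive gets 2 states, a diphthong 5,
# instead of the one-size-fits-all 3-state architecture.
topologies = {"ah": left_to_right_transitions(3),
              "t": left_to_right_transitions(2),
              "oy": left_to_right_transitions(5)}
```

In a triphone system, the same kind of per-model topology would have to be chosen per context-dependent state cluster rather than per phoneme, which is where the transfer work in this thesis lies.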
Requirements 1. Background knowledge in speech signal processing or natural language processing.
2. Excellent coding skills, preferably in C++ & Python.
3. Experience with deep learning frameworks, preferably Torch/PyTorch & TensorFlow.
4. Experience with the Kaldi toolkit is a big bonus.
5. Motivation to work on deep learning.
Application If you are interested in this topic, we welcome your application via the e-mail address above. Please set the e-mail subject to "application for topic 'XYZ'", e.g. "Master's thesis application for topic 'XYZ'", and clearly state in the message why you are interested in the topic. Also make sure to attach your most recent CV (if you have one) and grade report.

Area: Computer Vision

Statistical Analysis of Deep Learning Object Detectors

Topic Statistical Analysis of Deep Learning Object Detectors
Type Research Internship (FP)
Advisor Johannes Gilg, M.Sc.
E-Mail: johannes.gilg@tum.de
Area Computer Vision
Beschreibung Motivation: Deep Learning has revolutionized the Computer Vision task of Object Detection. This has led to an explosion in the number of Object Detection architectures (e.g. SSD, R-CNN, YOLO, FPN[1], CenterNet[2] and DETR[3]) and training "tricks". Their merit is usually judged on the single metric of mean average precision (mAP), on a popular dataset (COCO)[4], against the model size in number of parameters and compute cost in FLOPs. This raises the question whether this reduction to a single metric hides some model specific performance differences and biases.
Task: The task is to gather or generate the outputs of different publicly available object detectors and compute distinct, expressive metrics on their predictions. These metrics should then be rigorously analyzed to gain insights into model/architecture biases and other notable distinctions beyond the mAP.
Opportunity: Get hands-on experience with different Deep Learning frameworks. Go on a deep dive into current state-of-the-art deep learning Object Detector architectures, methods and tricks.
References:
[1] Zhao, Zhong-Qiu, et al. "Object detection with deep learning: A review." IEEE transactions on neural networks and learning systems . 2019.
[2] Zhou, Xingyi, Dewen Wang, and Philipp Krähenbühl. "Objects as points." arXiv preprint arXiv:1904.07850. 2019.
[3] Carion, Nicolas, et al. "End-to-End Object Detection with Transformers." arXiv preprint arXiv:2005.12872. 2020.
[4] Lin, Tsung-Yi, et al. "Microsoft coco: Common objects in context." European conference on computer vision. 2014.
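As a rough illustration of the kind of per-prediction metrics such an analysis starts from, the sketch below computes intersection-over-union (IoU) matching and a simple precision at a fixed IoU threshold. The boxes are toy data, and real COCO-style evaluation additionally handles confidence scores, classes, and duplicate matches.

```python
# Boxes are (x1, y1, x2, y2) with x2 > x1 and y2 > y1.

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda box: (box[2] - box[0]) * (box[3] - box[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def precision_at_iou(predictions, ground_truth, threshold=0.5):
    """Fraction of predictions that match some ground-truth box at
    IoU >= threshold (a deliberately simplified stand-in for AP)."""
    hits = sum(any(iou(p, g) >= threshold for g in ground_truth)
               for p in predictions)
    return hits / len(predictions) if predictions else 0.0

# Toy example: one good detection, one false positive.
preds = [(0, 0, 10, 10), (20, 20, 30, 30)]
gts = [(1, 1, 10, 10)]
p = precision_at_iou(preds, gts)
```

Sweeping the IoU threshold, splitting results by object size or class, and comparing score calibration across detectors are examples of the "distinct expressive metrics" the task refers to.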
Requirements
  • Experience in Computer Vision & Deep Learning
  • Good programming skills, ideally in Python
  • Solid understanding of statistics
Application If you are interested in this topic, we welcome your application via the e-mail address above. Please set the e-mail subject to "application for topic 'XYZ'", e.g. "Master's thesis application for topic 'XYZ'", and clearly state in the message why you are interested in the topic. Also make sure to attach your most recent CV (if you have one) and grade report.

Area: Miscellaneous

Multi-modal Multi-target Learning

Topic Multi-modal Multi-target Learning
Type Master's Thesis
Advisor Yue Zhang-Weninger, Dr. M.Sc. Ph.D.
E-Mail: y.zhang@tum.de
Area Multi-modal
Description Motivation:
A common problem in machine learning is to deal with multimodal datasets with disjoint label spaces, missing labels and/or missing inputs. In our previous work [1], we introduced the openXDATA tool that completes the missing labels in partially labelled or unlabelled datasets in order to generate multi-target data with labels in the joint label space of the datasets. To this end, we designed and implemented the cross-data label completion (CDLC) algorithm that uses a multi-task shared-hidden-layer DNN to iteratively complete the sparse label matrix of the instances from the different datasets.

Task:
In this project, the openXDATA tool should first be extended to support end-to-end learning, using raw input signals instead of precomputed features. Second, support for multimodal input (e.g. audio, video and physiology) should be added. Evaluation can be done e.g. on the database introduced in [2] or health applications such as depression recognition.

References:
[1] F. Weninger, Y. Zhang, and R. Picard, “openXDATA: A tool for multi-target data generation and missing label completion,” arXiv:2007.13889, 2020
[2] H. Chen, Y. Zhang, F. Weninger, R. Picard, C. Breazeal, and H. W. Park, “Dyadic speech affect recognition using DAMI-P2C parent-child multimodal interaction dataset,” in Proc. of ICMI, pp. 97–106, 2020
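The iterative completion idea can be sketched as follows. This toy uses a 1-nearest-neighbour "model" purely as a stand-in for the multi-task shared-hidden-layer DNN of the real CDLC algorithm; the data, features, and stopping rule are all illustrative.

```python
# Toy sketch of iterative label completion in the spirit of CDLC:
# repeatedly fill missing entries of a sparse label matrix from the
# currently labelled instances until no entry changes.

def complete_labels(features, labels):
    """labels[i][t] is a float or None (missing). Each pass, every missing
    entry is predicted from the nearest instance that has that target
    labelled, and the prediction is committed so later passes can use it."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    changed = True
    while changed:
        changed = False
        for i, row in enumerate(labels):
            for t, val in enumerate(row):
                if val is None:
                    donors = [j for j, r in enumerate(labels)
                              if j != i and r[t] is not None]
                    if donors:
                        j = min(donors, key=lambda j: dist(features[i], features[j]))
                        labels[i][t] = labels[j][t]
                        changed = True
    return labels

# Three instances, two disjoint label spaces with missing entries:
features = [[0.0], [0.1], [5.0]]
labels = [[1.0, None], [None, 0.0], [None, 1.0]]
completed = complete_labels(features, labels)
```

The thesis extensions would replace the hand-crafted features with raw multimodal inputs (end-to-end learning) and the toy predictor with the multi-task network.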
Requirements
  • Excellent programming skills in Python
  • Background in machine learning
  • Experience with machine learning toolkits: TensorFlow, PyTorch, etc.
Application If you are interested in this topic, please send your application to the email address above. A short statement of motivation and previous experience in related fields should be included. Please also attach your CV and transcript.

Multi-modal Smile Detection

Topic Multi-modal smile detection
Type Master's Thesis
Advisor Yue Zhang-Weninger, Dr. M.Sc. Ph.D.
E-Mail: y.zhang@tum.de
Area Multi-modal
Description Smile detection from facial expressions has been widely studied in HCI ([1], [2]). However, few studies so far have taken into account acoustic cues for recognizing smiles.
In this project, multimodal smile detection should be investigated in order to improve the recognition performance and robustness. The evaluation will be done on a corpus of public speeches. The data will need to be preprocessed and annotated semi-automatically using a smile detector based on facial features. Then, a multimodal smile detector will be implemented, combining visual and auditory features. Finally, we will compare the feature-based recognition to an end-to-end approach (i.e. starting from raw speech signals).
References [1] K. El Haddad, H. Cakmak, S. Dupont, and T. Dutoit, “Laughter and smile processing for human-computer interactions,” Proc. of International Conference on Language Resources and Evaluation, Workshop on Just Talking – Casual Talk Among Humans and Machines, pp. 23–28, 2016.
[2] S. Petridis, B. Martinez, and M. Pantic, “The MAHNOB laughter database,” Image and Vision Computing, vol. 31, no. 2, pp. 186–202, 2013.
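One simple multimodal baseline the project could start from is late (score-level) fusion of the two per-modality classifiers. The sketch below is illustrative only: the weights, scores, and threshold are placeholders, and the project would compare such a baseline against feature-level and end-to-end fusion.

```python
# Sketch of late fusion of a visual and an acoustic smile classifier.
# Each classifier is assumed to output a smile probability in [0, 1].

def fuse_scores(visual_score, audio_score, w_visual=0.7):
    """Weighted average of the per-modality smile probabilities.
    The weight would be tuned on a validation set in practice."""
    return w_visual * visual_score + (1 - w_visual) * audio_score

def detect_smile(visual_score, audio_score, threshold=0.5):
    """Binary decision on the fused score."""
    return fuse_scores(visual_score, audio_score) >= threshold

# Toy example: strong visual evidence, weak acoustic evidence.
decision = detect_smile(0.8, 0.3)
```

The point of adding the acoustic branch is robustness: when the face is occluded or poorly lit, the visual score degrades and the audio cue can carry the decision.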
Requirements
  • Excellent programming skills in Python
  • Background in machine learning
  • Experience with machine learning toolkits: TensorFlow, PyTorch, etc.
Application If you are interested in this topic, please send your application to the email address above. A short statement of motivation and previous experience in related fields should be included. Please also attach your CV and transcript.