Student Theses and Projects

  • Almost all projects at our chair can also be carried out remotely, and in the vast majority of cases work on the project is not restricted.

Topics for student projects (Bachelor's and Master's theses, Forschungspraxis, IDP) are continuously available at the Lehrstuhl MMK.

If you have found a suitable topic for your project, please contact the responsible research assistant. If no suitable project is advertised, you can also contact an assistant directly to arrange a topic.

Ingenieurpraxis: The goal of the Ingenieurpraxis is to gain insight into industrial workflows. We therefore do not offer Ingenieurpraxis positions at the chair, but we are happy to supervise you if you find a position at a company.

Likewise, we do not offer internship positions at the chair! Due to their volume, incoming inquiries will not be answered.

Topics for Student Projects

Research Area: Virtual Reality

Empirical Research in Virtual Reality

Topic Empirical Research in Virtual Reality
Type Forschungspraxis (FP), Interdisciplinary Project (IDP)
Supervisor Maximilian Rettinger, M.Sc.
E-Mail: maximilian.rettinger@tum.de
Research Area Virtual Reality
Description At the Chair of Human-Machine Communication you can apply, effective immediately, for an Interdisciplinary Project (IDP) or a Forschungspraxis (FP) in the field of Virtual Reality. Several topics are available, which you can discuss with the supervisor.
It is also possible to work on certain topics in a team with fellow students.
Tasks
  • Topic-related literature review
  • Implementation of a scenario
  • Planning, conducting, and evaluating a user study
Prerequisites
  • Interest in new technologies and empirical research
  • Structured and reliable working style
  • Basic knowledge of object-oriented programming
Application If you are interested in a topic from this area, please send an e-mail to the address given above. It should include: your motivation, previous experience, intended starting date, CV, and grade report or Transcript of Records.

Research Area: Speech Recognition

Improving Speech Enhancement GAN with SincNet

Topic Improving Speech Enhancement GAN with SincNet
Type Research Internship (FP), Master's Thesis (MA)
Supervisor Lujun Li, M.Eng.
E-Mail: lujun.li@tum.de
Description Motivation:
The goal of speech enhancement is to improve the quality and intelligibility of speech that has been degraded by background noise. One class of generative methods relies on GANs, which have been shown to be effective for speech enhancement operating directly on the raw waveform. CNNs are the natural candidate for processing raw speech samples; however, one of the most critical parts of current waveform-based CNNs is the first convolutional layer. This layer not only deals with high-dimensional input but is also the most affected by vanishing gradients, especially in very deep architectures. To tackle this problem, SincNet, a novel convolutional architecture, was proposed, which places constraints on the filter shape of the first layer.

Task:
The main task is to replace the CNN architecture of SEGAN [1] with the SincNet architecture [2]. Overall, the thesis will follow these steps:
  1. Working through the existing SEGAN repository
  2. Understanding the SincNet architecture
  3. Implementing the SincNet architecture in SEGAN
  4. Evaluating the new architecture on a given dataset

References:
[1] Pascual, Santiago, Antonio Bonafonte, and Joan Serra. "SEGAN: Speech enhancement generative adversarial network." arXiv preprint arXiv:1703.09452 (2017).
[2] Ravanelli, Mirco, and Yoshua Bengio. "Speaker recognition from raw waveform with sincnet." 2018 IEEE Spoken Language Technology Workshop (SLT). IEEE, 2018.
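To illustrate the idea behind SincNet's constrained first layer, here is a minimal NumPy sketch (names and parameter values are illustrative, not taken from the SEGAN or SincNet code): each filter is a band-pass built as the difference of two windowed sinc low-pass filters, so only a low cutoff and a bandwidth need to be learned per filter, instead of every tap.

```python
import numpy as np

def sinc_filters(low_hz, band_hz, kernel_size=101, sample_rate=16000):
    """Build band-pass filters parameterized only by a low cutoff and a
    bandwidth, as in SincNet: each filter is the difference of two
    low-pass sinc functions, smoothly truncated by a Hamming window."""
    t = (np.arange(kernel_size) - kernel_size // 2) / sample_rate
    window = np.hamming(kernel_size)
    filters = []
    for f1, bw in zip(low_hz, band_hz):
        f2 = f1 + bw
        # ideal band-pass = difference of two scaled sinc low-passes
        lp1 = 2 * f1 * np.sinc(2 * f1 * t)
        lp2 = 2 * f2 * np.sinc(2 * f2 * t)
        filters.append((lp2 - lp1) * window)
    return np.stack(filters)

bank = sinc_filters(low_hz=[50.0, 300.0], band_hz=[100.0, 200.0])
print(bank.shape)  # (2, 101)
```

In the actual SincNet layer, the cutoffs are trainable PyTorch parameters and this filter bank is applied via a standard 1-D convolution.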
Requirements
  • Excellent coding skills, preferably in Python.
  • Experience with deep learning frameworks, preferably TensorFlow.
  • Motivation to work on deep learning.
Application If you are interested in working in the promising field of artificial intelligence, and more specifically speech signal processing, we welcome applications via the e-mail address above. Please specify the topic in the e-mail subject, e.g. "Masterarbeit/Forschungspraxis application for topic 'XYZ'", and highlight your previous project experience and preferred starting date. Please also attach your current CV and transcript.

Joint training of speech enhancement and speech recognition

Topic Joint training of speech enhancement and speech recognition
Type Research Internship (FP), Master's Thesis (MA)
Supervisor Lujun Li, M.Eng.
E-Mail: lujun.li@tum.de
Description Motivation:
Recently, end-to-end neural networks have made significant breakthroughs in speech recognition, challenging the dominance of DNN-HMM hybrid architectures. However, the speech input to ASR systems is generally corrupted by background noise and reverberation in realistic environments, leading to dramatic performance degradation. To alleviate this issue, the mainstream approach is to use a well-designed speech enhancement module as the front end of the ASR system. However, enhancement modules can introduce speech distortions and mismatches with the training data, which sometimes degrade ASR performance. Integrating the speech enhancement and end-to-end recognition networks via joint training is therefore a promising research direction.
Task:
The main task is to improve an already working joint training pipeline with state-of-the-art feature extraction methods, speech enhancement algorithms, and speech recognition algorithms. Details of the architecture can be found in [1]. For further reading, [2] provides a detailed explanation of the integration of speech enhancement and speech recognition.

References:
  [1] Liu, Bin, et al. "Jointly Adversarial Enhancement Training for Robust End-to-End Speech Recognition." Proc. Interspeech 2019, pp. 491-495, doi: 10.21437/Interspeech.2019-1242.
  [2] Ravanelli, M., P. Brakel, M. Omologo, and Y. Bengio. "Batch-normalized joint training for DNN-based distant speech recognition." 2016 IEEE Spoken Language Technology Workshop (SLT), San Diego, CA, 2016, pp. 28-34, doi: 10.1109/SLT.2016.7846241.
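As a toy illustration of the joint-training idea (not the actual pipeline from [1]; the function name and the specific choice of losses are assumptions), a combined objective can weight a front-end enhancement loss against a back-end recognition loss, so that gradients from the recognizer also shape the enhancer:

```python
import numpy as np

def joint_loss(enhanced, clean, asr_log_probs, target_ids, alpha=0.5):
    """Weighted sum of an enhancement loss (MSE between enhanced and clean
    waveforms) and a recognition loss (negative log-likelihood of the
    reference tokens). Backpropagating the combined loss trains both
    modules jointly instead of in isolation."""
    enh_loss = np.mean((enhanced - clean) ** 2)
    # asr_log_probs: (num_tokens, vocab_size) log-probabilities
    nll = -np.mean(asr_log_probs[np.arange(len(target_ids)), target_ids])
    return alpha * enh_loss + (1.0 - alpha) * nll
```

The weighting factor alpha balances how strongly distortion-free enhancement is enforced against recognition accuracy.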

Requirements
  • Excellent coding skills, preferably in Python.
  • Experience with deep learning frameworks, preferably Torch/PyTorch and TensorFlow.
  • Background knowledge in speech signal processing or natural language processing is a bonus.
  • Motivation to work on deep learning.
Application If you are interested in working in the promising field of artificial intelligence, and more specifically speech signal processing, we welcome applications via the e-mail address above. Please specify the topic in the e-mail subject, e.g. "Masterarbeit/Forschungspraxis application for topic 'XYZ'", and highlight your previous project experience and preferred starting date. Please also attach your current CV and transcript.

A New Method to Generate Hidden Markov Model Topology

Topic A New Method to Generate Hidden Markov Model Topology
Type Research Internship (FP), Master's Thesis (MA)
Supervisor Lujun Li, M.Eng.
Tel.: +49 (0)89 289-28543
E-Mail: lujun.li@tum.de
Description Motivation: For decades, acoustic models in speech recognition systems have pivoted on Hidden Markov Models (HMMs), e.g., the Gaussian Mixture Model-HMM and Deep Neural Network-HMM systems, and have achieved a series of impressive results. At present, the most widely employed HMM topology is the 3-state left-to-right architecture. However, there is no firm evidence of its suitability or superiority, reflecting a deficiency in research into HMM topology. We have proposed an innovative technique to customize an individual HMM topology for each phoneme, with strong results in a monophone system. The topic of this thesis is to apply it to a triphone system.
Task: The main task is to transfer an already working deep architecture from a monophone system to a triphone system. Overall, the thesis will follow these steps:
1. State-of-the-art research
2. Understanding the existing deep architecture
3. Implementing the algorithm in a triphone system
4. Evaluating the architecture on the TED-LIUM v2 corpus
5. Demonstrating the working system
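The standard topology this project questions can be written down directly. Below is a small NumPy sketch (illustrative, not taken from the project code) of a left-to-right transition matrix; the number of states per phoneme is exactly the quantity the proposed technique customizes:

```python
import numpy as np

def left_to_right_transitions(n_states, self_loop=0.5):
    """Transition matrix of a left-to-right HMM: each state either repeats
    (self-loop) or advances to the next state; the final state absorbs.
    The widely used default is n_states=3; varying n_states per phoneme
    changes the topology."""
    A = np.zeros((n_states, n_states))
    for i in range(n_states - 1):
        A[i, i] = self_loop
        A[i, i + 1] = 1.0 - self_loop
    A[-1, -1] = 1.0  # final state only loops on itself
    return A
```

Each row is a probability distribution over successor states, so every row must sum to one.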
Prerequisites 1. Background knowledge in speech signal processing or natural language processing.
2. Excellent coding skills, preferably in C++ and Python.
3. Experience with deep learning frameworks, preferably Torch/PyTorch and TensorFlow.
4. Experience with the Kaldi toolkit is a big bonus.
5. Motivation to work on deep learning.
Application If you are interested in this topic, we welcome applications via the e-mail address above. Please set the e-mail subject to "[thesis type] application for topic 'XYZ'", e.g. "Master's thesis application for topic 'XYZ'", and clearly state in the message why you are interested in the topic. Also make sure to attach your most recent CV (if you have one) and grade report.

Research Area: Computer Vision

End-to-End Pose-Based Gait Recognition with Graph Neural Networks

Topic End-to-End Pose-Based Gait Recognition with Graph Neural Networks
Type Master's Thesis
Supervisor Torben Teepe, M.Sc.
E-Mail: t.teepe@tum.de
Research Area Computer Vision
Description Motivation: Skeleton-based approaches have shown excellent results in understanding human action [1] and recognizing human gait. Current methods require a two-stage architecture: 1. human pose estimation, 2. a graph convolutional neural network. In this work, we want to explore the possibilities of a single-stage, graph-based approach. A single-stage architecture can be achieved with a keypoint detector that feeds the pose data directly into a Graph Convolutional Network.

Task: The task is to extend the CenterNet-based [2] architecture with a temporal Graph Convolutional Network for gait recognition. The architecture will then be evaluated and refined on standard gait recognition datasets. Once decent performance is achieved, the architecture can also be applied to action recognition, tracking, or re-identification.

References:
[1] Liu, Ziyu, et al. "Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020.
[2] Duan, Kaiwen, et al. "CenterNet: Keypoint triplets for object detection." Proceedings of the IEEE International Conference on Computer Vision. 2019.
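To make the graph-convolution stage concrete, here is a minimal NumPy sketch of one mean-aggregation GCN layer over skeleton joints (a simplification; real pose-based models such as [1] use learned, partitioned adjacencies and additional temporal convolutions):

```python
import numpy as np

def gcn_layer(X, A, W):
    """One graph-convolution step on a skeleton graph: average each joint's
    features with its neighbours' (adjacency A plus self-loops), apply a
    shared linear map W, then a ReLU non-linearity."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    D_inv = np.diag(1.0 / A_hat.sum(axis=1))  # row-normalize -> mean aggregation
    return np.maximum(D_inv @ A_hat @ X @ W, 0.0)
```

Stacking such layers lets information propagate between joints that are several bones apart, which is what makes the pose graph useful for gait signatures.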
Prerequisites
  • Experience in Computer Vision & Deep Learning
  • Good programming skills, ideally in Python
  • Experience in deep learning frameworks, preferably PyTorch
Application If you are interested in this topic, we welcome applications via the e-mail address above. Please set the e-mail subject to "[thesis type] application for topic 'XYZ'", e.g. "Master's thesis application for topic 'XYZ'", and clearly state in the message why you are interested in the topic. Also make sure to attach your most recent CV (if you have one) and grade report.

Contrastive Learning for Open Set Recognition

Topic Contrastive Learning for Open Set Recognition
Type Master's Thesis
Supervisor Okan Köpüklü, M.Sc.
Tel.: +49 (0)89 289-28554
E-Mail: okan.kopuklu@tum.de
Research Area Computer Vision
Description Motivation: Contrastive learning has been actively used in unsupervised [1,2] and supervised [3] recognition tasks. A recent application of contrastive learning to anomaly detection [4] showed that it can also be used for open set recognition. In this thesis, the student will create a contrastive learning algorithm and compare it with state-of-the-art approaches [5,6].

Task: A contrastive learning framework written in PyTorch already exists and achieves results similar to state-of-the-art open set recognition algorithms. The student will work on this framework and advance it to outperform all other state-of-the-art algorithms.

References:
[1] Tian, Yonglong, Dilip Krishnan, and Phillip Isola. "Contrastive multiview coding." arXiv preprint arXiv:1906.05849 (2019).
[2] Chen, Ting, et al. "A simple framework for contrastive learning of visual representations." arXiv preprint arXiv:2002.05709 (2020).
[3] Khosla, Prannay, et al. "Supervised contrastive learning." arXiv preprint arXiv:2004.11362 (2020).
[4] Köpüklü, Okan, et al. "Driver Anomaly Detection: A Dataset and Contrastive Learning Approach." arXiv preprint arXiv:2009.14660 (2020).
[5] Neal, Lawrence, et al. "Open set learning with counterfactual images." Proceedings of the European Conference on Computer Vision (ECCV). 2018.
[6] Oza, Poojan, and Vishal M. Patel. "C2ae: Class conditioned auto-encoder for open-set recognition." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
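As a minimal NumPy sketch of the kind of contrastive objective used in [2] (NT-Xent style; illustrative only, since the existing framework is in PyTorch and may use a different loss variant): two augmented views of the same sample form the positive pair, and all other samples in the batch act as negatives.

```python
import numpy as np

def nt_xent(z1, z2, temperature=0.1):
    """Normalized-temperature cross-entropy: for each anchor embedding,
    the matching view is the positive and every other batch entry is a
    negative; minimizing pulls positives together on the unit sphere."""
    z = np.concatenate([z1, z2])
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # cosine similarity
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)                     # exclude self-pairs
    n = len(z1)
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])  # positive index
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(log_prob[np.arange(2 * n), pos])
```

For open set recognition, a sample whose embedding sits far from every known-class cluster under this geometry can be flagged as unknown.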
Prerequisites 1. Excellent coding skills, preferably in Python
2. Experience with deep learning frameworks, preferably PyTorch
Application If you are interested in this topic, we welcome applications via the e-mail address above. Please set the e-mail subject to "[thesis type] application for topic 'XYZ'", e.g. "Master's thesis application for topic 'XYZ'", and clearly state in the message why you are interested in the topic. Also make sure to attach your most recent CV (if you have one) and grade report.

Video Action Recognition with Transformers

Topic Video Action Recognition with Transformers
Type Research Internship (FP), Master's Thesis (MA)
Supervisor Okan Köpüklü, M.Sc.
Tel.: +49 (0)89 289-28554
E-Mail: okan.kopuklu@tum.de
Research Area Computer Vision
Description Motivation: Transformers have revolutionized natural language processing, outperforming all other sequence-to-sequence learning algorithms. The paper "Attention Is All You Need" [1] introduced this novel architecture. Recently, Transformers have also been used in computer vision tasks such as image recognition [2] and gesture recognition [3]. For action recognition, they can likewise be used to improve on state-of-the-art results.
Task: The student will create a framework in PyTorch to extract frame-level features from video clips and apply a Transformer for spatio-temporal modeling. The framework for feature extraction is already available [4]. The final framework will be evaluated on major action and gesture recognition datasets such as Kinetics and Jester.

References:
[1] Vaswani, Ashish, et al. "Attention is all you need." Advances in neural information processing systems 30 (2017): 5998-6008.
[2] Dosovitskiy, Alexey, et al. "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale." arXiv preprint arXiv:2010.11929 (2020).
[3] D'Eusanio, Andrea, et al. "A Transformer-Based Network for Dynamic Hand Gesture Recognition." International Conference on 3D Vision. 2020.
[4] Köpüklü, Okan, et al. "Dissected 3D CNNs: Temporal Skip Connections for Efficient Online Video Processing." arXiv preprint arXiv:2009.14639 (2020).
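The core operation such a temporal Transformer applies on top of the pre-extracted frame features is scaled dot-product self-attention. A single-head NumPy sketch (illustrative; the actual model would be multi-head, with positional encodings, in PyTorch):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over a sequence of
    frame-level feature vectors X (time x dim): every output frame is an
    attention-weighted mixture of all input frames."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])         # (time x time) similarities
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)  # softmax over time
    return weights @ V
```

Because every frame attends to every other frame, the model captures long-range temporal dependencies that fixed-size 3D convolutions miss.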
Prerequisites 1. Excellent coding skills in Python
2. Experience with deep learning frameworks, preferably PyTorch
Application If you are interested in this topic, we welcome applications via the e-mail address above. Please set the e-mail subject to "[thesis type] application for topic 'XYZ'", e.g. "Master's thesis application for topic 'XYZ'", and clearly state in the message why you are interested in the topic. Also make sure to attach your most recent CV (if you have one) and grade report.