# Offered Theses

Please contact the doctoral researchers directly if you are interested in a Bachelor or Master thesis, a student job, an "Ingenieurspraxis" or a "Forschungspraxis". It is also usually possible to find a topic that matches your specific interests. Please include a **curriculum vitae** together with a **list of attended courses** when applying for a thesis. If your "Ingenieurspraxis" is selected to be supervised by one of our professors, please hand in the documents to Doris Dorn (Room N2401).

### Bachelorarbeiten

Joint Random Errors happen on Partial Defect Cells

## Joint Random Errors happen on Partial Defect Cells

**Stichworte: **

Linear Codes, Algebraic Codes, Error Correction , Masking, defects, non-volatile memories, phase-change memories,PCMs, sagemath

**Kurzbeschreibung: **

So far, all investigated error correction for memory with partial defects works have been done for "disjoint" error such that we suppose the error cannot happen in a position where the memory already partially defects. In this thesis, we want to consider if a joint error happens what are the code constructions for a different number of partially stuck (defect) memory cells up to the code length.

#### Beschreibung

The demand of reliable memory solutions and in particular for non-volatile memories such as phase-change memories (PCMs) for different applications is steadily increasing. The key characteristic of PCM cells is that they can switch between two main states: an amorphous state and a crystalline state. PCM cells may become defect (also called stuck) if they fail in switching their states. This occasionally happens due to the cooling and healing processes of the cells, and therefore cells can only hold a single phase. There are two scenarios of such a defect. Classical stuck cells, i.e. the cell level cannot switch up or down its current level. The other more flexible scenario is partially stuck cells, here, the cell can increase its level but cannot go down its current one.

So far, error-correcting code constructions for defect memory with additional errors have been done. Recent works focus only on the additional error that happens in a position in which the cell assumed is reliable (non-stuck).

In this thesis, we want to investigate the case where error and partially defect cells happen in the same position. Further, a mix-mode of joint and disjoint error could also be considered in this thesis.

#### Voraussetzungen

- Good knowledge of Linear Algebra

- Good knowledge of Channel Coding/Coding Theory

- Basic knowledge in Information Theory

- Programming (Python -- > preferably Sagemath)

#### Kontakt

M.Eng. Haider Al Kim

Doctoral Researcher

Technical University of Munich

Department of Electrical and Computer Engineering /

Coding for Communications and Data Storage (COD) Group

Building N4, Room 3417

Theresienstr. 90, 80333 München

Email: haider.alkim@tum.de

Phone: +49 8928929055

https://www.lnt.ei.tum.de/en/people/doctoral-researchers/alkim/

#### Betreuer:

Deterministic Identification Over the Binary Symmetric Channel

## Deterministic Identification Over the Binary Symmetric Channel

**Stichworte: **

Identification, Identification Without Randomization, Complexity of Distributive Computing, One-Way/Two-Way Probabilistic Communications

**Kurzbeschreibung: **

Identification without randomization over the binary symmetric channel (BSC) would be studied and investigated. Method of coding, decoding sets, type I/II errors and a closed-form solution of non-randomized (deterministic) identification capacity will be considered and analysed.

#### Beschreibung

In theory of Identification introduced by Ahlswede and Dueck the perspective of communications is changed from decoding to identification, i.e., the receiver is only interested to check whether or not his message was sent by transmitter. Ahslwede and Dueck proved that by means of local randomness (at encoder) they can achieve doubly exponential gain in code size which outperfrom substantially the classical scheme of Shannon transmission. (only exponential gain in code size). However, in many cases, there is no need to exploit the Randomization at encoder and thus the view of identification without randomization is introduced.

On the other hand to derive probabilistic complexity of models for one/two way communications for distributive computing (communication between two processors for determine copperatively value of an identification function) similar ideas were discussed. The student should understand those ideas and try to find the link between them and idea of Identification of Ahlswede and Dueck. Ideally the student attempt to derive forward or/and backward proof for non-randomized identification capacity of the Binary Symmetric Channel (BSC).

#### Voraussetzungen

- Familiarities with fundamental concepts of Information Theory, such as Entropy, Mutual Information, Hamming Distance, Rate, Probability of Error, The Binary Symmetric Channel etc)
- Familiarity with fundamentals of Identification Theroy

#### Betreuer:

### Masterarbeiten

List decoding of Goppa Codes

## List decoding of Goppa Codes

**Stichworte: **

decoding; implementation

#### Beschreibung

The goal is to implement the list decoding algorithm in [1].

Fundalmentals on Goppa codes can be found in [2, Chap 12.3], [3, Chap 2].

Fundalmentals on list decoding can be found in [4, Chap 12], [5, Chap 9].

References:

[1] D. Augot, M. Barbier, and A. Couvreur, “List-decoding of binary goppa codes up to the binary johnson bound,” in 2011 IEEE Information Theory Workshop, pp. 229–233, Oct 2011.

[2] F. J. MacWilliams and N. J. A. Sloane, The theory of error-correcting codes. 1978.

[3] H. Liu, “Decoding of interleaved goppa codes and their applications in code-based cryptosystem,” Master’s thesis, Technical University of Munich, Dec. 2018.

[4] J. Justesen and T. Høholdt, A course in error-correcting codes. Z¨urich: European Mathematical Society (EMS), 2004.

[5] R. M. Roth, Introduction to Coding Theory. Cambridge University Press, 2006.

#### Voraussetzungen

- Channel Coding
- Linear Algebra
- Programming

#### Betreuer:

Efficient Implementation of Decryption Algorithms in Rank-Based Cryptography

## Efficient Implementation of Decryption Algorithms in Rank-Based Cryptography

#### Beschreibung

Many Rank-based cryptosystems require decoding of Gabidulin codes in their decryption algorithm. In this work, the student should compare the theoretical complexity of different Gabidulin code decoders. Based on the theoretical complexity analysis, the students should decide on the most promising decoding algorithms. Then, the algorithms should be implemented in C and their performance should be compared on a microcontroller.

#### Voraussetzungen

- Good knowledge about rank-metric codes (e.g. by taking the course Coding Theory for Storage and Networks)

- Good knowledge about cryptography (e.g. by taking the course Security in Communications and Storage)

- Very good knowledge of the Programming language C

#### Betreuer:

The coin weighing problem

## The coin weighing problem

#### Beschreibung

The question of finding a small subset of defective coins from a set of regular coins in the fewest number of weighings has been a notorious problem.

Suppose there is a collection of *n* coins so that some of them are defective. In other words, we know that the weight of regular coins is *A*, and the weight of the remaining coins is *B*, where integers *A* and *B* are given. The problem is to determine the weight of each coin by weighing subsets of coins in a spring scale. The main figure of merit when studying adaptive coin weighing algorithms is the number of required weighings in the worst-case and in the average-case. In this project, we design and implement an efficient algorithm which works for two coins.

#### Voraussetzungen

Coding theory, combinatorics

Good programming skills

#### Kontakt

#### Betreuer:

Error Correction in DNA Storage

## Error Correction in DNA Storage

**Stichworte: **

DNA storage, Error Correction, Deletion, Insertion, Substitutions

#### Beschreibung

DNA storage is an uprising topic in the research field of storage systems. Due its natural longetivity, robustness, and density properties the main application would arise in high-dense long-term storage systems. The interest has become larger and larger due the large amount of data nowadays and the relative new biological advances in DNA synthesis and sequencing processes (e.g. polymerase chain reaction). In contrary to conventional storing methods, due to the nature of DNA and the involved biological processes special error patterns such as insertion, deletion, and substitution errors occur. To tackle these errors novel methods for correction have to be investigated. Moreover, the model of the DNA storage channel needs to be investigated thorougly, e.g. capacity statements.

#### Voraussetzungen

- Linear Algebra
- Channel Coding
- Coding Theory for Storage and Networks (optional)

#### Betreuer:

Private, Secure and Flexible Distributed Machine Learning on the Cloud

## Private, Secure and Flexible Distributed Machine Learning on the Cloud

**Stichworte: **

Distributed machine learning, straggler mitigation, information theoretic privacy and security

**Kurzbeschreibung: **

This project aims at finding coding techniques for implementing machine learning algorithms on untrusted devices on the cloud. A key challenge is to adapt to cloud devices, e.g., IoT, is that have different compute power that change with time.

#### Beschreibung

In the era of big data, running learning algorithms on single computing is becoming a bottleneck. Therefore the need of distributing the computation is inevitable. However, the deployment of distributed computing introduces new challenges that, if ignored, may outweigh the benefit of parallelism.

We consider the master-worker topology in which the master needs to run a computation on its data. The master breaks the computation into smaller tasks distributed to processing nodes referred to as workers. The workers run the tasks in parallel. The master combines the results sent back from the workers to obtain its original computation.

However, heterogeneity of the workers' computation power and/or network link properties can slow down the process for some workers. Waiting for the slowest worker is shown to outweigh the effect of parallelism. Moreover, the privacy of the data is concerned since it is shared with external workers. The leakage of the master data can be harmful or even illegal. The master also risks employing an adversarial computing node as a worker, whose goal is to corrupt the whole computation.

Recently, coding theoretic techniques have been used to speed up the computation, guarantee the privacy and security of the distributed computing paradigm under different settings. The setting of interest for this project is one where the workers have different time-varying compute powers. Thus a flexible coding technique is needed. We focus on matrix-matrix multiplication as a building block in several machine learning algorithms.

In this project, we would like to design codes that allow us to overcome all the three challenges for matrix-matrix multiplication (or more types of computation if the time allows). We start by building on the work in https://arxiv.org/abs/2004.12925. The main focus of the project is on the security of the scheme, i.e., robustness against malicious workers trying to corrupt the computation. We aim to implement the designed codes on Google cloud platform, or Amazon Web Services to test their practicality.

#### Voraussetzungen

Coding Theory

Information Theory

Linear Algebra

Probability Theory

Programming skills

Self-motivation

#### Kontakt

Marvin Xhemrishi: marvin.xhemrishi@tum.de

Rawad Bitar: rawad.bitar@tum.de

#### Betreuer:

How to guess an n-digit number

## How to guess an n-digit number

#### Beschreibung

In a deductive game for two players, Alice and Bob, Alice conceals an *n*-digit number *x*, and Bob, who knows *n*, tries to identify *x* by asking a number of questions, which are answered by Alice. Each question is an *n*-digit number *y*, and each answer is the number of positions at which the corresponding symbols of *x* and *y* are different. Moreover, we require Bob to ask all questions at once. The goal of this project is to design and implement an efficient coding scheme for Bob with an asymptotically optimal number of questions.

#### Voraussetzungen

Coding theory, combinatorics, analysis

Good programming skills

#### Kontakt

#### Betreuer:

DNA Storage Channel Modeling and Error Correction

## DNA Storage Channel Modeling and Error Correction

#### Beschreibung

DNA-based data storage is a novel approach for long term data archiving.

Due to the unique nature of writing and reading DNA, the channel associated with these processes is still realtively poor understood and varies over different synthesis (writing) and sequencing (reading) technologies. The task of the student is to analyze different sequencing methods and the associated errors and formulate associated channel models. Based on these models, error-correcting schemes shall be evaluated.

#### Voraussetzungen

- Basic principles of stochastic and algebra

- Channel Coding

#### Betreuer:

Algebraic Coding for Distributed Storage

## Algebraic Coding for Distributed Storage

#### Beschreibung

Distributed storage systems with a large number of storage commonly rely on MDS codes, such as Reed-Solomon codes, to protect the system from data loss in the event of node failures while keeping the storage overhead low. In the event of such a failure, the reconstruction of failed nodes induces a large amount of network traffic. In recent years several solutions to this problem have been proposed, most notably locally repairable codes and regenerating codes. We investigate the properties of specific subclasses of these codes, as well as the combination of the two properties.

Further, with the increased demand for data privacy, we develop methods for protecting users' data from the eyes of curious servers in distributed storage systems.

#### Betreuer:

Homomorphic encryption

## Homomorphic encryption

**Stichworte: **

Cryptography

#### Beschreibung

Consider that a client would like to a server to do some computations for him but he does not want to give information meaningful information to the server. The client therefore sends encrypted messages c1 = Enc(pk, m1) and c2 = Enc(pk, m2) to the server and the client would like to obtain some function f of the two plaintexts f(m1,m2). It suffices for the client to get Enc(pk, f(m1,m2)) because the client owns the secret key sk. He is able to use the decryption function Dec on the ciphertext and gets Dec(sk, Enc(pk,f(m1,m2=))) = f(m1,m2).

The goal of this internship is to analyze schemes that achieve this property based on code-based cryptography.

#### Voraussetzungen

linear algebra

coding theory

basic understanding of cryptography

#### Betreuer:

Adaptive compute strategies for distributed learning

## Adaptive compute strategies for distributed learning

**Stichworte: **

Distributed machine learning, straggler mitigation, adaptive worker selection, adaptive data allocation

**Kurzbeschreibung: **

We consider the master/server computing model. The goal of this project is to design coding techniques that adaptively assigns computational tasks to the workers. In addition, the technique allows the master node to adaptively choose the number of workers it waits for at every iteration. Both assignment strategy and waiting strategy aim to minimize the time spent on running the machine learning algorithm.

#### Beschreibung

We consider the master/server computing model. A master possesses a large amount of data and wants to run a machine learning algorithm on this data. The master offloads the data to several worker machines to parallelize the computation and reduce the time it takes. If the master waits for all the workers, he is limited by the slowest worker. This is known as the straggler problem.

To mitigate the straggler problem, the master can distribute the data with redundancy to the workers. However, adding redundancy to the workers increases the computing time. On the other hand, stochastic gradient descent is known to converge even if the master does not wait for all the workers per iteration. Therefore, a tradeoff between redundancy, and convergence speed arises.

It is shown that for convex loss functions, the master can assign data without redundancy to the workers and chooses the number of workers to wait for per iteration to minimize the waiting time.

The goal of this project is to have the best of both worlds. More precisely, we want to design coding techniques that potentially change the redundancy of the data in an adaptive manner. In addition, the technique allows the master to adaptively choose the number of workers it waits for at every iteration. Both assignment strategy and waiting strategy aim to minimize the time spent on running the machine learning algorithm.

The coding technique we aim to introduce is tailored to the problem at hand. If communication between the master and workers is fast, e.g., serverless compute service, then changing the redundancy is affordable. Otherwise, in settings such as internet of things and edge computing, the master may want to fix the redundancy a priori to a designed value and choose to adaptively wait for a different number of workers per iterations.

It is expected to program a distributed stochastic gradient descent on Google cloud platform, or Amazon Web Services to validate and/or design our system's parameters.

#### Voraussetzungen

Knowledge of coding theory, linear algebra and machine learning

Good programming skills

Self-motivated and eager to learn

Bonus: knowledge of probability theory and convex optimization

#### Kontakt

rawad.bitar@tum.de

https://sites.google.com/site/rawadbitar1

#### Betreuer:

Distributed learning on serverless compute services

## Distributed learning on serverless compute services

**Stichworte: **

Distributed machine learning, straggler mitigation, neural networks, coding theory

**Kurzbeschreibung: **

The goal of this project is to implement a neural network on thousands of serverless computing instances. We start by evaluating two existing coding techniques designed to scale stochastic gradient descent for convex loss functions. We then deviate to devise a coding scheme tailored for distributed neural network that scale for convex and non convex loss functions.

#### Beschreibung

Applying machine learning on large data became part of daily life. Scalability of distributed machine learning algorithms requires tolerance of slower computing nodes referred to as stragglers. To mitigate the effect of stragglers, one can add redundancy to the distributed data. However, stochastic gradient descent (SGD) is known to perform well even when computing on a part of the data. It is shown that for convex loss functions, a small redundancy or no redundancy at all is enough to guarantee good performance of SGD.

In this project, we evaluate the existent coding techniques requiring small redundancy on serverless compute functions offered by Amazon Web Services, or equivalently cloud functions offered by Google Computing Platform. Our goal is to devise a coding technique that goes beyond SGD and can be used for non-convex loss functions. The main targeted application is a full implementation of a neural network on serverless functions with straggler tolerance.

#### Voraussetzungen

Knowledge of machine learning algorithms, e.g., linear regression, logistic regression, neural network

Good programming skills

Self-motivation and dedication

Plus: knowledge of probability, statistical pattern recognition, coding theory and convex optimization

#### Kontakt

rawad.bitar@tum.de

https://sites.google.com/site/rawadbitar1

#### Betreuer:

Joint Random Errors happen on Partial Defect Cells

## Joint Random Errors happen on Partial Defect Cells

**Stichworte: **

Linear Codes, Algebraic Codes, Error Correction , Masking, defects, non-volatile memories, phase-change memories,PCMs, sagemath

**Kurzbeschreibung: **

So far, all investigated error correction for memory with partial defects works have been done for "disjoint" error such that we suppose the error cannot happen in a position where the memory already partially defects. In this thesis, we want to consider if a joint error happens what are the code constructions for a different number of partially stuck (defect) memory cells up to the code length.

#### Beschreibung

The demand of reliable memory solutions and in particular for non-volatile memories such as phase-change memories (PCMs) for different applications is steadily increasing. The key characteristic of PCM cells is that they can switch between two main states: an amorphous state and a crystalline state. PCM cells may become defect (also called stuck) if they fail in switching their states. This occasionally happens due to the cooling and healing processes of the cells, and therefore cells can only hold a single phase. There are two scenarios of such a defect. Classical stuck cells, i.e. the cell level cannot switch up or down its current level. The other more flexible scenario is partially stuck cells, here, the cell can increase its level but cannot go down its current one.

So far, error-correcting code constructions for defect memory with additional errors have been done. Recent works focus only on the additional error that happens in a position in which the cell assumed is reliable (non-stuck).

In this thesis, we want to investigate the case where error and partially defect cells happen in the same position. Further, a mix-mode of joint and disjoint error could also be considered in this thesis.

#### Voraussetzungen

- Good knowledge of Linear Algebra

- Good knowledge of Channel Coding/Coding Theory

- Basic knowledge in Information Theory

- Programming (Python -- > preferably Sagemath)

#### Kontakt

M.Eng. Haider Al Kim

Doctoral Researcher

Technical University of Munich

Department of Electrical and Computer Engineering /

Coding for Communications and Data Storage (COD) Group

Building N4, Room 3417

Theresienstr. 90, 80333 München

Email: haider.alkim@tum.de

Phone: +49 8928929055

https://www.lnt.ei.tum.de/en/people/doctoral-researchers/alkim/

#### Betreuer:

Simulation and performance improvement of identification codes

## Simulation and performance improvement of identification codes

#### Beschreibung

Identification is a communication scheme that allows rate doubly exponential in the blocklemght, with the tradeoff that identities cannot be decoded (as messages do) but can only be verified.

The double exponential growth presents various challenges in the finite regime: there are heavy computational costs introduced at the encoder and decoder and heavy trade-offs between the error and the codes sizes.

The ultimate goal is to find a fast, reliable implementation while still achieving large code sizes.

Your task will be implementing and testing new ideas toward this goal.

The coding will be in Matlab. Some existing code needs conversion from Sagemath to Matlab.

This work can accomodate multiple students.

The working language will be in English.

Environment: we collaborate with LTI. At LNT and LTI there is currently a lot of funding for research in identification. Therefore you will find a large group of people that might be available for discussion and collaboration.

#### Voraussetzungen

Nachrichtentechnik 2

#### Betreuer:

Group Testing and Information Theory

## Group Testing and Information Theory

**Stichworte: **

Group Testing

**Kurzbeschreibung: **

Use ideas of Information Theory to get new group testing strategies.

#### Beschreibung

The concept of group testing (also called pooled testing or pooling) dates back to mathematical ideas for improving the efficiency of syphilis tests developed by Dorfman in 1943 [Dor43].The idea of Dorfman’s method is to combine portions of k different individual blood samples into onesample in a first stage. If it tested negative then that entire group could be dismissed without further testing. Then, separately retesting the samples of individuals from positive pools in a second stage. Forlow to moderate infection rates, this strategy has a high throughput since most of the group, when chosen wisely, will be declared negative. Naturally, the efficiency of pooling strategies for the current pandemichas been shown to depend on the prevalence of SARS-CoV-2, patient-pool size and test sensitivity. While the sensitivity needs to be understood on an experimental level, the prevalence needs to be estimated from available test results and the pool size should be optimized with Information Theory models before widespread implementation.

#### Voraussetzungen

Information Theory

#### Betreuer:

Two-sided search

## Two-sided search

#### Beschreibung

In the most studied models in the literature, it is assumed that the target of the search is either stationary with its hidden position being chosen according to someknown distribution, or it is moving and its movements follow some known rules. In such cases, we talk about one-sided search, meaning that the target’s behaviour is somehow independent of the searcher’s attempt to catch it. Conversely, if the target can attempt to contrast the searcher’s activity and react in some intelligent way in order not to be found, the model is referred to as two-sided search. Two-sided search was introduced by Koopman. The goal is to implement a two-sided search algorithm.

#### Betreuer:

Learning Aided SC Flip Decoding for Polar Codes

## Learning Aided SC Flip Decoding for Polar Codes

#### Beschreibung

Polar codes achieve the capacity of binary-input discrete memoryless channels asymptotically in the block length under successive cancellation (SC) decoding. Polar codes have been adopted for the control channel in 5G enhanced mobile broadband (eMBB).

Due to the serial nature of SC decoding, an erroneous bit decision can be caused by the channel noise or previous erroneous bit estimates. The main idea of SC flip decoding is trying to correct the first erroneous bit decision by sequentially flipping the unreliable decisions.

The optimal flipping strategy is considered difficult due to lack of an analytical solution. Alternatively, (deep) learning aided SC flip algorithm are investigated in this thesis.

[1] O. Afisiadis, A. Balatsoukas-Stimming, and A. Burg, “A low-complexity improved successive cancellation decoder for polar codes,” in Proc. 48th Asilomar Conf. Signals, Systems and Computers, pp. 2116-2120, 2014.

[2] L. Chandesris, V. Savin, and D. Declercq, “Dynamic-SCFlip Decoding of Polar Codes,” IEEE Trans. Commun., vol. 66, no. 6, pp. 2333-2345, Jun., 2018.

[3] X. Wang, et al. "Learning to Flip Successive Cancellation Decoding of Polar Codes with LSTM Networks." *arXiv preprint arXiv:1902.08394* (2019).

[4] N. Doan, et al. "Neural Dynamic Successive Cancellation Flip Decoding of Polar Codes." *arXiv preprint arXiv:1907.11563* (2019).

#### Betreuer:

Polar Coding with Non-Binary Kernels

## Polar Coding with Non-Binary Kernels

#### Beschreibung

This thesis will focus on polar codes with non-binary kernels on GF(q). Some of the following tasks might be covered:

- Kernel selection
- Decoder implementation
- Efficient construction
- Comparison of binary and non-binary polar codes

#### Voraussetzungen

- Channel Coding
- Information Theory
- Matlab/C++

#### Betreuer:

Adaptive List Decoding for Polar Codes

## Adaptive List Decoding for Polar Codes

#### Beschreibung

The finite-length performance of polar codes can be improved by using successive cancellation list decoding. In this thesis, decoder design/implementation and performance prediction are investigated.

Reference:

[1] I. Tal and A. Vardy. "List decoding of polar codes." IEEE Transactions on Information Theory 61.5 (2015): 2213-2226.

[2] P. Yuan and F. Steiner. "Efficient Decoding Algorithms for Polar Codes based on 2×2 Non-Binary Kernels." International Symposium on Turbo Codes & Iterative Information Processing. 2018.

#### Voraussetzungen

- Information Theory
- Channel Coding
- Channel Codes for Iterative Decoding
- Matlab/C++

#### Betreuer:

### Forschungspraxis oder MSCE Forschungspraxis

Index Coding & Coded Caching

## Index Coding & Coded Caching

#### Beschreibung

Coded caching problem has two phases, placement phase and delivery phase.

For a fixed placement scheme, designing the delivery protocol is equivalent to the index coding problem.

Transforming a coded caching problem with coded placement to the corresponding index coding problem is a hignly-interested topic and what is the optimal delivery scheme for a coded caching problem with coded placement is still an open problem.

References:

[1] N. S. Karat, S. Samuel, and B. S. Rajan. Optimal error correcting index codes for some generalized index coding problems. IEEE Transactions on Communications, 67(2):929–942, 2019.

[2] Z. Chen, P. Fan, and K. B. Letaief. Fundamental limits of caching: improved bounds for users with small buffers. IET Communications, 10(17):2315–2318, 2016.

#### Voraussetzungen

- Linear Algebra
- Channel Coding (recommended)

#### Betreuer:

Error Correction in DNA Storage

## Error Correction in DNA Storage

**Stichworte: **

DNA storage, Error Correction, Deletion, Insertion, Substitutions

#### Beschreibung

DNA storage is an uprising topic in the research field of storage systems. Due its natural longetivity, robustness, and density properties the main application would arise in high-dense long-term storage systems. The interest has become larger and larger due the large amount of data nowadays and the relative new biological advances in DNA synthesis and sequencing processes (e.g. polymerase chain reaction). In contrary to conventional storing methods, due to the nature of DNA and the involved biological processes special error patterns such as insertion, deletion, and substitution errors occur. To tackle these errors novel methods for correction have to be investigated. Moreover, the model of the DNA storage channel needs to be investigated thorougly, e.g. capacity statements.

#### Voraussetzungen

- Linear Algebra
- Channel Coding
- Coding Theory for Storage and Networks (optional)

#### Betreuer:

Private, Secure and Flexible Distributed Machine Learning on the Cloud

## Private, Secure and Flexible Distributed Machine Learning on the Cloud

**Stichworte: **

Distributed machine learning, straggler mitigation, information theoretic privacy and security

**Kurzbeschreibung: **

This project aims at finding coding techniques for implementing machine learning algorithms on untrusted devices on the cloud. A key challenge is to adapt to cloud devices, e.g., IoT, is that have different compute power that change with time.

#### Beschreibung

In the era of big data, running learning algorithms on single computing is becoming a bottleneck. Therefore the need of distributing the computation is inevitable. However, the deployment of distributed computing introduces new challenges that, if ignored, may outweigh the benefit of parallelism.

We consider the master-worker topology in which the master needs to run a computation on its data. The master breaks the computation into smaller tasks distributed to processing nodes referred to as workers. The workers run the tasks in parallel. The master combines the results sent back from the workers to obtain its original computation.

However, heterogeneity of the workers' computation power and/or network link properties can slow down the process for some workers. Waiting for the slowest worker is shown to outweigh the effect of parallelism. Moreover, the privacy of the data is concerned since it is shared with external workers. The leakage of the master data can be harmful or even illegal. The master also risks employing an adversarial computing node as a worker, whose goal is to corrupt the whole computation.

Recently, coding theoretic techniques have been used to speed up the computation, guarantee the privacy and security of the distributed computing paradigm under different settings. The setting of interest for this project is one where the workers have different time-varying compute powers. Thus a flexible coding technique is needed. We focus on matrix-matrix multiplication as a building block in several machine learning algorithms.

In this project, we would like to design codes that allow us to overcome all the three challenges for matrix-matrix multiplication (or more types of computation if the time allows). We start by building on the work in https://arxiv.org/abs/2004.12925. The main focus of the project is on the security of the scheme, i.e., robustness against malicious workers trying to corrupt the computation. We aim to implement the designed codes on Google cloud platform, or Amazon Web Services to test their practicality.

#### Voraussetzungen

Coding Theory

Information Theory

Linear Algebra

Probability Theory

Programming skills

Self-motivation

#### Kontakt

Marvin Xhemrishi: marvin.xhemrishi@tum.de

Rawad Bitar: rawad.bitar@tum.de

#### Betreuer:

Homomorphic encryption

## Homomorphic encryption

**Stichworte: **

Cryptography

#### Beschreibung

Consider that a client would like to a server to do some computations for him but he does not want to give information meaningful information to the server. The client therefore sends encrypted messages c1 = Enc(pk, m1) and c2 = Enc(pk, m2) to the server and the client would like to obtain some function f of the two plaintexts f(m1,m2). It suffices for the client to get Enc(pk, f(m1,m2)) because the client owns the secret key sk. He is able to use the decryption function Dec on the ciphertext and gets Dec(sk, Enc(pk,f(m1,m2=))) = f(m1,m2).

The goal of this internship is to analyze schemes that achieve this property based on code-based cryptography.

#### Voraussetzungen

linear algebra

coding theory

basic understanding of cryptography

#### Betreuer:

Subpacketization in Coded Caching

## Subpacketization in Coded Caching

**Stichworte: **

Coded caching; Subpacketization

#### Beschreibung

In some communication systems end users are equiped with storages, the communication load during the peak hours can be reduced by having users pre-fetch part of the content during the scilent hours. Coded caching is a study to design the pre-fetching without the knowledge of users demands and the delivery scheme based on the users caches. However, many current schemes are facing the subpacketization problem in order to achieve the optimal load.

#### Voraussetzungen

- Basic programming

#### Betreuer:

Coded Placement in Coded Caching

## Coded Placement in Coded Caching

**Stichworte: **

Coded caching

#### Beschreibung

Coded caching is a study to reduce the transmission delay in broad-/multi-cast by designing the cached data and the transmitting data with coding strategies. There are two phases in coded cahing schemes, placement phase and delivery phase. Coding scheme can be applied in both phase in order to gain in reducing the transmission delay.

#### Voraussetzungen

- Linear Algebra
- Basic Programming
- Channel Coding (Recommended)

#### Kontakt

Hedongliang Liu Doctoral Researcher Technical University of Munich Department of Electrical and Computer Engineering Coding for Communications and Data Storage (COD) Group Theresienstrasse 90 Building N4, Room 3415A D-80333 Munich Phone: +49 89 289 29062 lia.liu@tum.de http://www.lnt.ei.tum.de/en/people/doctoral-researchers/liu/

#### Betreuer:

Group Testing and Information Theory

## Group Testing and Information Theory

**Stichworte: **

Group Testing

**Kurzbeschreibung: **

Use ideas of Information Theory to get new group testing strategies.

#### Beschreibung

The concept of group testing (also called pooled testing or pooling) dates back to mathematical ideas for improving the efficiency of syphilis tests developed by Dorfman in 1943 [Dor43].The idea of Dorfman’s method is to combine portions of k different individual blood samples into onesample in a first stage. If it tested negative then that entire group could be dismissed without further testing. Then, separately retesting the samples of individuals from positive pools in a second stage. Forlow to moderate infection rates, this strategy has a high throughput since most of the group, when chosen wisely, will be declared negative. Naturally, the efficiency of pooling strategies for the current pandemichas been shown to depend on the prevalence of SARS-CoV-2, patient-pool size and test sensitivity. While the sensitivity needs to be understood on an experimental level, the prevalence needs to be estimated from available test results and the pool size should be optimized with Information Theory models before widespread implementation.

#### Voraussetzungen

Information Theory

#### Betreuer:

Two-sided search

## Two-sided search

#### Beschreibung

In the most studied models in the literature, it is assumed that the target of the search is either stationary with its hidden position being chosen according to someknown distribution, or it is moving and its movements follow some known rules. In such cases, we talk about one-sided search, meaning that the target’s behaviour is somehow independent of the searcher’s attempt to catch it. Conversely, if the target can attempt to contrast the searcher’s activity and react in some intelligent way in order not to be found, the model is referred to as two-sided search. Two-sided search was introduced by Koopman. The goal is to implement a two-sided search algorithm.

#### Betreuer:

Simmulation eines Quantenrepeaters

## Simmulation eines Quantenrepeaters

**Stichworte: **

Simmulation, Quantenrepeater

**Kurzbeschreibung: **

Ein Quantenrepeater ist wichtiger Baustein für Quantenkommunikation über große Distanzen. Hier soll ein Quantenrepeater der 1. Generation simmuliert werden.

#### Beschreibung

Im Gegensatz zu klassischen Signalen ist bei einem quantenmechanischen Zustand das Verstärken oder Kopieren des Signals nicht möglich, Der Quantenrepeater (1. Generation) basiert auf Verschränkung, und der Destillation. Die einzelnen Komponenten des Repeaters können unterschiedlich realisiert werden. Ziel der Simmulation ist es Vor- und Nachteile der verschiedenen möglichen Komponenten heraus zu finden.

#### Voraussetzungen

Grundwissen in Quanteninformationstheorie

#### Betreuer:

Learning Aided SC Flip Decoding for Polar Codes

## Learning Aided SC Flip Decoding for Polar Codes

#### Beschreibung

Polar codes achieve the capacity of binary-input discrete memoryless channels asymptotically in the block length under successive cancellation (SC) decoding. Polar codes have been adopted for the control channel in 5G enhanced mobile broadband (eMBB).

Due to the serial nature of SC decoding, an erroneous bit decision can be caused by the channel noise or previous erroneous bit estimates. The main idea of SC flip decoding is trying to correct the first erroneous bit decision by sequentially flipping the unreliable decisions.

The optimal flipping strategy is considered difficult due to lack of an analytical solution. Alternatively, (deep) learning aided SC flip algorithm are investigated in this thesis.

[1] O. Afisiadis, A. Balatsoukas-Stimming, and A. Burg, “A low-complexity improved successive cancellation decoder for polar codes,” in Proc. 48th Asilomar Conf. Signals, Systems and Computers, pp. 2116-2120, 2014.

[2] L. Chandesris, V. Savin, and D. Declercq, “Dynamic-SCFlip Decoding of Polar Codes,” IEEE Trans. Commun., vol. 66, no. 6, pp. 2333-2345, Jun., 2018.

[3] X. Wang, et al. "Learning to Flip Successive Cancellation Decoding of Polar Codes with LSTM Networks." *arXiv preprint arXiv:1902.08394* (2019).

[4] N. Doan, et al. "Neural Dynamic Successive Cancellation Flip Decoding of Polar Codes." *arXiv preprint arXiv:1907.11563* (2019).

#### Betreuer:

Polar Coding with Non-Binary Kernels

## Polar Coding with Non-Binary Kernels

#### Beschreibung

This thesis will focus on polar codes with non-binary kernels on GF(q). Some of the following tasks might be covered:

- Kernel selection
- Decoder implementation
- Efficient construction
- Comparison of binary and non-binary polar codes

#### Voraussetzungen

- Channel Coding
- Information Theory
- Matlab/C++

#### Betreuer:

Adaptive List Decoding for Polar Codes

## Adaptive List Decoding for Polar Codes

#### Beschreibung

The finite-length performance of polar codes can be improved by using successive cancellation list decoding. In this thesis, decoder design/implementation and performance prediction are investigated.

Reference:

[1] I. Tal and A. Vardy. "List decoding of polar codes." IEEE Transactions on Information Theory 61.5 (2015): 2213-2226.

[2] P. Yuan and F. Steiner. "Efficient Decoding Algorithms for Polar Codes based on 2×2 Non-Binary Kernels." International Symposium on Turbo Codes & Iterative Information Processing. 2018.

#### Voraussetzungen

- Information Theory
- Channel Coding
- Channel Codes for Iterative Decoding
- Matlab/C++

#### Betreuer:

### Ingenieurpraxis

Subpacketization in Coded Caching

## Subpacketization in Coded Caching

**Stichworte: **

Coded caching; Subpacketization

#### Beschreibung

In some communication systems end users are equiped with storages, the communication load during the peak hours can be reduced by having users pre-fetch part of the content during the scilent hours. Coded caching is a study to design the pre-fetching without the knowledge of users demands and the delivery scheme based on the users caches. However, many current schemes are facing the subpacketization problem in order to achieve the optimal load.

#### Voraussetzungen

- Basic programming