### Dr. Rawad Bitar

Technical University of Munich

Associate Professorship of Coding and Cryptography (Prof. Wachter-Zeh)

###### Postal address

Postal:

Theresienstr. 90

80333 München

- Phone: +49 (89) 289 - 23423
- Office hours: 10:00 to 16:00
- Room: 0104.03.403
- rawad.bitar@tum.de

#### Available Theses

Analysis of Criss-Cross Deletion in Arrays

## Analysis of Criss-Cross Deletion in Arrays

#### Description

yle="font-weight: normal;" lang="de-DE" align="left">In this project we study two dimensional deletion/insertion problem introduced in https://arxiv.org/pdf/2004.14740.pdf. An array of n rows and n columns is transmitted through a specific deletion channel which for example occurs in DNA storage systems and racetrack memories. After transmission the receiver observes an array of dimension (n-1)*(n-1). The goal is to allow the receiver to know exactly which n*n array was transmitted by using error-correction techniques.

The goal of the project is to derive bounds on the deletion ball of any two-dimensional array. A deletion ball of an array X is the set of unique arrays resulting from the deletion of any possible combination of a column and a row in X. Deriving such bounds is not trivial since it depends heavily on the structure of the array. Thus, it is important to investigate useful characteristics representing the structure of an array for this particular setting. Furthermore, having this bound will be helpful to characterize the optimal redundancy of a deletion-correcting code for this setting.

#### Supervisor:

Distributed learning on serverless compute services

## Distributed learning on serverless compute services

**Keywords: **

Distributed machine learning, straggler mitigation, neural networks, coding theory

**Short Description: **

The goal of this project is to implement a neural network on thousands of serverless computing instances. We start by evaluating two existing coding techniques designed to scale stochastic gradient descent for convex loss functions. We then deviate to devise a coding scheme tailored for distributed neural network that scale for convex and non convex loss functions.

#### Description

Applying machine learning on large data became part of daily life. Scalability of distributed machine learning algorithms requires tolerance of slower computing nodes referred to as stragglers. To mitigate the effect of stragglers, one can add redundancy to the distributed data. However, stochastic gradient descent (SGD) is known to perform well even when computing on a part of the data. It is shown that for convex loss functions, a small redundancy or no redundancy at all is enough to guarantee good performance of SGD.

In this project, we evaluate the existent coding techniques requiring small redundancy on serverless compute functions offered by Amazon Web Services, or equivalently cloud functions offered by Google Computing Platform. Our goal is to devise a coding technique that goes beyond SGD and can be used for non-convex loss functions. The main targeted application is a full implementation of a neural network on serverless functions with straggler tolerance.

#### Prerequisites

Knowledge of machine learning algorithms, e.g., linear regression, logistic regression, neural network

Good programming skills

Self-motivation and dedication

Plus: knowledge of probability, statistical pattern recognition, coding theory and convex optimization

#### Contact

rawad.bitar@tum.de

https://sites.google.com/site/rawadbitar1

#### Supervisor:

#### Theses in Progress

Coding for Distributed Coordinate Gradient Descent

## Coding for Distributed Coordinate Gradient Descent

#### Description

yle="margin: 0px; font-variant-numeric: normal; font-variant-east-asian: normal; font-stretch: normal; font-size: 12px; line-height: normal; font-family: 'Helvetica Neue';">Applying machine learning on large data became part of daily life. Scalability of distributed machine learning algorithms requires tolerance of slower computing nodes referred to as stragglers. To mitigate the effect of stragglers, one can add redundancy to the distributed data. However, stochastic gradient descent (SGD) is known to perform well even when computing on a part of the data. It is shown that for convex loss functions, a small redundancy or no redundancy at all is enough to guarantee good performance of SGD.

yle="margin: 0px; font-variant-numeric: normal; font-variant-east-asian: normal; font-stretch: normal; font-size: 12px; line-height: normal; font-family: 'Helvetica Neue';">

yle="margin: 0px; font-variant-numeric: normal; font-variant-east-asian: normal; font-stretch: normal; font-size: 12px; line-height: normal; font-family: 'Helvetica Neue';">In this thesis, we examine the possibility of coding (adding redundancy) over the coordinates rather than coding over the whole data. The goal is to do unequal protection of the coordinates, i.e., give higher protection to the important coordinates that affect the convergence of the algorithm. The goal is to investigate stochastic gradient coordinate descent and check the effect of the redundancy on the convergence. In addition we want to investigate the possibility of ranking the coordinates by importance in order to decide on their unequal protection.

#### Supervisor:

Sum-Rank-Metric Codes for Distributed Data Storage

## Sum-Rank-Metric Codes for Distributed Data Storage

**Keywords: **

Sum-Rank-Metric Codes, Locally Repairable Codes, Distributed Data Storage

#### Description

The sum-rank is a family of metrics which contains both Hamming and rank metric as special cases and in general can be

seen as a mix of the two. It was introduced as a suitable distance measure for multi-shot network coding in 2010. Recently, the codes have found an application in coding for distributed data storage [1,2].

The student should understand the following definitions:

- sum-rank metric [1,3]

- linearized Reed-Solomon (LRS) codes [1,3]

- maximally-recoverable locally repairable code (MR-LRC) [1,2], also known as partial MDS (PMDS) codes in other literature

Furthermore, the student should understand the constructions of MR-LRCs based on LRS codes in the two papers [1, Sections I-III] and [2], and compare their field sizes.

References:

[1] Umberto Martínez-Peñas and Frank R. Kschischang. "Universal and dynamic locally repairable codes with maximal recoverability via sum-rank codes." IEEE Transactions on Information Theory 65.12 (2019): 7790-7805.

[2] Han Cai et al. "A Construction of Maximally Recoverable Codes with Order-Optimal Field Size." arXiv preprint arXiv:2011.13606 (2020).

[3] Umberto Martínez-Peñas. "Skew and linearized Reed–Solomon codes and maximum sum rank distance codes over any division ring." Journal of Algebra 504 (2018): 587-612.

#### Prerequisites

- good knowledge in coding theory and linear algebra

- interest in coding for distributed data storage

#### Supervisor:

Private, Secure and Flexible Distributed Machine Learning on the Cloud

## Private, Secure and Flexible Distributed Machine Learning on the Cloud

**Keywords: **

Distributed machine learning, straggler mitigation, information theoretic privacy and security

**Short Description: **

This project aims at finding coding techniques for implementing machine learning algorithms on untrusted devices on the cloud. A key challenge is to adapt to cloud devices, e.g., IoT, is that have different compute power that change with time.

#### Description

In the era of big data, running learning algorithms on single computing is becoming a bottleneck. Therefore the need of distributing the computation is inevitable. However, the deployment of distributed computing introduces new challenges that, if ignored, may outweigh the benefit of parallelism.

We consider the master-worker topology in which the master needs to run a computation on its data. The master breaks the computation into smaller tasks distributed to processing nodes referred to as workers. The workers run the tasks in parallel. The master combines the results sent back from the workers to obtain its original computation.

However, heterogeneity of the workers' computation power and/or network link properties can slow down the process for some workers. Waiting for the slowest worker is shown to outweigh the effect of parallelism. Moreover, the privacy of the data is concerned since it is shared with external workers. The leakage of the master data can be harmful or even illegal. The master also risks employing an adversarial computing node as a worker, whose goal is to corrupt the whole computation.

Recently, coding theoretic techniques have been used to speed up the computation, guarantee the privacy and security of the distributed computing paradigm under different settings. The setting of interest for this project is one where the workers have different time-varying compute powers. Thus a flexible coding technique is needed. We focus on matrix-matrix multiplication as a building block in several machine learning algorithms.

In this project, we would like to design codes that allow us to overcome all the three challenges for matrix-matrix multiplication (or more types of computation if the time allows). We start by building on the work in https://arxiv.org/abs/2004.12925. The main focus of the project is on the security of the scheme, i.e., robustness against malicious workers trying to corrupt the computation. We aim to implement the designed codes on Google cloud platform, or Amazon Web Services to test their practicality.

#### Prerequisites

Coding Theory

Information Theory

Linear Algebra

Probability Theory

Programming skills

Self-motivation

#### Contact

Marvin Xhemrishi: marvin.xhemrishi@tum.de

Rawad Bitar: rawad.bitar@tum.de

#### Supervisor:

Adaptive compute strategies for distributed learning

## Adaptive compute strategies for distributed learning

**Keywords: **

Distributed machine learning, straggler mitigation, adaptive worker selection, adaptive data allocation

**Short Description: **

We consider the master/server computing model. The goal of this project is to design coding techniques that adaptively assigns computational tasks to the workers. In addition, the technique allows the master node to adaptively choose the number of workers it waits for at every iteration. Both assignment strategy and waiting strategy aim to minimize the time spent on running the machine learning algorithm.

#### Description

We consider the master/server computing model. A master possesses a large amount of data and wants to run a machine learning algorithm on this data. The master offloads the data to several worker machines to parallelize the computation and reduce the time it takes. If the master waits for all the workers, he is limited by the slowest worker. This is known as the straggler problem.

To mitigate the straggler problem, the master can distribute the data with redundancy to the workers. However, adding redundancy to the workers increases the computing time. On the other hand, stochastic gradient descent is known to converge even if the master does not wait for all the workers per iteration. Therefore, a tradeoff between redundancy, and convergence speed arises.

It is shown that for convex loss functions, the master can assign data without redundancy to the workers and chooses the number of workers to wait for per iteration to minimize the waiting time.

The goal of this project is to have the best of both worlds. More precisely, we want to design coding techniques that potentially change the redundancy of the data in an adaptive manner. In addition, the technique allows the master to adaptively choose the number of workers it waits for at every iteration. Both assignment strategy and waiting strategy aim to minimize the time spent on running the machine learning algorithm.

The coding technique we aim to introduce is tailored to the problem at hand. If communication between the master and workers is fast, e.g., serverless compute service, then changing the redundancy is affordable. Otherwise, in settings such as internet of things and edge computing, the master may want to fix the redundancy a priori to a designed value and choose to adaptively wait for a different number of workers per iterations.

It is expected to program a distributed stochastic gradient descent on Google cloud platform, or Amazon Web Services to validate and/or design our system's parameters.

#### Prerequisites

Knowledge of coding theory, linear algebra and machine learning

Good programming skills

Self-motivated and eager to learn

Bonus: knowledge of probability theory and convex optimization

#### Contact

rawad.bitar@tum.de

https://sites.google.com/site/rawadbitar1