yle="font-weight: normal;" lang="de-DE" align="left">In this project we study two dimensional deletion/insertion problem introduced in https://arxiv.org/pdf/2004.14740.pdf. An array of n rows and n columns is transmitted through a specific deletion channel which for example occurs in DNA storage systems and racetrack memories. After transmission the receiver observes an array of dimension (n-1)*(n-1). The goal is to allow the receiver to know exactly which n*n array was transmitted by using error-correction techniques.
The goal of the project is to derive bounds on the deletion ball of any two-dimensional array. The deletion ball of an array X is the set of distinct arrays resulting from the deletion of any possible combination of one row and one column of X. Deriving such bounds is not trivial since the size of the ball depends heavily on the structure of the array. Thus, it is important to investigate useful characteristics representing the structure of an array for this particular setting. Furthermore, having this bound will help characterize the optimal redundancy of a deletion-correcting code for this setting.
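As a concrete illustration, the following Python sketch (our own, not from the referenced paper; it assumes arrays stored as NumPy matrices, and deletion_ball is a hypothetical helper name) enumerates the deletion ball by deleting every row/column pair and collecting the distinct results.

```python
# Minimal sketch: enumerate the deletion ball of an n x m array, i.e., all
# distinct (n-1) x (m-1) arrays obtained by deleting one row and one column.
import numpy as np

def deletion_ball(X: np.ndarray) -> set:
    """Return the set of distinct arrays reachable from X by
    deleting one row and one column (fingerprinted as bytes)."""
    n, m = X.shape
    ball = set()
    for i in range(n):          # row to delete
        for j in range(m):      # column to delete
            Y = np.delete(np.delete(X, i, axis=0), j, axis=1)
            ball.add(Y.tobytes())  # hashable fingerprint; all Y share a shape
    return ball

# The all-zero array has a deletion ball of size 1, while a "generic"
# array can reach up to n*m distinct results.
X = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 1]])
print(len(deletion_ball(X)))
```

The brute-force enumeration costs n*m deletions; the point of the project is precisely to bound the ball size from the structure of X without such enumeration.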
Criss-Cross Insertion and Deletion Correcting Codes
In this project, the coding problem for deletions/insertions of rows and columns in arrays is studied. In particular, it is of interest to construct a code that leverages the two-dimensional array structure and can correct any combination of row and column deletions/insertions.
Exploring while Exploiting Workers for Fast Distributed Machine Learning
Keywords: Distributed machine learning, straggler mitigation, multi-armed bandits, coding theory
We consider the problem of running a distributed machine learning algorithm with the help of workers that have different speeds. To fully leverage the workers, we ideally want to assign to each worker a computation load proportional to its speed. However, the speeds of the workers are not known a priori. To that end, we investigate the use of multi-armed bandit theory, which allows learning the speeds of the workers (exploring them) while assigning them useful computations (exploiting them).
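A minimal sketch of the explore/exploit idea, under illustrative assumptions (three workers with made-up speeds, a plain UCB1-style index, exponentially distributed throughput observations; none of this is the project's actual scheme):

```python
# Sketch: learn worker speeds while assigning work. Each round we send a
# unit task to the worker with the highest upper confidence bound on its
# service rate, then update the estimate from the observed throughput.
import math
import random

SPEEDS = [1.0, 2.0, 4.0]          # true (unknown) worker speeds, illustrative
counts = [0] * len(SPEEDS)        # tasks assigned to each worker so far
rate_est = [0.0] * len(SPEEDS)    # empirical mean throughput per worker

for t in range(1, 1001):
    if t <= len(SPEEDS):          # play each worker once before using UCB
        w = t - 1
    else:
        ucb = [rate_est[i] + math.sqrt(2 * math.log(t) / counts[i])
               for i in range(len(SPEEDS))]
        w = max(range(len(SPEEDS)), key=lambda i: ucb[i])
    rate = random.expovariate(1.0 / SPEEDS[w])   # noisy throughput, mean = true speed
    counts[w] += 1
    rate_est[w] += (rate - rate_est[w]) / counts[w]

print(counts)  # the fastest worker should end up with most of the load
```

In the project, the reward would instead be the useful computation completed per round, and the assignment would be a load allocation across all workers rather than a single-task choice.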
Coding for Distributed Coordinate Gradient Descent
yle="margin: 0px; font-variant-numeric: normal; font-variant-east-asian: normal; font-stretch: normal; font-size: 12px; line-height: normal; font-family: 'Helvetica Neue';">Applying machine learning on large data became part of daily life. Scalability of distributed machine learning algorithms requires tolerance of slower computing nodes referred to as stragglers. To mitigate the effect of stragglers, one can add redundancy to the distributed data. However, stochastic gradient descent (SGD) is known to perform well even when computing on a part of the data. It is shown that for convex loss functions, a small redundancy or no redundancy at all is enough to guarantee good performance of SGD.
yle="margin: 0px; font-variant-numeric: normal; font-variant-east-asian: normal; font-stretch: normal; font-size: 12px; line-height: normal; font-family: 'Helvetica Neue';">In this thesis, we examine the possibility of coding (adding redundancy) over the coordinates rather than coding over the whole data. The goal is to do unequal protection of the coordinates, i.e., give higher protection to the important coordinates that affect the convergence of the algorithm. The goal is to investigate stochastic gradient coordinate descent and check the effect of the redundancy on the convergence. In addition we want to investigate the possibility of ranking the coordinates by importance in order to decide on their unequal protection.