Implementation of a Demonstrator for Coded Computing
In future 6G mobile communication networks, machine learning and other complex tasks will be executed on distributed computing clusters. To reduce latency, coding schemes can be applied: redundant computation tasks are scheduled so that slow worker nodes (stragglers) have less impact on the overall completion time.
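The effect of redundancy on stragglers can be illustrated with a small Monte Carlo sketch. It assumes a shifted-exponential model for worker run times, a common modelling choice in the coded computing literature but not something specified in this project description. The names `job_latency` and `mean_latency` and all parameters are illustrative only.

```python
import random


def job_latency(n, k, rng):
    """Time until the k fastest of n workers have finished.

    Assumption: each worker handles 1/k of the job, and its run time is
    work * (1 + X) with X exponentially distributed (rate 1).
    """
    work = 1.0 / k
    times = sorted(work * (1.0 + rng.expovariate(1.0)) for _ in range(n))
    return times[k - 1]


def mean_latency(n, k, trials=20000, seed=0):
    """Average job latency over many simulated runs."""
    rng = random.Random(seed)
    return sum(job_latency(n, k, rng) for _ in range(trials)) / trials


# Uncoded: all 10 workers must finish, each doing 1/10 of the job.
uncoded = mean_latency(n=10, k=10)
# Coded: any 5 of 10 workers suffice, each doing 1/5 of the job.
coded = mean_latency(n=10, k=5)
print(f"uncoded: {uncoded:.3f}, coded (10, 5): {coded:.3f}")
```

Under this model, the coded variant gives each worker twice as much work, yet finishes earlier on average because the job no longer waits for the slowest workers.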
In the context of our related research, we plan to implement a demonstrator that presents a potential application of coded computation schemes and highlights their benefits. For example, coded computations could be used to run a machine learning algorithm in a distributed manner on different worker machines.
The objective of this project is to become familiar with the different coded computing schemes proposed in the literature and to implement them for a small wireless distributed computing network. In the first step, a simple Python-based distributed computation cluster shall be set up using the Ray framework and TensorFlow (or PyTorch). Finally, a coding scheme shall be implemented for the distributed computation cluster.
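As a minimal sketch of what such a coding scheme could look like, the example below distributes a matrix-vector product over three workers so that the results of any two suffice. It uses a single parity block, i.e. a (3, 2) MDS code, which is one simple scheme from the literature and not necessarily the one to be used in this project. Workers are plain function calls here rather than Ray tasks. All function names are hypothetical, and the matrix is assumed to have an even number of rows.

```python
from itertools import combinations


def matvec(rows, x):
    """Plain matrix-vector product; stands in for the work of one worker."""
    return [sum(a * b for a, b in zip(row, x)) for row in rows]


def encode(A):
    """Split A into two row blocks A1, A2 and add the parity block A1 + A2."""
    half = len(A) // 2
    A1, A2 = A[:half], A[half:]
    parity = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(A1, A2)]
    return [A1, A2, parity]


def decode(results):
    """Recover A @ x from any two worker results.

    results maps a worker index (0, 1, or 2 for the parity worker)
    to that worker's partial result.
    """
    if 0 in results and 1 in results:
        y1, y2 = results[0], results[1]
    elif 0 in results:
        y1 = results[0]
        y2 = [p - a for p, a in zip(results[2], y1)]  # (A1 + A2)x - A1 x
    else:
        y2 = results[1]
        y1 = [p - b for p, b in zip(results[2], y2)]  # (A1 + A2)x - A2 x
    return y1 + y2


A = [[1, 2], [3, 4], [5, 6], [7, 8]]
x = [1, -1]
tasks = encode(A)
expected = matvec(A, x)

# Whichever worker straggles, the other two results are enough.
for finished in combinations(range(3), 2):
    results = {w: matvec(tasks[w], x) for w in finished}
    assert decode(results) == expected
```

In a cluster setup, each call to `matvec` would be dispatched to a separate worker, and decoding would start as soon as the first two results arrive.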
Knowledge of channel coding
Good programming skills in Python
Experience with Java and/or Android would be a plus
Basic knowledge of machine learning with TensorFlow
Short Description: Research on the trade-off between computation latency and communication latency in distributed computing systems.
The latency of a distributed computing algorithm depends mainly on the computation latency. However, the communication latency can also have a significant impact on the overall latency, e.g., when dealing with wireless links. By applying coded computing schemes, the computation latency can be traded off against the communication latency, and vice versa. This trade-off shall be analysed.
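As a starting point for such an analysis, the sketch below tabulates the normalized communication load as a function of the computation load r (the number of nodes at which each map task is replicated) for K nodes, using the expressions reported by Li et al. for uncoded and coded distributed computing. Note that these are loads, not latencies; translating them into latencies requires additional assumptions about link rates and processing speeds. The function names are illustrative.

```python
from fractions import Fraction


def uncoded_load(r, K):
    """Normalized communication load without coding: 1 - r/K."""
    return 1 - Fraction(r, K)


def coded_load(r, K):
    """Normalized communication load with coded shuffling: (1/r)(1 - r/K)."""
    return Fraction(1, r) * (1 - Fraction(r, K))


K = 10  # number of computing nodes (example value)
for r in range(1, K + 1):
    print(r, uncoded_load(r, K), coded_load(r, K))
```

Increasing r multiplies the computation effort by r, while the coded communication load shrinks by the same factor r relative to the uncoded load.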
S. Li, M. A. Maddah-Ali, Q. Yu, and A. S. Avestimehr, “A fundamental tradeoff between computation and communication in distributed computing,” IEEE Trans. Inf. Theory, vol. 64, no. 1, pp. 109–128, Jan 2018.
S. Li, M. A. Maddah-Ali, and A. S. Avestimehr, “A unified coding framework for distributed computing with straggling servers,” in IEEE Globecom Workshops (GC Workshop), Dec 2016, pp. 1–6.
J. Zhang and O. Simeone, “Improved latency-communication trade-off for map-shuffle-reduce systems with stragglers,” arXiv preprint arXiv:1808.06583, 2018.