Keynote Speech by Prof. Herkersdorf at NorCAS 2019 Conference in Helsinki

Prof. Herkerdorf gave a keynote speech at NorCAS 2019, which took place in Helsinki, Finland from October 29 to 30, 2019 (website of NoCAS 2019). He presented major results of our work done within project B5 of the Transregional Collaborative Research Center "Invasive Computing", which is funded by the German Researc Foundation, DFG.

The title of his talk was "Tackling the MPSoC Data Locality Challenge with Regional Coherence and Near Memory Acceleration" and it covered the following contents:

Data access latencies and bandwidth bottlenecks frequently represent major limiting factors for the computational effectiveness of multi- and many-core processor architectures. This keynote talk introduces two conceptually complementary approaches to reduce the synchronization overheads for coherence maintenance and to improve the locality between computing resources and data: Region-based cache coherence and near memory acceleration.

A 2D array of compute tiles with multiple, heterogeneous RISC cores, two levels of caches and a tile-local SRAM memory serves as reference processing platform. Compute tiles, I/O tiles and globally shared DDR SDRAM memory tiles are interconnected by a meshed Network on Chip (NoC) with support for multiple quality of service levels. Overall, this processing architecture follows a distributed-shared-memory model. The limited degree of parallelism in many embedded computing applications also bounds the number of compute tiles possibly sharing associated data structures. Therefore, we envision region-based cache coherence (RBCC) among a limited working set of compute tiles over global coherence approaches. Coherence regions can dynamically be reconfigured at runtime and comprise a number of arbitrary (adjacent or non-adjacent) compute tiles which are interconnected through regular NoC channels for the exchange of coherency protocol messages. We will show that region-based coherence allows maintaining substantially smaller coherence directories (e.g., by approx. 40% reduced in size for 16 tiles systems with up to 4 tiles per region) and shorter sharer checking latencies than global coherence.

Near memory processing is an alternative concept to increase data/task locality by means of near memory accelerators (NMA). NMA positions processing resources for specific forms of data manipulations as close as possible to the data memory. The evident benefits are: reducing global interconnect usage, shortening of access latencies and, thus, increasing compute efficiency. In distributed-shared-memory architectures, where accelerator units can be affiliated with different tile-local SRAMs as well as with the globally shared DDR SDRAM, near memory acceleration requires thorough consideration of task mapping as well as task and data migration into and among compute tiles.