A major concern for current and future on-chip systems is the thermal problem: dissipated electrical energy leads to high chip temperatures. Short-term effects may include transient malfunction or even irreversible damage, whereas long-term effects may include deteriorating functionality (e.g., increased signal propagation delays) or irreversible damage due to, for example, electromigration. The problem worsens with the advent of 3D architectures, as the thermal energy dissipated per unit of surface area increases.
The goal of this project is to address dependability problems in 3D stacked many-core architectures that result from thermal effects. We tackle the problem with a combined system- and architecture-level approach. A hierarchical agent-based thermal management system initiates proactive task migration onto cooler processing resources, while a communication virtualization layer dynamically adapts and protects connectivity between (migrated) tasks and external I/Os.
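The proactive migration idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the project: the two-level hierarchy (`ClusterAgent`, `GlobalAgent`), the single temperature threshold, and the policy of moving one task per hot core per step.

```python
from dataclasses import dataclass, field


@dataclass
class Core:
    core_id: int
    temp_c: float                       # latest sensor reading (hypothetical)
    tasks: list = field(default_factory=list)


class ClusterAgent:
    """Lower-level agent: observes the cores of one cluster."""

    def __init__(self, cores):
        self.cores = cores

    def hottest(self):
        return max(self.cores, key=lambda c: c.temp_c)

    def coolest(self):
        return min(self.cores, key=lambda c: c.temp_c)


class GlobalAgent:
    """Top-level agent: used when a cluster has no cool core of its own."""

    def __init__(self, clusters, threshold_c=80.0):
        self.clusters = clusters
        self.threshold_c = threshold_c  # assumed trigger, set below the critical limit

    def step(self):
        """Return the migrations decided in this step as (task, src_id, dst_id)."""
        moves = []
        for cluster in self.clusters:
            src = cluster.hottest()
            if src.temp_c < self.threshold_c or not src.tasks:
                continue
            dst = cluster.coolest()
            if dst.temp_c >= self.threshold_c:
                # Escalate: search all clusters for the coolest core.
                dst = min((c.coolest() for c in self.clusters),
                          key=lambda c: c.temp_c)
            if dst is src or dst.temp_c >= self.threshold_c:
                continue                # no suitable target anywhere
            task = src.tasks.pop()
            dst.tasks.append(task)
            moves.append((task, src.core_id, dst.core_id))
        return moves


cluster_a = ClusterAgent([Core(0, 85.0, ["t0"]), Core(1, 82.0)])
cluster_b = ClusterAgent([Core(2, 50.0), Core(3, 60.0)])
moves = GlobalAgent([cluster_a, cluster_b]).step()
```

In this example both cores of the first cluster are above the threshold, so the decision is escalated and the task moves to the coolest core of the second cluster. A real system would also have to model how temperatures evolve after a migration, which the sketch ignores.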
Several features are necessary for the proposed approach to be feasible:
- Scalability is necessary for the approach to be applicable to future many-core systems with hundreds or even thousands of cores;
- Real-time capabilities are needed for meeting embedded application requirements;
- A virtual communication layer with service guarantees is required for abstracting the underlying physical on-chip interconnect structure (i.e., NoC);
- Run-time adaptability is required to adapt the management and communication subsystems according to the characteristics of the thermal events;
- Architecture agnosticism allows the concepts to be deployed on a number of architectures; and
- The techniques must be inherently robust.
The key elements are hardware extensions in the I/O tile and the compute tile: the VNIC (Virtual Network Interconnect) and the VNA (Virtual Network Adapter), respectively. They provide dependable communication virtualization services independent of the capabilities of the underlying physical interconnect.
The VNIC offloads the compute-intensive tasks of I/O virtualization (parsing, processing, scheduling) from the compute tiles. This is achieved via a streamlined architecture with configurable finite-state machines (FSMs) at its heart. The configurability allows the VNIC to adapt to different scenarios and loads while guaranteeing reserved resources for hard real-time traffic, providing shared resources for best-effort traffic, and supporting different QoS classes.
Furthermore, multicasting and broadcasting of incoming messages to different compute tiles, i.e., VNAs, should be supported.
VNAs are planned to be stripped-down versions of VNICs.
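One way to picture the split between reserved resources for hard real-time traffic and shared resources for best-effort traffic is a time-slot table. The sketch below is an illustrative assumption, not the VNIC design: the class `SlotScheduler`, the slot table layout, and the choice to hand unused reserved slots to best-effort traffic are all hypothetical.

```python
from collections import deque


class SlotScheduler:
    """Time-slot scheduler: reserved slots for real-time, the rest shared."""

    def __init__(self, slot_table):
        # slot_table[i] names the real-time channel owning slot i,
        # or None if the slot is shared by best-effort traffic.
        self.slot_table = slot_table
        self.rt_queues = {ch: deque() for ch in slot_table if ch is not None}
        self.be_queue = deque()
        self.slot = 0

    def enqueue_rt(self, channel, msg):
        # Raises KeyError if the channel has no reserved slot.
        self.rt_queues[channel].append(msg)

    def enqueue_be(self, msg):
        self.be_queue.append(msg)

    def tick(self):
        """Advance one slot and return the message served in it, if any."""
        owner = self.slot_table[self.slot]
        self.slot = (self.slot + 1) % len(self.slot_table)
        if owner is not None and self.rt_queues[owner]:
            return self.rt_queues[owner].popleft()
        # Shared slot, or a reserved slot its owner does not need.
        if self.be_queue:
            return self.be_queue.popleft()
        return None


sched = SlotScheduler(["rt0", None, "rt0", None])
sched.enqueue_rt("rt0", "a")
sched.enqueue_rt("rt0", "b")
for m in ("x", "y", "z"):
    sched.enqueue_be(m)
served = [sched.tick() for _ in range(6)]
```

Because channel `rt0` owns every second slot, its messages are served within a bounded number of slots regardless of the best-effort load. Further QoS classes could be modelled by additional queues with their own reservations.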
The communication channels are preset and configured by management software. In case of a task relocation (requested by the thermal management to prevent hotspots), the routes and configuration are updated and switched on after the tasks have been migrated to the new location.
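The reconfiguration sequence during a task relocation can be sketched as a software-managed channel table. This is a minimal illustration under stated assumptions: `ChannelTable`, its methods, and the `move_task` callback are hypothetical names, and the sketch simply reports no destinations while a channel is paused, whereas a real design would have to decide whether such messages are buffered or dropped.

```python
class ChannelTable:
    """Software-managed mapping from channels to destination tiles."""

    def __init__(self):
        self.routes = {}   # channel id -> set of destination tile ids
        self.active = {}   # channel id -> True while the channel is switched on

    def configure(self, channel, tiles):
        """Preset a channel; more than one tile makes it a multicast channel."""
        self.routes[channel] = set(tiles)
        self.active[channel] = True

    def deliver(self, channel):
        """Return the sorted destination tiles, or [] while paused or unknown."""
        if not self.active.get(channel, False):
            return []
        return sorted(self.routes[channel])

    def migrate(self, channel, old_tile, new_tile, move_task):
        """Pause the channel, move the task, update the route, switch on again."""
        self.active[channel] = False
        move_task(old_tile, new_tile)          # task migration happens first
        self.routes[channel].discard(old_tile)
        self.routes[channel].add(new_tile)
        self.active[channel] = True            # routes go live only afterwards


table = ChannelTable()
table.configure("sensor", [3, 5])              # multicast to tiles 3 and 5
before = table.deliver("sensor")
log = []
table.migrate("sensor", 5, 9,
              lambda old, new: log.append((old, new, table.deliver("sensor"))))
after = table.deliver("sensor")
```

The callback records what the table would deliver while the migration is in progress, which shows that the updated route becomes visible only once the task has arrived at its new tile.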
The I/O communication virtualization part of the project is researched by LIS; the hierarchical agent-based thermal management system is developed by CES, Karlsruhe.
This project is supported by the German Research Foundation (DFG) as part of the national focal program "Dependable Embedded Systems" (SPP-1500).