Multicore Hardware-Software Design and Verification Techniques

On the Design of Multicore Architectures Guided by a Miss Table at Level-1 and Level-2 Caches to Improve Predictability and Performance/Power Ratio

Author(s): Abu Asaduzzaman and Fadi N. Sibai

Pp: 19-32 (14 pages)

DOI: 10.2174/978160805225711101010019


Abstract

Most contemporary architectures for high-performance, low-power computing systems are built on multicore processors. Tasks are distributed among multiple cores to increase processing speed, and the system runs at a lower clock frequency to reduce total power consumption. However, the multilevel caches in multicore architectures compound timing unpredictability and consume a significant amount of power. In single-core systems, cache locking techniques improve predictability by locking useful blocks in the cache. The success of cache locking depends primarily on selecting the right blocks to lock. In prior work, we introduced an efficient block selection methodology and a Miss Table based cache locking scheme, in which information about the blocks and their cache misses is stored in a Miss Table to guide locking. Cache locking is more challenging in multicore systems because of the additional complexity of the architecture. In this chapter, we investigate how the placement of the Miss Table, whether at the level-1 cache (CL1) or at the level-2 cache (CL2), affects the system's predictability and performance/power ratio. Using the VisualSim and Heptane simulation tools, we simulate an 8-core architecture in which each core's private CL1 is split into an instruction cache (I1) and a data cache (D1), and the CL2 is unified and shared by all cores. Experimental results using MPEG4 decoding and FFT algorithms show that, for MPEG4, Miss Table based cache locking at level-1 is more beneficial than Miss Table based cache locking at level-2: locking one-fourth of the I1 cache size achieves a maximum reduction of 38% in mean delay per task and a maximum reduction of 32% in total power consumption. For FFT, locking at level-1 and at level-2 has almost the same impact.
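The abstract describes Miss Table based locking only at a high level. The following is a minimal single-cache sketch under our own assumptions, not the authors' implementation. A profiling pass records per-block miss counts in a Miss Table. The blocks with the most misses are then locked, up to one-fourth of the cache lines, and a second pass measures the effect. The names `LockableCache`, `build_miss_table`, `select_blocks_to_lock` and `make_trace` are hypothetical, as are the 32-byte line size, LRU replacement, the "leave one way unlocked per set" rule and the synthetic trace. The chapter's 8-core VisualSim/Heptane model with I1/D1 and a shared CL2 is not reproduced here.

```python
from collections import Counter, OrderedDict

BLOCK_SIZE = 32  # bytes per cache line (assumed for illustration)


class LockableCache:
    """Set-associative LRU cache in which locked blocks are never evicted."""

    def __init__(self, num_sets, ways, locked_blocks=()):
        self.num_sets = num_sets
        self.ways = ways
        # Each set maps block number -> None; dict order tracks LRU (front = oldest).
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.locked = set(locked_blocks)
        self.accesses = 0
        self.misses = 0
        # Locked blocks are preloaded and permanently occupy one way of their set.
        for block in self.locked:
            self.sets[block % num_sets][block] = None

    def access(self, addr):
        """Return True on a hit, False on a miss."""
        self.accesses += 1
        block = addr // BLOCK_SIZE
        lines = self.sets[block % self.num_sets]
        if block in lines:
            lines.move_to_end(block)  # hit: mark as most recently used
            return True
        self.misses += 1
        if len(lines) >= self.ways:
            # Evict the least recently used block that is not locked.
            victim = next(b for b in lines if b not in self.locked)
            del lines[victim]
        lines[block] = None
        return False


def build_miss_table(trace, num_sets, ways):
    """Profiling pass without locking: count misses per block number."""
    cache = LockableCache(num_sets, ways)
    table = Counter()
    for addr in trace:
        if not cache.access(addr):
            table[addr // BLOCK_SIZE] += 1
    return table


def select_blocks_to_lock(miss_table, num_sets, ways, lock_fraction):
    """Greedily pick the blocks with the most misses, up to lock_fraction of
    all lines, leaving at least one unlocked way in every set."""
    budget = int(num_sets * ways * lock_fraction)
    per_set = Counter()
    chosen = []
    for block, count in miss_table.most_common():
        if len(chosen) == budget or count < 2:
            break  # budget used up, or only compulsory (one-time) misses remain
        s = block % num_sets
        if per_set[s] < ways - 1:
            per_set[s] += 1
            chosen.append(block)
    return chosen


def make_trace(iterations=10):
    """Hypothetical trace: two hot blocks (0 and 1) reused every iteration,
    interleaved with streaming blocks that conflict in the same sets."""
    blocks = []
    for i in range(iterations):
        base = 1000 + 8 * i
        blocks += [0, 1, base, base + 4, base + 1, base + 5]
    return [b * BLOCK_SIZE for b in blocks]


NUM_SETS, WAYS = 4, 2
trace = make_trace()
miss_table = build_miss_table(trace, NUM_SETS, WAYS)
to_lock = select_blocks_to_lock(miss_table, NUM_SETS, WAYS, lock_fraction=0.25)

baseline = LockableCache(NUM_SETS, WAYS)
locked = LockableCache(NUM_SETS, WAYS, to_lock)
for addr in trace:
    baseline.access(addr)
    locked.access(addr)

print("locked blocks:", sorted(to_lock))
print("misses without locking:", baseline.misses, "of", baseline.accesses)
print("misses with locking:   ", locked.misses, "of", locked.accesses)
```

On this toy trace, the streaming blocks keep evicting the two hot blocks. The unlocked cache therefore misses on all 60 accesses. Locking blocks 0 and 1 (two of the eight lines, i.e. one-fourth) turns every hot access into a hit and leaves 40 misses. These numbers come only from the synthetic trace and say nothing about the chapter's reported results. Counting misses per block, rather than accesses, targets exactly the blocks that locking can help. Blocks that missed only once are skipped because their misses are compulsory.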

© 2024 Bentham Science Publishers