Impact of Chip-Level Integration on Performance of OLTP Workloads

L. A. Barroso, K. Gharachorloo, Andreas Nowatzyk, and B. Verghese
Conference Paper, Proceedings of 6th International Symposium on High-Performance Computer Architecture (HPCA '00), pp. 3-14, 2000

Abstract

With increasing chip densities, future microprocessor designs have the opportunity to integrate many of the traditional system-level modules onto the same chip as the processor. Some current designs already integrate extremely large on-chip caches, and there are aggressive next-generation designs that attempt to also integrate the memory controller, coherence hardware, and network router all onto a single chip. The tight coupling of these modules will enable efficient memory systems with substantially better latency and bandwidth characteristics relative to current designs. Among the important application areas for high-performance servers, online transaction processing (OLTP) workloads are likely to benefit most from these trends due to their large instruction and data footprints and high communication miss rates. This paper examines the design trade-offs that arise as more system functionality is integrated onto the processor chip, and identifies a number of important architectural choices that are influenced by chip-level integration. In addition, the paper presents a detailed study of the performance impact of chip-level integration in the context of OLTP workloads. Our results are based on full system simulations of the Oracle commercial database engine running on both in-order and out-of-order issue processors used in uniprocessor and multiprocessor configurations. The results show that chip-level integration can improve the performance of both configurations by about 1.4 to 1.5 times, though for different reasons. For uniprocessors, integration of the L2 cache and the resulting lower hit latency is the primary factor in performance improvement. For multiprocessors, the improvement comes from both the integration of the L2 cache (lower L2 hit latency) and the integration of the other memory system components (better dirty remote latency). Furthermore, we find that the higher associativity afforded by integrating the L2 cache plays a critical role in counteracting the loss of capacity relative to larger off-chip caches. Finally, we find that the relative gains from chip-level integration are virtually identical for in-order and out-of-order processors.

BibTeX

@conference{Barroso-2000-7972,
author = {L. A. Barroso and K. Gharachorloo and Andreas Nowatzyk and B. Verghese},
title = {Impact of Chip-Level Integration on Performance of OLTP Workloads},
booktitle = {Proceedings of 6th International Symposium on High-Performance Computer Architecture (HPCA '00)},
year = {2000},
month = {January},
pages = {3--14},
}