
Analyzing the memory ordering models of the Apple M1

Lars Wrenger*, Dominik Töllner, Daniel Lohmann

*Corresponding author for this work

Research output: Contribution to journal › Article › Research › peer review

Abstract

The Apple M1 ARM processor family implements two memory consistency models: the conventional ARM weak memory ordering and the total store ordering (TSO) model of the x86 architecture, which Apple's x86 emulator, Rosetta 2, relies on. Having both memory ordering models on the same hardware lets us benchmark and compare their performance characteristics and worst-case workloads directly. In this paper, we assess the performance implications of TSO on the Apple M1 processor architecture. Based on the multi-threaded workloads of the SPEC2017 CPU FP benchmark suite, our findings indicate that TSO is, on average, 8.94 percent slower than ARM's weaker memory ordering. Through synthetic benchmarks, we further explore which workloads suffer the largest performance degradation under TSO. We also take a closer look at the specific atomic instructions provided by the ARMv8.3 specification and their synchronization overheads.
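To make the difference between the two ordering models concrete, the following is a minimal sketch of the classic message-passing litmus test. It is not taken from the paper: the names `WRITER`, `READER`, `allowed_orders`, `interleavings`, and `outcomes` are hypothetical, and the two models are deliberately simplified (no store buffers, fences, or dependencies). Under the simplified TSO model a reader that sees the flag set must also see the data, while the simplified weak model admits the stale-data outcome.

```python
from itertools import permutations

# Message-passing litmus test. Each operation is (kind, variable, argument):
# for a store the argument is the value, for a load it is the target register.
WRITER = [("store", "data", 1), ("store", "flag", 1)]
READER = [("load", "flag", "r_flag"), ("load", "data", "r_data")]


def allowed_orders(thread, model):
    """Return the orders in which one thread's operations may take effect.

    Assumption (simplified models): "tso" keeps store-store and load-load
    order, which is all this test contains; "weak" lets independent accesses
    to different addresses reorder freely.
    """
    if model == "tso":
        return [tuple(thread)]
    return list(permutations(thread))


def interleavings(a, b):
    """Yield every merge of sequences a and b that keeps each one's order."""
    if not a or not b:
        yield tuple(a) + tuple(b)
        return
    for rest in interleavings(a[1:], b):
        yield (a[0],) + rest
    for rest in interleavings(a, b[1:]):
        yield (b[0],) + rest


def outcomes(model):
    """Collect every (r_flag, r_data) pair the reader can observe."""
    seen = set()
    for w in allowed_orders(WRITER, model):
        for r in allowed_orders(READER, model):
            for schedule in interleavings(w, r):
                memory = {"data": 0, "flag": 0}
                regs = {}
                for kind, var, arg in schedule:
                    if kind == "store":
                        memory[var] = arg
                    else:
                        regs[arg] = memory[var]
                seen.add((regs["r_flag"], regs["r_data"]))
    return seen


if __name__ == "__main__":
    for model in ("tso", "weak"):
        print(model, sorted(outcomes(model)))
```

The outcome `(1, 0)`, flag observed but data stale, appears only for the weak model in this sketch. Code that must rule it out on a weakly ordered machine needs explicit barriers or acquire/release accesses, whereas TSO hardware enforces that ordering on every access, which is one plausible source of the overhead the abstract reports.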

Original language: English
Article number: 103102
Number of pages: 8
Journal: Journal of Systems Architecture
Volume: 149
E-pub ahead of print: 4 Mar 2024
Publication status: Published - Apr 2024

Keywords

  • Apple M1
  • ARM
  • Memory ordering
  • TSO

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture