CE25 - Multi-purpose communication networks, high-performance infrastructures, software sciences and technologies 2020

F3CAS: Rethinking FPGA-Accelerated Computer Architecture Simulation for Data Storage Exploration – F3CAS

F3CAS: Bridging Usability and Performance in FPGA-Accelerated Architectural Simulation

F3CAS proposes a novel paradigm combining FPGA acceleration with software-level abstraction to enable fast, accurate, and user-friendly architectural simulation. The project demonstrates that complex microarchitectural mechanisms—such as cache replacement policies—can be modularized and explored efficiently without requiring invasive hardware modifications.

Towards Usable and High-Performance FPGA-Accelerated Architectural Simulation

Modern computing systems face a critical bottleneck: data movement and memory hierarchy behavior now dominate more than half of system energy and performance. As technology scaling slows, architectural innovation, particularly in memory systems, has become the primary driver of performance improvements. However, evaluating new architectural ideas remains challenging. Software simulators are flexible but prohibitively slow, while FPGA-accelerated platforms offer speed but require significant hardware expertise and complex RTL modifications. This creates a fundamental trade-off between usability and performance.

The objective of the F3CAS project is to bridge this gap by enabling:

- Fast and cycle-accurate simulation through FPGA acceleration

- High usability through software-level abstraction

- Modular integration of microarchitectural components

The project focuses on memory hierarchy exploration, using cache replacement policies as a representative and demanding use case. The overarching goal is to democratize access to high-performance simulation tools and accelerate innovation in computer architecture.

The F3CAS approach combines FPGA-accelerated simulation with a modular abstraction layer that decouples microarchitectural mechanisms from underlying hardware implementations.

The core contribution is the introduction of a Replacement-Policy Unit (RPU), a modular interface that encapsulates cache replacement logic and allows seamless integration into an FPGA-accelerated simulation framework (FireSim/Chipyard).

Key methodological innovations include:

- Latency-insensitive integration: decoupling policy execution from cache timing to preserve simulation correctness

- Hardware–software co-design: enabling implementation of policies in multiple forms (software, RTL, HLS, soft-core CPU)

- Modular abstraction: isolating microarchitectural components to avoid invasive RTL changes

- Full-system FPGA simulation: leveraging FireSim to evaluate realistic workloads with cycle accuracy
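The idea behind latency-insensitive integration can be illustrated with a small software model. The sketch below is only an assumption about the general principle, not the F3CAS implementation: the function `simulate`, its parameters, and the FIFO placeholder policy are hypothetical. It separates target time (simulated cycles) from host time (cycles spent by the simulator) and freezes the target clock while the policy unit computes, so the simulated outcome should not depend on how long the policy takes.

```python
from collections import deque


def simulate(trace, ways, policy_latency):
    """Run a one-set cache model and return (target_cycles, host_cycles, victims).

    policy_latency is the number of HOST cycles the (hypothetical) policy unit
    needs to answer a victim request. The target clock does not advance while
    the host waits, which is the latency-insensitive property being sketched.
    """
    lines = deque()      # resident tags, oldest first (FIFO as a placeholder policy)
    victims = []
    target_cycles = 0
    host_cycles = 0
    for tag in trace:
        if tag not in lines:
            if len(lines) == ways:
                # Victim request: only host time passes while waiting.
                host_cycles += policy_latency
                victims.append(lines.popleft())
            lines.append(tag)
        # One target cycle per access, costing one host cycle to simulate.
        target_cycles += 1
        host_cycles += 1
    return target_cycles, host_cycles, victims


if __name__ == "__main__":
    trace = [0, 1, 2, 0, 3]
    for latency in (0, 10):
        print(latency, simulate(trace, ways=2, policy_latency=latency))
```

In this model a slower policy implementation (for example software running on a soft-core rather than RTL) lengthens the simulation run but leaves the simulated cycle count and eviction decisions unchanged.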

 

The framework supports multiple cache replacement policies (Random, LRU, SHiP, Hawkeye), enabling systematic exploration of performance, resource usage, and design trade-offs.
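To make the plug-and-play idea concrete, here is a minimal software sketch of a cache model whose replacement logic sits behind a small interface. It is an illustration under stated assumptions: the class and method names (`ReplacementPolicy`, `on_access`, `choose_victim`, `SetAssociativeCache`) are hypothetical and do not reflect the actual RPU interface, which this summary does not describe. Only Random and LRU are shown; SHiP and Hawkeye would need additional predictor state.

```python
import random
from abc import ABC, abstractmethod
from collections import OrderedDict
from typing import List


class ReplacementPolicy(ABC):
    """Hypothetical policy interface; the cache never inspects policy state."""

    @abstractmethod
    def on_access(self, set_index: int, tag: int, hit: bool) -> None: ...

    @abstractmethod
    def choose_victim(self, set_index: int, resident_tags: List[int]) -> int: ...


class RandomPolicy(ReplacementPolicy):
    def __init__(self, seed: int = 0):
        self._rng = random.Random(seed)

    def on_access(self, set_index, tag, hit):
        pass  # Random replacement keeps no history.

    def choose_victim(self, set_index, resident_tags):
        return self._rng.choice(resident_tags)


class LRUPolicy(ReplacementPolicy):
    def __init__(self):
        self._order = {}  # set_index -> OrderedDict of tags, least recent first

    def on_access(self, set_index, tag, hit):
        order = self._order.setdefault(set_index, OrderedDict())
        order.pop(tag, None)
        order[tag] = None  # move (or insert) the tag at the most recent end

    def choose_victim(self, set_index, resident_tags):
        order = self._order[set_index]
        victim = next(iter(order))  # least recently used tag
        del order[victim]
        return victim


class SetAssociativeCache:
    def __init__(self, num_sets: int, ways: int, policy: ReplacementPolicy):
        self.num_sets, self.ways, self.policy = num_sets, ways, policy
        self.sets = [[] for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, block_addr: int) -> bool:
        set_index = block_addr % self.num_sets
        tag = block_addr // self.num_sets
        lines = self.sets[set_index]
        hit = tag in lines
        if hit:
            self.hits += 1
        else:
            self.misses += 1
            if len(lines) == self.ways:
                lines.remove(self.policy.choose_victim(set_index, list(lines)))
            lines.append(tag)
        self.policy.on_access(set_index, tag, hit)
        return hit


if __name__ == "__main__":
    for policy in (LRUPolicy(), RandomPolicy(seed=1)):
        cache = SetAssociativeCache(num_sets=1, ways=2, policy=policy)
        for addr in [0, 1, 0, 2, 1]:
            cache.access(addr)
        print(type(policy).__name__, cache.hits, cache.misses)
```

Swapping policies only changes the object passed to the cache constructor; the cache model itself is untouched, which mirrors the goal of avoiding changes to the cache controller.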

The F3CAS project successfully demonstrates that modular microarchitectural exploration is feasible within FPGA-accelerated simulation environments.

 

Key results include:

- Design and implementation of the RPU abstraction, enabling plug-and-play integration of cache replacement policies without modifying the cache controller

- Multiple implementation strategies (software, RTL, HLS, soft-core), highlighting trade-offs between flexibility, performance, and FPGA resource utilization

- Comprehensive experimental evaluation across different processor configurations (Rocket, BOOM) and workloads

- Quantitative analysis of simulation overhead, showing that modularity can be achieved with limited performance impact

- Evaluation of advanced policies (SHiP, Hawkeye) demonstrating improved cache efficiency and performance

 

The results confirm that architectural innovation can be significantly accelerated by combining modular design with FPGA-based simulation, without sacrificing accuracy.

The F3CAS project opens several promising research and development directions.

 

First, the proposed modular simulation paradigm could be extended beyond cache replacement policies to other microarchitectural components, including:

- LLC prefetching mechanisms

- DRAM memory controllers

 

Second, the approach can be scaled to more complex multicore architectures.

 

Finally, we have only begun to explore the potential of soft-core–enabled F3CAS. We believe there is significant opportunity to further reduce the current acceleration overhead by specializing the soft-core architecture through domain-specific instruction set extensions.

Modern high-performance mobile computing architectures spend more than 60% of their energy on data storage and movement. However, fundamental limitations in existing evaluation tools hinder innovation in memory systems. Accordingly, this project proposes a new paradigm to build user-Friendly, Fast and Faithful Computer-system Architecture Simulations (F3CAS) tailored to the exploration of new memory architectures. F3CAS uniquely combines FPGA acceleration with tightly coupled domain-specific soft-processors to encapsulate the simulator in a software-like abstraction. The F3CAS simulation will be demonstrated in the evaluation of emerging Non-Volatile Memory (NVM) technologies, such as Spin-Transfer Torque (STT) RAM.

Project coordination

David Novo (Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier)

The author of this summary is the project coordinator, who is responsible for its content. The ANR accepts no responsibility for its content.

Partnership

LIRMM Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier

ANR funding: 230,170 euros
Start and duration of the scientific project: September 2021, 48 months
