ADVANCED ALGORITHM AND DEVICE CONCEPTS FOR NEAR- AND IN-MEMORY COMPUTING
The rapid growth of artificial intelligence (AI) and the increasing complexity of deep neural networks (DNNs) are exposing critical limitations in conventional computing systems. Modern systems based on the von Neumann architecture, in which compute and memory units are physically separated, struggle to meet the demands of AI workloads, primarily because of bottlenecks in data transfer between memory and processors. These inefficiencies are exacerbated by the growing data volumes and communication loads characteristic of cutting-edge AI applications. To address these challenges, in-memory computing (IMC) and near-memory computing (NMC) have emerged as promising approaches. IMC eliminates data movement by performing computations directly within memory arrays, while NMC reduces latency by moving computation closer to memory. Both approaches leverage emerging memory technologies, especially nonvolatile ones such as phase-change memory (PCM) and resistive RAM (RRAM), to redefine how AI workloads are executed, promising significant improvements in performance and energy efficiency.

The adoption of these paradigms introduces new challenges, as transitioning from traditional computing architectures to IMC/NMC systems requires overcoming a fragmented design landscape. Current toolchains lack comprehensive support for hardware-software co-design, particularly for hardware-aware AI model execution, in which the hardware architecture adapts to the AI model's requirements and the model is optimised for the target hardware. This gap extends across the entire DNN stack, from algorithmic design to compiler-level mapping and hardware deployment. Existing frameworks often treat IMC and NMC as isolated solutions rather than complementary components of a hybrid computing ecosystem. This fragmentation underscores the need to explore the trade-offs between power consumption, latency, area, and computational precision that are critical for deploying AI models in resource-constrained applications.

This work proposes a unified framework to simulate, compile, and deploy IMC/NMC architectures using a cross-stack approach tailored for AI acceleration. The framework leverages hardware-aware DNN optimisation to adapt AI models to the unique advantages of IMC and NMC, such as analog in-memory matrix operations and near-memory data preprocessing, while co-optimising architectures for device-specific constraints such as computational precision and noise tolerance, and for task-specific constraints such as model accuracy. Building on this foundation, the framework incorporates compiler-driven mapping to automate the partitioning and deployment of DNN workloads across heterogeneous computing units (e.g., CPUs, IMC cores, NMC accelerators), using topology-aware scheduling to minimise data movement and latency. To enable early-stage design exploration, the framework integrates a hybrid simulation environment that combines custom-built simulators (for analog IMC behaviour) with external tools such as gem5 for system-level modelling and SPICE for circuit validation, allowing rigorous evaluation of latency, energy, and area trade-offs. The sketches below illustrate, in simplified form, the analog IMC behaviour, the hardware-aware optimisation, and the topology-aware mapping described here.
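To make the analog in-memory matrix operations concrete, the following minimal sketch models a crossbar matrix-vector product in which signed weights are stored as differential pairs of device conductances and multiplicative Gaussian noise stands in for PCM/RRAM programming variability. The function name, conductance range, and noise level are illustrative assumptions, not part of the framework itself.

```python
import numpy as np

def crossbar_matvec(weights, x, g_min=1e-6, g_max=1e-4, noise_sigma=0.02, rng=None):
    """Matrix-vector product on a simple model of an analog crossbar array.

    Each weight maps onto a differential pair of device conductances in
    [g_min, g_max] (siemens); multiplicative Gaussian noise emulates
    PCM/RRAM programming variability. All parameters are illustrative.
    """
    rng = rng or np.random.default_rng()
    w_max = max(np.abs(weights).max(), 1e-12)
    # Differential mapping: positive and negative weight parts go to
    # separate devices so that signed weights can be represented.
    g_pos = g_min + (g_max - g_min) * np.clip(weights, 0, None) / w_max
    g_neg = g_min + (g_max - g_min) * np.clip(-weights, 0, None) / w_max
    # Programming noise perturbs every stored conductance independently.
    g_pos = g_pos * (1 + noise_sigma * rng.standard_normal(g_pos.shape))
    g_neg = g_neg * (1 + noise_sigma * rng.standard_normal(g_neg.shape))
    # Ohm's law gives per-device currents; Kirchhoff's current law sums
    # them along each column, computing the dot products in the analog domain.
    i_out = (g_pos - g_neg) @ x
    # Rescale column currents from the conductance domain back to weights.
    return i_out * w_max / (g_max - g_min)

W = np.random.default_rng(0).standard_normal((4, 8)) * 0.1
x = np.ones(8)
print(crossbar_matvec(W, x))  # noisy analog estimate of W @ x
```

Without noise, the g_min offsets of the differential pair cancel exactly and the function returns W @ x; with noise they do not, which is precisely the non-ideality that hardware-aware model optimisation must tolerate.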
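Hardware-aware DNN optimisation of the kind described above can be sketched as a training-time layer that fake-quantises its weights and injects device-like noise, so the trained model learns to tolerate limited precision and conductance variation. This is a hedged illustration in PyTorch; the class name, bit-width, and noise level are assumptions rather than the framework's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyQuantLinear(nn.Linear):
    """Linear layer emulating IMC non-idealities during training (sketch).

    Weights are fake-quantised to `bits` levels with a straight-through
    estimator and perturbed by multiplicative Gaussian noise, so training
    converges to weights that survive analog deployment. Bit-width and
    noise level are illustrative assumptions.
    """
    def __init__(self, in_features, out_features, bits=4, noise_sigma=0.03):
        super().__init__(in_features, out_features)
        self.bits = bits
        self.noise_sigma = noise_sigma

    def forward(self, x):
        half_levels = 2 ** (self.bits - 1) - 1            # e.g. 7 for 4 bits
        scale = self.weight.detach().abs().max() / half_levels + 1e-12
        w_q = torch.clamp(torch.round(self.weight / scale),
                          -half_levels, half_levels) * scale
        # Straight-through estimator: quantised values in the forward
        # pass, full-precision gradients in the backward pass.
        w = self.weight + (w_q - self.weight).detach()
        if self.training:
            # Inject device-like noise only while training.
            w = w * (1 + self.noise_sigma * torch.randn_like(w))
        return F.linear(x, w, self.bias)

layer = NoisyQuantLinear(16, 8)
out = layer(torch.randn(2, 16))   # drop-in replacement for nn.Linear
out.sum().backward()              # gradients flow to full-precision weights
```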
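The compiler-driven, topology-aware mapping step can likewise be illustrated with a toy dynamic program that places a chain of layers onto CPU, IMC, or NMC units while penalising every transfer of activations between units. The unit names, per-layer costs, and transfer penalty are placeholder assumptions, not measured numbers.

```python
from dataclasses import dataclass

UNITS = ("cpu", "imc", "nmc")   # heterogeneous compute units (illustrative)
TRANSFER_COST = 5.0             # penalty for moving activations between units

@dataclass
class Layer:
    name: str
    cost: dict                  # estimated per-unit execution cost

def map_chain(layers):
    """Map a linear chain of DNN layers onto heterogeneous units.

    Dynamic programming over (layer, unit): total cost is per-layer
    execution cost plus a fixed penalty whenever consecutive layers are
    placed on different units, approximating data-movement overhead.
    """
    # best[u] = (lowest total cost ending on unit u, placement so far)
    best = {u: (layers[0].cost[u], [u]) for u in UNITS}
    for layer in layers[1:]:
        nxt = {}
        for u in UNITS:
            cands = [(c + layer.cost[u] + (TRANSFER_COST if p[-1] != u else 0.0),
                      p + [u]) for c, p in best.values()]
            nxt[u] = min(cands, key=lambda t: t[0])
        best = nxt
    return min(best.values(), key=lambda t: t[0])

layers = [
    Layer("conv1", {"cpu": 9.0, "imc": 2.0, "nmc": 4.0}),
    Layer("pool1", {"cpu": 1.0, "imc": 6.0, "nmc": 1.5}),
    Layer("fc1",   {"cpu": 8.0, "imc": 1.5, "nmc": 3.0}),
]
total, placement = map_chain(layers)
print(total, placement)   # 8.5 ['nmc', 'nmc', 'nmc'] for these costs
```

For these placeholder costs the schedule keeps all three layers on the NMC unit, even though the IMC unit is cheaper for two of them in isolation, because the data-movement penalty makes switching units unattractive; this is exactly the trade-off that topology-aware scheduling is meant to capture.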
Further strengthening the framework, the work extends into security and reliability analysis to address vulnerabilities inherent to IMC/NMC architectures, such as side-channel attacks on shared near-memory buffers or thermal-throttling risks in densely packed compute-in-memory arrays. Mitigation strategies, such as encryption for analog data paths and the integration of physical unclonable functions (PUFs), will be embedded directly into the co-design workflow, ensuring robustness against adversarial and environmental threats. Finally, the framework supports scalable deployment through hardware-in-the-loop simulation, bridging the gap between optimised models and real-world AI accelerators by enabling rapid prototyping of hybrid IMC/NMC systems.
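As one concrete instance of the PUF integration mentioned above, the sketch below models an SRAM-style PUF whose power-up state provides a device-unique key, with temporal majority voting to suppress read noise. The bias model, cell count, and read count are illustrative assumptions, not a description of the primitive the framework will deploy.

```python
import numpy as np

def sram_puf_key(bias, n_reads=9, rng=None):
    """Sketch of an SRAM-PUF readout with temporal majority voting.

    `bias` holds each cell's probability of powering up to '1', a stable
    device-specific fingerprint set by manufacturing variation; repeated
    noisy power-up reads are majority-voted into a reproducible key.
    """
    rng = rng or np.random.default_rng()
    reads = rng.random((n_reads, bias.size)) < bias   # noisy power-up states
    return (reads.sum(axis=0) > n_reads // 2).astype(np.uint8)

rng = np.random.default_rng(7)
# Most cells are strongly biased toward 0 or 1; a few remain metastable.
bias = np.clip(rng.normal(0.5, 0.35, 128), 0.02, 0.98)
key_a = sram_puf_key(bias, rng=rng)
key_b = sram_puf_key(bias, rng=rng)
print("intra-device bit mismatches:", int((key_a != key_b).sum()))
```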