- Session
- 13:25
- Duration: 27 mins
- Publication date: 11 Nov 2025
- Location: Turing Lecture Theatre, IET London: Savoy Place, London, United Kingdom
- Part of event REACH 2025
About the session
Partha Maji, Senior Director – AI Hardware Acceleration, Microsoft, UK
As Large Language Models (LLMs) scale to trillions of parameters, traditional floating-point formats are increasingly constrained by memory bandwidth, energy, and storage limits. Emerging block floating-point (BFP) schemes offer a promising alternative, combining fixed-point efficiency with floating-point adaptability through shared exponents and local scaling. This talk explores the design space of BFP formats for both inference and training, focusing on how exponent sharing, mantissa precision, and block granularity interact with accuracy, stability, and hardware cost. Drawing from recent advances such as MX and NVFP variants, we will examine practical design choices (calibration, accumulation, rounding, and mixed-precision fusion) that enable 3–6× compression and improved accelerator utilization without significant accuracy loss. The discussion bridges algorithm and hardware perspectives, outlining co-design principles that make BFP numerics deployable in real systems. Finally, we highlight open research challenges in dynamic range handling, attention sensitivity, and unified training-inference numerics, inviting the community to rethink precision as a continuum, not a constant.
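To make the idea of exponent sharing concrete, here is a minimal Python sketch of a generic block floating-point quantizer. It is an illustration only, not the speaker's method and not the MX or NVFP specification: the function names (`bfp_quantize`, `bfp_dequantize`, `bits_per_element`), the power-of-two shared scale, the round-to-nearest rule, and the default 8-bit exponent width are all assumptions chosen for clarity.

```python
import math


def bfp_quantize(values, block_size=32, mantissa_bits=8):
    """Split values into blocks; each block stores one shared exponent
    plus a signed integer mantissa per element (hypothetical layout)."""
    qmax = (1 << (mantissa_bits - 1)) - 1  # largest signed mantissa
    blocks = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        max_abs = max(abs(v) for v in block)
        # frexp returns (m, e) with max_abs = m * 2**e and 0.5 <= m < 1,
        # so every element of the block satisfies |v| < 2**e.
        shared_exp = math.frexp(max_abs)[1] if max_abs > 0.0 else 0
        scale = math.ldexp(1.0, shared_exp - (mantissa_bits - 1))
        # Round to the nearest integer step, then clamp to the mantissa range.
        mantissas = [max(-qmax, min(qmax, round(v / scale))) for v in block]
        blocks.append((shared_exp, mantissas))
    return blocks


def bfp_dequantize(blocks, mantissa_bits=8):
    """Reconstruct floats from (shared_exponent, mantissas) blocks."""
    out = []
    for shared_exp, mantissas in blocks:
        scale = math.ldexp(1.0, shared_exp - (mantissa_bits - 1))
        out.extend(q * scale for q in mantissas)
    return out


def bits_per_element(block_size, mantissa_bits, exponent_bits=8):
    """Average storage cost: the shared exponent is amortized over the block."""
    return mantissa_bits + exponent_bits / block_size


if __name__ == "__main__":
    values = [0.5, -0.25, 0.125, 3.0, 100.0, -0.001, 0.0, 7.5]
    blocks = bfp_quantize(values, block_size=4, mantissa_bits=8)
    print(blocks)
    print(bfp_dequantize(blocks, mantissa_bits=8))
    print(bits_per_element(block_size=32, mantissa_bits=4))
```

In this sketch the second block contains an outlier (100.0), so the shared scale becomes coarse and the small value -0.001 is flushed to zero. That is one simple way to see how block granularity and mantissa precision trade off against accuracy; smaller blocks limit the damage of an outlier but amortize the shared exponent over fewer elements.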