- Session
- 10:11 - 10:11
- Duration: 35 mins
- Publication date: 19 Nov 2024
- Location: Conference, Chicago Business School, London, United Kingdom
- Part of event REACH 2024
About the session
NorthPole, developed at IBM Almaden Research Center, is a neuromorphic computing architecture designed for highly efficient AI and machine learning inference. Inspired by the human brain, NorthPole delivers high throughput and low latency, making it ideal for edge AI and data center applications.
Key features include:
• Efficient Parallel Processing: NorthPole’s distributed cores minimize data movement and maximize local processing, aligning well with the parallel demands of AI inference while conserving power.
• Optimized Memory: Local memory storage reduces data transfer times, which is essential for handling large model weights in small-batch inference.
• Energy Efficiency & Scalability: With low power requirements, NorthPole can scale effectively for large language models in both edge and data center environments.
• Low-Latency Inference: By minimizing computational load per core and enhancing inter-core communication, NorthPole achieves fast, real-time responses, aided by 13 TB/s of on-chip memory bandwidth.
• Mixed-Precision Operations: Hardware support for quantized matrix multiplications drives unprecedented efficiency without sacrificing accuracy.
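As a rough illustration of what a quantized matrix multiplication involves, the sketch below multiplies a small matrix by a vector using integer arithmetic and then rescales the result. It is not NorthPole's implementation or API: the function names are hypothetical, and symmetric per-tensor quantization is only one common scheme, assumed here for simplicity.

```python
# Illustrative sketch only: hypothetical helper names, assumed symmetric
# per-tensor quantization. Not NorthPole's hardware scheme or software API.

def quantize(values, bits=8):
    """Map floats to signed integers with a single shared scale factor."""
    qmax = 2 ** (bits - 1) - 1              # 127 for 8-bit, 7 for 4-bit
    peak = max(abs(v) for v in values)
    scale = peak / qmax if peak else 1.0    # avoid dividing by zero
    return [round(v / scale) for v in values], scale


def quantized_matvec(weights, x, bits=8):
    """Multiply a float matrix by a float vector via integer accumulation."""
    flat = [w for row in weights for w in row]
    q_flat, w_scale = quantize(flat, bits)
    q_x, x_scale = quantize(x, bits)
    cols = len(x)
    out = []
    for r in range(len(weights)):
        row = q_flat[r * cols:(r + 1) * cols]
        acc = sum(qw * qx for qw, qx in zip(row, q_x))  # integer-only sum
        out.append(acc * w_scale * x_scale)             # rescale to float
    return out


weights = [[0.5, -1.0, 0.25], [1.5, 0.75, -0.5]]
x = [1.0, 2.0, -1.0]

exact = [sum(w * v for w, v in zip(row, x)) for row in weights]
approx = quantized_matvec(weights, x, bits=8)
max_error = max(abs(a - e) for a, e in zip(approx, exact))
print("exact:", exact)
print("8-bit approximation:", approx)
print("largest absolute error:", max_error)
```

Lowering `bits` shrinks the storage per weight and increases the rounding error, which is the trade-off that mixed-precision hardware is designed to manage.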
Overall, NorthPole offers a powerful solution for accelerated AI inference, addressing challenges in memory, latency, and power consumption. It suits scenarios that require high performance and low-latency inference at low energy use.
We will walk through NorthPole's architectural highlights and benchmarking results on edge applications as well as Large Language Models. We will also show demos of these applications and share insights into developing a software ecosystem.