IBM unveils Telum II architecture and Spyre Accelerator at Hot Chips 2024.
IBM has unveiled architecture details for its upcoming IBM Telum II Processor and IBM Spyre Accelerator at the Hot Chips 2024 conference. The new technologies are designed to significantly scale processing capacity across next-generation IBM Z mainframe systems, accelerating both traditional AI models and large language models (LLMs) through a new ensemble method of AI. As many generative AI projects leveraging LLMs move from proof-of-concept to production, enterprises' demand for power-efficient, secure, and scalable solutions has become a top priority.
According to Morgan Stanley research, the power demands for generative AI are expected to increase by 75% annually over the next several years, with projections indicating that AI’s energy consumption could match Spain’s by 2026. This has driven IBM clients to prioritize architectural decisions that support appropriately sized foundation models and hybrid-by-design approaches for AI workloads.
The IBM Telum II Processor is designed to power the next generation of IBM Z systems. It features increased frequency, expanded memory capacity, 40% more on-chip cache, and an integrated AI accelerator core. The new processor also introduces a coherently attached Data Processing Unit (DPU) engineered to accelerate complex I/O protocols for networking and storage on the mainframe. The DPU simplifies system operations and improves the performance of key components, making the Telum II processor well suited for enterprise compute solutions supporting LLMs and the industry's complex transaction needs.
Complementing the Telum II Processor is the IBM Spyre Accelerator, which provides additional AI compute capability. Together, the Telum II and Spyre chips form a scalable architecture that supports ensemble methods of AI modeling, combining multiple machine learning or deep learning models with encoder LLMs. This ensemble approach leverages the strengths of each model architecture to deliver more accurate and robust results than any individual model. The IBM Spyre Accelerator, shown as a preview at Hot Chips 2024, will be available as an add-on option. It attaches via a 75-watt PCIe adapter and scales to fit client needs.
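A minimal sketch can make the ensemble idea concrete: a fast traditional model scores every transaction, and an encoder-LLM-based scorer is consulted only for ambiguous cases, with the two scores blended. Everything below (the model stand-ins, thresholds, and blending weights) is a hypothetical illustration under those assumptions, not IBM's implementation.

```python
# Hypothetical sketch of an ensemble scoring flow: a lightweight traditional
# model screens every transaction, and an encoder-LLM classifier is consulted
# only when the first-pass score is ambiguous. Names, thresholds, and weights
# are illustrative, not IBM's implementation.

from dataclasses import dataclass


@dataclass
class Transaction:
    amount: float
    description: str


def traditional_score(txn: Transaction) -> float:
    """Fast first-pass score standing in for a classic model (e.g., boosted trees)."""
    return min(txn.amount / 10_000.0, 1.0)


def encoder_llm_score(txn: Transaction) -> float:
    """Slower re-score standing in for an encoder LLM over the transaction text."""
    suspicious_terms = ("urgent", "wire", "offshore")
    hits = sum(term in txn.description.lower() for term in suspicious_terms)
    return min(0.3 * hits, 1.0)


def ensemble_score(txn: Transaction, low: float = 0.2, high: float = 0.8) -> float:
    """Combine both models: escalate only ambiguous cases to the encoder LLM."""
    first_pass = traditional_score(txn)
    if first_pass < low or first_pass > high:
        return first_pass  # Confident either way: no second model needed.
    # Ambiguous region: blend the traditional score with the LLM-based score.
    return 0.5 * first_pass + 0.5 * encoder_llm_score(txn)


if __name__ == "__main__":
    txn = Transaction(amount=4_200.0, description="Urgent wire transfer to offshore account")
    print(f"fraud score: {ensemble_score(txn):.2f}")
```

The escalation pattern is one plausible reading of the announcement's low-latency, in-transaction framing: the cheap model handles the bulk of traffic, and the heavier encoder LLM is reserved for the cases where it adds the most value.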
Tina Tarquinio, VP of Product Management for IBM Z and LinuxONE, emphasized IBM’s commitment to staying ahead of technology trends, particularly the escalating demands of AI. She stated that the Telum II Processor and Spyre Accelerator are designed to deliver high-performance, secure, and power-efficient enterprise computing solutions. These innovations, which have been years in development, will be introduced in IBM’s next-generation IBM Z platform, enabling clients to leverage LLMs and generative AI at scale.
The Telum II processor and IBM Spyre Accelerator will be manufactured by IBM’s long-standing partner, Samsung Foundry, using its high-performance, power-efficient 5nm process node. Together, these technologies will support a range of advanced AI-driven use cases designed to unlock business value and create new competitive advantages. For example, enhanced fraud detection in home insurance claims can be achieved through ensemble AI models that combine LLMs with traditional neural networks. Additionally, advanced detection of suspicious financial activities can help support compliance with regulatory requirements and mitigate the risk of economic crimes. At the same time, AI assistants can accelerate application lifecycles, transfer knowledge, and provide code explanations and transformations.
The Telum II processor is set to feature eight high-performance cores running at 5.5GHz, with 36MB of L2 cache per core and a 40% increase in on-chip cache capacity, totaling 360MB. The virtual level-4 cache will offer 2.88GB per processor drawer, a 40% increase over the previous generation. The integrated AI accelerator enables low-latency, high-throughput in-transaction AI inferencing, delivering a fourfold increase in compute capacity per chip compared to the last generation. Additionally, the new I/O Acceleration Unit DPU, integrated into the Telum II chip, is designed to improve data handling with a 50% increase in I/O density, enhancing IBM Z’s overall efficiency and scalability for large-scale AI workloads and data-intensive applications.
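As a rough consistency check on those figures, the drawer-level virtual L4 capacity follows from the per-chip cache if one assumes eight Telum II chips per drawer (an assumption for illustration; the drawer configuration is not spelled out above):

```python
# Back-of-the-envelope check of the cache figures, assuming eight Telum II
# chips per processor drawer (an assumption, not stated in the announcement).
on_chip_cache_mb = 360            # stated per-chip on-chip cache
chips_per_drawer = 8              # assumed
virtual_l4_gb = on_chip_cache_mb * chips_per_drawer / 1000
print(virtual_l4_gb)              # 2.88 -> matches the stated 2.88GB virtual L4 per drawer
```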
The IBM Spyre Accelerator is a purpose-built, enterprise-grade accelerator designed to handle complex AI models and generative AI use cases. It features up to 1TB of memory, distributed across eight cards in a standard I/O drawer, supporting AI model workloads across the mainframe while consuming no more than 75W per card. Each accelerator chip has 32 compute cores supporting int4, int8, fp8, and fp16 datatypes, enabling both low-latency and high-throughput AI applications.
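The per-card numbers follow from those cluster-level figures; the quick arithmetic below assumes the 1TB of memory is split evenly across the eight cards and treats 1TB as 1024GB for round numbers.

```python
# Per-card view of the Spyre figures, assuming the ~1TB of memory is split
# evenly across the eight cards in the I/O drawer (assumptions for illustration).
total_memory_gb = 1024                # ~1TB across the cluster, taken as 1024 GB here
cards = 8
cores_per_card = 32
max_power_w_per_card = 75

print(total_memory_gb / cards)        # 128.0 -> roughly 128 GB of memory per card
print(cards * cores_per_card)         # 256   -> compute cores across the eight-card cluster
print(cards * max_power_w_per_card)   # 600   -> worst-case watts for the whole cluster
```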
The Telum II processor will power IBM’s next-generation IBM Z and IBM LinuxONE platforms and will be available in 2025. The IBM Spyre Accelerator, currently in tech preview, is expected to be available in 2025.