Oracle unveils AMD-powered zettascale AI cluster for OCI cloud


Techday NZ · 2 days ago

Oracle has announced it will be one of the first hyperscale cloud providers to offer artificial intelligence (AI) supercomputing powered by AMD's Instinct MI355X GPUs on Oracle Cloud Infrastructure (OCI).
The forthcoming zettascale AI cluster is designed to scale up to 131,072 MI355X GPUs, specifically architected to support high-performance, production-grade AI training, inference, and new agentic workloads. The cluster is expected to offer over double the price-performance compared to the previous generation of hardware.
Expanded AI capabilities
The announcement highlights several key hardware and performance enhancements. The MI355X-powered cluster provides up to 2.8 times higher throughput for AI workloads than the previous generation. Each GPU features 288 GB of high-bandwidth memory (HBM3E) and eight terabytes per second (TB/s) of memory bandwidth, allowing larger models to run entirely in memory and boosting both inference and training speeds.
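As a rough, back-of-envelope sketch (not from the article), those two per-GPU figures imply how quickly a GPU can sweep its entire memory once and roughly how large a model fits on a single device at different precisions, ignoring activations, KV caches, and real-world bandwidth efficiency:

```python
# Illustrative arithmetic from the quoted specs: 288 GB of HBM and
# 8 TB/s of memory bandwidth per GPU. Decimal units throughout.

HBM_CAPACITY_GB = 288     # per-GPU high-bandwidth memory
HBM_BANDWIDTH_TBPS = 8    # per-GPU memory bandwidth, TB/s

# Time to stream the entire HBM contents once, in milliseconds.
sweep_time_ms = HBM_CAPACITY_GB / (HBM_BANDWIDTH_TBPS * 1000) * 1000

# Largest model (billions of parameters) that fits entirely in HBM
# at a given precision, ignoring activations and KV cache.
def max_params_billions(bytes_per_param: float) -> float:
    return HBM_CAPACITY_GB / bytes_per_param

print(f"Full-memory sweep: {sweep_time_ms:.0f} ms")                   # 36 ms
print(f"FP16 (2 B/param): ~{max_params_billions(2):.0f}B params")     # ~144B
print(f"FP4 (0.5 B/param): ~{max_params_billions(0.5):.0f}B params")  # ~576B
```

The FP16-versus-FP4 comparison is one way to read the article's claim that larger models can run entirely in memory: halving or quartering bytes per parameter directly scales the model size that fits on one GPU.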
The GPUs also support the FP4 compute standard, a four-bit floating-point format that enables more efficient, high-speed inference for large language and generative AI models. The cluster's infrastructure includes dense, liquid-cooled racks, each housing 64 GPUs and consuming up to 125 kilowatts per rack to maximise performance density for demanding AI workloads. This marks the first deployment of AMD's Pollara AI NICs to enhance RDMA networking, offering next-generation high-performance, low-latency connectivity.
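To illustrate how coarse a four-bit float is, the sketch below enumerates every value representable in FP4, assuming the E2M1 layout (1 sign bit, 2 exponent bits, 1 mantissa bit) used by the OCP Microscaling (MX) FP4 specification; the article itself does not name the exact encoding:

```python
# Decode all 16 codes of an E2M1 4-bit float (assumed layout: 1 sign,
# 2 exponent, 1 mantissa bit, exponent bias 1). Illustrative only.

def decode_e2m1(code: int) -> float:
    sign = -1.0 if (code >> 3) & 1 else 1.0
    exp = (code >> 1) & 0b11
    man = code & 1
    if exp == 0:                      # subnormal: value = mantissa * 0.5
        return sign * man * 0.5
    return sign * (1 + man / 2) * 2.0 ** (exp - 1)

values = sorted({decode_e2m1(c) for c in range(16)})
print(values)
# [-6.0, -4.0, -3.0, -2.0, -1.5, -1.0, -0.5, 0.0,
#   0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
```

Only 15 distinct values exist, which is why FP4 in practice is paired with per-block scaling factors; the payoff is that each parameter occupies half a byte, cutting memory and bandwidth needs for inference.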
Mahesh Thiagarajan, Executive Vice President, Oracle Cloud Infrastructure, said: "To support customers that are running the most demanding AI workloads in the cloud, we are dedicated to providing the broadest AI infrastructure offerings. AMD Instinct GPUs, paired with OCI's performance, advanced networking, flexibility, security, and scale, will help our customers meet their inference and training needs for AI workloads and new agentic applications."
The zettascale OCI Supercluster with AMD Instinct MI355X GPUs delivers a high-throughput, ultra-low latency RDMA cluster network architecture for up to 131,072 MI355X GPUs. AMD claims the MI355X provides almost three times the compute power and a 50 percent increase in high-bandwidth memory over its predecessor.
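As a quick sanity check on the "zettascale" label, using only the 131,072-GPU figure from the article, the arithmetic below shows what each GPU must sustain for the aggregate cluster to reach one zettaFLOP per second:

```python
# Required per-GPU rate for a 131,072-GPU cluster to aggregate to
# 1 zettaFLOP/s (1e21 FLOP/s). Back-of-envelope only; peak figures
# for low-precision formats like FP4 are what make this plausible.

CLUSTER_GPUS = 131_072
ZETTAFLOP_PER_S = 1e21

per_gpu_flops = ZETTAFLOP_PER_S / CLUSTER_GPUS
print(f"Required per-GPU rate: {per_gpu_flops / 1e15:.1f} PFLOPS")  # ~7.6 PFLOPS
```

Roughly 7.6 PFLOPS per GPU is within reach of modern accelerators' low-precision peak rates, which is why zettascale claims are typically quoted for FP8 or FP4 rather than FP64.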
Performance and flexibility
Forrest Norrod, Executive Vice President and General Manager, Data Center Solutions Business Group, AMD, commented on the partnership, stating: "AMD and Oracle have a shared history of providing customers with open solutions to accommodate high performance, efficiency, and greater system design flexibility. The latest generation of AMD Instinct GPUs and Pollara NICs on OCI will help support new use cases in inference, fine-tuning, and training, offering more choice to customers as AI adoption grows."
The Oracle platform aims to support customers running the largest language models and diverse AI workloads. OCI users leveraging the MI355X-powered shapes can expect significant performance increases—up to 2.8 times greater throughput—resulting in faster results, lower latency, and the capability to run larger models.
AMD's Instinct MI355X provides customers with substantial memory and bandwidth enhancements, which are designed to enable both fast training and efficient inference for demanding AI applications. The new support for the FP4 format allows for cost-effective deployment of modern AI models, enhancing speed and reducing hardware requirements.
The dense, liquid-cooled infrastructure supports 64 GPUs per rack, each operating at up to 1,400 watts, and is engineered to optimise training times and throughput while reducing latency. A powerful head node, equipped with an AMD Turin high-frequency CPU and up to 3 TB of system memory, is included to help users maximise GPU performance via efficient job orchestration and data processing.
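Combining the figures quoted above (64 GPUs per rack at up to 1,400 watts each, within a 125 kW rack), a rough and purely illustrative power split looks like this:

```python
# Per-rack power budget from the article's figures. The non-GPU
# remainder (head node CPUs, NICs, memory, cooling overhead, etc.)
# is an inferred split, not a number from the article.

GPUS_PER_RACK = 64
GPU_WATTS = 1_400
RACK_KW = 125

gpu_kw = GPUS_PER_RACK * GPU_WATTS / 1000   # GPU draw at full power
other_kw = RACK_KW - gpu_kw                 # budget left for everything else

print(f"GPU power: {gpu_kw:.1f} kW; remaining budget: {other_kw:.1f} kW")
# GPU power: 89.6 kW; remaining budget: 35.4 kW
```

At nearly 90 kW of GPU draw alone, a rack like this is an order of magnitude denser than a typical air-cooled data-centre rack, which is why the article emphasises liquid cooling.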
Open-source and network advances
AMD emphasises broad compatibility and customer flexibility through the inclusion of its open-source ROCm stack. This allows customers to use flexible architectures and reuse existing code without vendor lock-in, with ROCm encompassing popular programming models, tools, compilers, libraries, and runtimes for AI and high-performance computing development on AMD hardware.
Network infrastructure for the new supercluster will feature AMD's Pollara AI NICs that provide advanced RDMA over Converged Ethernet (RoCE) features, programmable congestion control, and support for open standards from the Ultra Ethernet Consortium to facilitate low-latency, high-performance connectivity among large numbers of GPUs.
The new Oracle-AMD collaboration is expected to provide organisations with enhanced capacity to run complex AI models, speed up inference times, and scale up production-grade AI workloads economically and efficiently.


Related Articles


Oracle & NVIDIA expand OCI partnership with 160 AI tools

Techday NZ · 3 days ago

Oracle and NVIDIA have expanded their partnership to enable customers to access more than 160 AI tools and agents while leveraging the necessary computing resources for AI development and deployment. The collaboration brings NVIDIA AI Enterprise, a cloud-native software platform, natively to the Oracle Cloud Infrastructure (OCI) Console. Oracle customers can now use this platform across OCI's distributed cloud, including public regions, Government Clouds, and sovereign cloud solutions.

Platform access and capabilities

By integrating NVIDIA AI Enterprise directly through the OCI Console rather than a marketplace, Oracle allows customers to utilise their existing Universal Credits, streamlining transactions and support. This approach is designed to speed up deployment and help customers meet security, regulatory, and compliance requirements in enterprise AI processes. Customers can now access over 160 AI tools focused on training and inference, including NVIDIA NIM microservices. These services aim to simplify the deployment of generative AI models and support a broad set of application-building and data management needs across various deployment scenarios.

"Oracle has become the platform of choice for AI training and inferencing, and our work with NVIDIA boosts our ability to support customers running some of the world's most demanding AI workloads," said Karan Batta, Senior Vice President, Oracle Cloud Infrastructure. "Combining NVIDIA's full-stack AI computing platform with OCI's performance, security, and deployment flexibility enables us to deliver AI capabilities at scale to help advance AI efforts globally."

The partnership includes making NVIDIA GB200 NVL72 systems available on the OCI Supercluster, supporting up to 131,072 NVIDIA Blackwell GPUs. The new architecture provides a liquid-cooled infrastructure that targets large-scale AI training and inference requirements. Governments and enterprises can take advantage of the so-called AI factories, using platforms like NVIDIA's GB200 NVL72 for agentic AI tasks reliant on advanced reasoning models and efficiency enhancements.

Developer access to advanced resources

Oracle has become one of the first major cloud providers to integrate with NVIDIA DGX Cloud Lepton, which links developers to a global marketplace of GPU compute. This integration offers developers access to OCI's high-performance GPU clusters for a range of needs, including AI training, inference, digital twin implementations, and parallel HPC applications.

Ian Buck, Vice President of Hyperscale and HPC at NVIDIA, said: "Developers need the latest AI infrastructure and software to rapidly build and launch innovative solutions. With OCI and NVIDIA, they get the performance and tools to bring ideas to life, wherever their work happens."

With this integration, developers are also able to select compute resources in precise regions to help achieve both strategic and sovereign AI aims and satisfy long-term and on-demand requirements.

Customer projects using joint capabilities

Enterprises in Europe and internationally are making use of the enhanced partnership between Oracle and NVIDIA. For example, Almawave, based in Italy, utilises OCI AI infrastructure and NVIDIA Hopper GPUs to run generative AI model training and inference for its Velvet family, which supports Italian alongside other European languages and is being deployed within Almawave's AIWave platform.

"Our commitment is to accelerate innovation by building a high-performing, transparent, and fully integrated Italian foundational AI in a European context—and we are only just getting started," said Valeria Sandei, Chief Executive Officer, Almawave. "Oracle and NVIDIA are valued partners for us in this effort, given our common vision around AI and the powerful infrastructure capabilities they bring to the development and operation of Velvet."

Danish health technology company Cerebriu is using OCI along with NVIDIA Hopper GPUs to build an AI-driven tool for clinical brain MRI analysis. Cerebriu's deep learning models, trained on thousands of multi-modal MRI images, aim to reduce the time required to interpret scans, potentially benefiting the clinical diagnosis of time-sensitive neurological conditions.

"AI plays an increasingly critical role in how we design and differentiate our products," said Marko Bauer, Machine Learning Researcher, Cerebriu. "OCI and NVIDIA offer AI capabilities that are critical to helping us advance our product strategy, giving us the computing resources we need to discover and develop new AI use cases quickly, cost-effectively, and at scale. Finding the optimal way of training our models has been a key focus for us. While we've experimented with other cloud platforms for AI training, OCI and NVIDIA have provided us the best cloud infrastructure availability and price performance."

By expanding the Oracle-NVIDIA partnership, customers are now able to choose from a wide variety of AI tools and infrastructure options within OCI, supporting both research and production environments for AI solution development.

Decision Inc. Australia Helps Accelerate GenAI MVP Delivery For OFS, Enabling Real-Time Manufacturing Intelligence

Scoop · 4 days ago

Press Release – Decision Inc

Manufacturing software provider fast-tracks generative AI capability with guidance from Decision Inc. Australia, unlocking new value and a future revenue stream.

OFS (Operations Feedback Systems) is an Australian-headquartered software company helping manufacturers in over 30 countries produce more with less. Since 2006, its analytics platform has helped operators, supervisors, and senior leaders in manufacturing gain clear, real-time visibility into performance, enabling faster decisions and stronger outcomes on the production floor. With a customer base including Dulux, Asahi Beverages, Bega, nudie, AstraZeneca, Twinings, and Electrolux, OFS wanted to stay ahead of rising expectations around artificial intelligence by embedding generative AI directly into its product suite. To accelerate development and go to market faster, OFS engaged Decision Inc. Australia to help guide and define a Minimum Viable Product (MVP) for its new GenAI feature, now known as Mayvn AI.

The Challenge

As manufacturers increasingly look to their software partners for AI capabilities, OFS recognised the need to rapidly build a GenAI experience that delivered practical, real-world value. While OFS already had an in-house engineering team, it turned to Decision Inc. for support in validating the concept, developing a secure architecture, and accelerating its roadmap to MVP.

'Manufacturers are under enormous pressure to do more with less, and they're asking smart questions about how AI can help,' said James Magee, CEO of OFS. 'Our goal was to build something meaningful, not gimmicky. We didn't want AI just for the sake of AI; we wanted it to solve real problems.'

OFS needed a solution that would securely integrate with its existing architecture, scale into production, and provide early confidence that the AI feature could meet the performance and privacy standards its customers expect. At the same time, the OFS team wanted to build internal knowledge and capability as part of the process.

The Solution

With Decision Inc. Australia's support, OFS accelerated the development of Mayvn AI, a GenAI-powered operational intelligence tool designed to deliver actionable insights directly to factory leaders, engineers, and operators. The goal was to bridge the gap between factory data and leadership decision-making, turning real-time production feedback into meaningful conversations and faster resolutions. Decision Inc.'s AI Advisory services helped OFS quickly define a scalable MVP architecture that would sit securely within its existing software ecosystem. From architecture design to proof-of-concept prototyping, Decision Inc. played a hands-on role in helping OFS validate the feasibility of GenAI integration while upskilling its internal engineering team to take the solution to production.

Mayvn AI functions as an intelligent co-pilot for manufacturing leadership, surfacing real-time insights from complex production data using simple prompts. By leveraging large language models (LLMs), Mayvn AI can instantly summarise plant performance, flag inefficiencies, identify recurring issues, and even recommend areas for capital investment based on real-world downtime or waste patterns. It is designed to operate securely, respecting data privacy boundaries across client environments. From generating daily shift reports to uncovering root causes of line stoppages and packaging faults, Mayvn AI is enabling factory teams to spend less time pulling data and more time acting on it. A notable early use case allows a CEO or site leader to ask, 'What should I know before walking into this site today?' and receive a concise, personalised briefing within seconds.

'Decision Inc. brought the technical clarity and early-stage confidence we needed to move quickly,' said Magee. 'They helped us de-risk the process and ensure we were building something scalable, secure, and aligned with the needs of our customers.'

The Outcome

Mayvn AI was launched in February 2025 and is already being used by more than 200 manufacturers worldwide. The feature acts as an embedded GenAI assistant within the OFS platform, enabling business leaders to simply ask questions, like 'What's been impacting our line efficiency this week?' or 'Where should we invest at this site?', and receive real-time, production-specific answers. By transforming unstructured manufacturing data into actionable insights, Mayvn AI bridges the gap between leadership and the factory floor. It is helping users surface previously hidden trends, reduce prep time for shift reports, and justify capital investments with data-backed recommendations.

'What we're seeing now is our clients using AI not just to analyse data, but to tell stories with it, to understand what's happening in their business and act with confidence,' said Magee. 'That's where Mayvn AI is making the biggest impact.'

'Mayvn AI is a perfect example of how GenAI can be deployed responsibly and pragmatically,' said Tony Butler, Managing Director, Decision Inc. Australia. 'By starting with a real problem and focusing on the user experience, they've built something powerful and future-ready. We were proud to play a role in accelerating that journey.'

# # #

About Decision Inc. Australia

Decision Inc. is a global advisory-led technology consulting company helping clients in 20 markets use technology and data to improve their performance and drive sustainable growth. We help the world's most significant businesses transform their operating model and use technology to create a new future. Our Advisory teams help our clients develop the strategies and business cases to support their continued investment in innovation and operational improvement, and our specialist consulting and engineering teams build and manage the core platforms that run their business. As seen in The Australian Financial Review, The Weekend Australian and The Canberra Times, we serve our community and industry and believe great data and analytics expertise will underpin economic recovery and prosperity. We are proudly carbon neutral and Great Place to Work certified.
