
Latest news with #inferencing

Five Expensive Myths About AI Inferencing (And How To Fix Them)

Forbes | 25-06-2025

Sven Oehme, Chief Technology Officer (CTO) at DDN, drives innovation across both current and future products.

The AI boom shows no signs of slowing, but while training gets most of the headlines, it's inferencing where the real business impact happens. Every time a chatbot answers, a fraud alert triggers or a recommendation pops up, that's inferencing at work: models applying what they've learned to fresh data, often in real time. Inference isn't a background process. It's the front line of customer experience, risk mitigation and operational decision making.

Yet many organizations still treat inference as an afterthought. This mistake can quietly sabotage performance, inflate costs and undermine AI strategies. Here are five common misconceptions about AI inferencing and what leaders can do differently to future-proof their infrastructure.

1. 'Training is the hard part—inference is easy.'

The reality: Training happens occasionally; inference happens continuously. Once a model is deployed, inference workloads don't just run once; they run millions (sometimes billions) of times a day. This scale fundamentally changes the economic equation: over the life of a production AI system, inference often consumes the majority of infrastructure resources and budget.

Consider financial services: detecting fraud across millions of daily transactions requires high-speed, low-latency inference at massive scale. A delay of even a few milliseconds can translate into missed opportunities or real financial losses.

What To Do:
• Monitor and optimize GPU utilization beyond training phases.
• Architect systems to feed inference engines consistently and efficiently.
• Design infrastructure specifically for high-frequency, real-time operations, not just batch processing.

2. 'Our storage is fast enough.'

The reality: Traditional storage architectures aren't built for AI inference at scale. High-performance inferencing demands real-time access to massive, often unstructured datasets—images, video, embeddings or live sensor data. General-purpose NAS or object storage solutions, while fine for archival or transactional workloads, often can't meet the concurrency and throughput demands of production AI systems.

In healthcare, for example, AI-assisted medical imaging requires inferencing with minimal delay. Storage-induced latency isn't just an inconvenience; it can delay diagnoses.

What To Do:
• Prioritize parallel file systems and storage designed for AI data patterns.
• Build for concurrent data access and real-time throughput, not just static speed benchmarks.
• Evaluate storage performance under live AI workload simulations, not synthetic tests.

3. 'We'll optimize inference performance later.'

The reality: Deferred optimization leads to baked-in inefficiencies. Once models go live, infrastructure gaps such as latency, underutilized GPUs and storage bottlenecks become exponentially harder and more expensive to fix. Poor early decisions often show up as growing technical debt, operational slowdowns and cost overruns.

In industries like retail, where real-time LLM-powered agents increasingly handle customer interactions, a few hundred milliseconds of added latency can translate into lost sales or a degraded brand experience.

What To Do:
• Build high-performance data pipelines before models go into production.
• Design systems that scale seamlessly under live inference loads.
• Automate performance monitoring from day one, especially GPU and storage utilization (a minimal sketch follows this list).
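To make that last point concrete, here is one way day-one monitoring can look, assuming NVIDIA GPUs and the nvidia-ml-py (pynvml) bindings; the run_inference stub is a hypothetical stand-in for a real model-server call, not part of any product mentioned in this article.

```python
# Minimal monitoring sketch (assumes NVIDIA GPUs + nvidia-ml-py installed):
# track request-latency percentiles alongside GPU utilization so a starved
# pipeline or rising tail latency is visible from day one.
import statistics
import time

import pynvml

def run_inference(request):
    # Hypothetical stand-in for a real model-server call.
    time.sleep(0.005)
    return {"ok": True}

def main():
    pynvml.nvmlInit()
    gpu = pynvml.nvmlDeviceGetHandleByIndex(0)
    latencies_ms = []
    for i in range(100):
        start = time.perf_counter()
        run_inference({"id": i})
        latencies_ms.append((time.perf_counter() - start) * 1000)
    util = pynvml.nvmlDeviceGetUtilizationRates(gpu)
    p95 = statistics.quantiles(latencies_ms, n=20)[-1]  # ~95th percentile
    print(f"GPU utilization: {util.gpu}% | p95 latency: {p95:.1f} ms")
    # In production these numbers would go to a metrics system, with alerts
    # on sustained low GPU utilization (a starved pipeline) or rising p95.

if __name__ == "__main__":
    main()
```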
4. 'Cloud storage scales inference just fine.'

The reality: Cloud storage is flexible but can become a major bottleneck for inference. Cloud object stores often introduce unpredictable latencies and steep egress fees at scale, especially when serving inference workloads that demand low response times and massive concurrency. For use cases like autonomous driving or industrial inspection, these drawbacks can be dealbreakers. Cloud infrastructure excels for certain training and experimentation phases, but inference at scale often demands hybrid or edge strategies to maintain performance and cost efficiency.

What To Do:
• Deploy hybrid architectures that keep inference close to the data source.
• Optimize for low-latency edge access and minimize unnecessary data transfers.
• Balance flexibility with performance and cost predictability.

5. 'Edge inferencing is optional. We'll just send data to the cloud.'

The reality: In many sectors, local inferencing is mandatory. From autonomous vehicles to smart factories, edge inferencing reduces response times, cuts costs and ensures resilience even when network connections are imperfect. Sending everything to centralized clouds for processing often introduces unacceptable lag, measured in lost opportunities, safety risks or operational disruptions. For example, in manufacturing, detecting an assembly-line anomaly needs to happen within milliseconds; cloud-based roundtrips simply aren't fast enough.

What To Do:
• Invest in edge-ready AI infrastructure with local inferencing capabilities (a sketch of this pattern follows the article).
• Ensure models can operate independently while staying connected for updates and telemetry.
• Prioritize high-throughput, low-power solutions suited for field deployments.

The Bottom Line

AI success isn't just about how well you train models. It's about how reliably and efficiently you can deploy them in the real world, under real-time conditions, at real-world scale. Organizations that take inferencing seriously—architecting from the start for speed, scalability and resilience—will unlock far more value from their AI investments. Those who treat it as an afterthought risk finding that their smartest models never reach their full potential. In today's AI economy, the real winners won't be those who build the biggest models. They'll be the ones who deploy them better.

Forbes Technology Council is an invitation-only community for world-class CIOs, CTOs and technology executives.
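Following up on myth 5, here is a minimal sketch of the 'operate independently, sync when connected' pattern the article recommends, assuming a hypothetical on-device model and a stubbed uplink check; it illustrates the idea, not any particular vendor's edge stack.

```python
# Local-first edge inference sketch: decisions are made on-device within
# milliseconds; telemetry is buffered locally and shipped only when the
# network link happens to be up. All names here are hypothetical.
import queue
import time

telemetry = queue.Queue()  # buffered while the factory link is down

def local_infer(frame):
    # Hypothetical stand-in for an on-device model (e.g. a quantized
    # vision model); returns an anomaly verdict for one inspected item.
    return {"anomaly": sum(frame) % 7 == 0}

def uplink_available() -> bool:
    return False  # stub: pretend the link is currently down

def process(frame):
    result = local_infer(frame)            # decided locally, no round trip
    telemetry.put((time.time(), result))   # queued for later sync
    if result["anomaly"]:
        print("reject item immediately")   # action taken within the deadline
    if uplink_available():                 # opportunistic sync for fleet view
        while not telemetry.empty():
            telemetry.get()                # would be shipped to the cloud here

process([3, 4])
```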

NetApp and Intel Partner to Redefine AI for Enterprises

National Post | 06-05-2025

SAN JOSE, Calif. — NetApp® (NASDAQ: NTAP), the intelligent data infrastructure company, today announced the release of NetApp AIPod Mini with Intel, a joint solution designed to streamline enterprise adoption of AI inferencing. This collaboration addresses the unique challenges businesses face when deploying AI at the department and team level, such as cost and complexity.

To thrive in the era of intelligence, enterprises have adopted AI to enhance efficiency and data-driven decision making across their businesses. A study by Harvard Business School found that consultants given access to AI tools completed 12.2 percent more tasks and completed them 25.1 percent more quickly. However, individual business units may find that broadly available general-purpose AI applications cannot meet their specific needs, yet they lack the technical expertise or budget to build a custom AI application from scratch. NetApp and Intel have partnered to provide businesses with an integrated AI inferencing solution built on an intelligent data infrastructure framework that allows specific business functions to leverage their distinct data to create outcomes that support their needs.

NetApp AIPod Mini streamlines the deployment and use of AI for specific applications such as automating aspects of document drafting and research for legal teams, implementing personalized shopping experiences and dynamic pricing for retail teams, and optimizing predictive maintenance and supply chains for manufacturing units.

'Our mission is to unlock AI for every team at every level without the traditional barriers of complexity or cost,' said Dallas Olson, Chief Commercial Officer at NetApp. 'NetApp AIPod Mini with Intel gives our customers a solution that not only transforms how teams can use AI but also makes it easy to customize, deploy, and maintain. We are turning proprietary enterprise data into powerful business outcomes.'

NetApp AIPod Mini enables businesses to interact directly with their business data through pre-packaged Retrieval-Augmented Generation (RAG) workflows, combining generative AI with proprietary information to deliver precise, context-aware insights that streamline operations and drive impactful outcomes.
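The release doesn't spell out what a pre-packaged RAG workflow does under the hood, so here is a deliberately tiny, self-contained illustration of the pattern rather than NetApp's implementation: retrieve the stored documents most similar to a query and ground a generative model's prompt in them. Bag-of-words cosine similarity stands in for a real embedding model, and generate() is a stub for any LLM call.

```python
# Toy RAG loop (illustrative only): retrieve the most similar documents,
# then prepend them as grounding context to the generation prompt.
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: bag-of-words term counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def generate(prompt: str) -> str:
    # Stub for a generative model call.
    return f"[answer grounded in a {len(prompt)}-character prompt]"

docs = [
    "Contract renewals must be reviewed by legal 30 days before expiry.",
    "Dynamic pricing adjusts retail prices based on demand signals.",
    "Predictive maintenance schedules service before equipment fails.",
]
query = "When should legal review contract renewals?"
context = "\n".join(retrieve(query, docs))
print(generate(f"Answer using only this context:\n{context}\n\nQuestion: {query}"))
```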
By integrating Intel® Xeon® 6 processors and Intel® Advanced Matrix Extensions (Intel® AMX) with NetApp's all-flash storage, advanced data management, and deep Kubernetes integration, NetApp AIPod Mini delivers high-performance, cost-efficient AI inferencing at scale. Built on an open framework powered by the Open Platform for Enterprise AI (OPEA), the solution ensures modular, flexible deployments tailored to business needs. Intel Xeon processors are designed to boost computing performance and efficiency, making AI tasks more attainable and cost-effective. To best serve customers, the solution is designed to be:

Affordable: Designed for departmental or business-unit budgets, NetApp AIPod Mini delivers enterprise-grade performance at a low entry price. Designed with scalability in mind, the solution enables organizations to achieve AI advancements without wasting resources on unnecessary overhead or costs.

Simple: With a pre-validated reference design, NetApp AIPod Mini makes AI implementation streamlined and effective. Its pre-packaged workflows enable quick setup, seamless integration, and customization without extra overhead. By focusing on ease of use and reliability, the solution helps enterprises deploy AI faster and more confidently, enabling smarter and more efficient operations.

Secure: By leveraging NetApp, the most secure storage on the planet, and processing data on-premises, NetApp AIPod Mini enhances privacy and protects sensitive data. Customers can leverage the built-in cyber-resiliency and governance capabilities of NetApp ONTAP®, including access controls, versioning, and traceability, to embed compliance and ethical safeguards directly into AI workflows.

'A good AI solution needs to be both powerful and efficient to ensure it delivers a strong return on investment,' said Greg Ernst, Americas Corporate Vice President and General Manager at Intel. 'By combining Intel Xeon processors with NetApp's robust data management and storage capabilities, the NetApp AIPod Mini solution offers business units the chance to deploy AI in tackling their unique challenges. This solution empowers users to harness AI without the burden of oversized infrastructure or unnecessary technical complexity.'

NetApp AIPod Mini with Intel will be available in the summer of 2025 from strategic distributors and partners around the world. Initial launch partners include NetApp distributor partners Arrow Electronics and TD SYNNEX, as well as integration partners Insight Partners, CDW USA, CDW UK&I, Presidio and Long View Systems, who will provide dedicated support and service to ensure a seamless purchasing and deployment experience for customers' unique AI use cases.

NetApp is the intelligent data infrastructure company, combining unified data storage and integrated data, operational, and workload services to turn a world of disruption into opportunity for every customer. NetApp creates silo-free infrastructure, harnessing observability and AI to enable the industry's best data management. As the only enterprise-grade storage service natively embedded in the world's biggest clouds, our data storage delivers seamless flexibility. In addition, our data services create a data advantage through superior cyber resilience, governance, and application agility. Our operational and workload services provide continuous optimization of performance and efficiency for infrastructure and workloads through observability and AI. No matter the data type, workload, or environment, with NetApp you can transform your data infrastructure to realize your business possibilities. Learn more at www.netapp.com or follow us on X, LinkedIn, Facebook, and Instagram.

Media Contact: Kenya Hayes, NetApp

Lumen and IBM Collaborate to Unlock Scalable AI for Businesses

Yahoo | 06-05-2025

Companies to develop AI solutions that bring inferencing to the edge, helping businesses overcome cost and security challenges as they scale AI

DENVER and ARMONK, N.Y., May 6, 2025 /PRNewswire/ -- Lumen Technologies (NYSE: LUMN) and IBM (NYSE: IBM) today announced a new collaboration to develop enterprise-grade AI solutions at the edge, integrating watsonx, IBM's portfolio of AI products, with Lumen's Edge Cloud infrastructure and network. Together, Lumen and IBM aim to bring powerful, real-time AI inferencing closer to where data is generated, helping companies overcome cost, latency, and security barriers as they scale AI adoption and enhance customer experiences.

Lumen delivers the low-latency, high-throughput infrastructure that serves as the backbone of the emerging AI economy. The new AI inferencing solutions optimized for the edge will deploy IBM watsonx technology in Lumen's edge data centers and leverage Lumen's multi-cloud architecture, enabling clients across financial services, healthcare, manufacturing and retail to analyze massive volumes of data in near real time with minimal latency. This will allow enterprises to develop and deploy AI models closer to the point of data generation, facilitating smarter decision making while maintaining data control and security, and accelerating AI innovation.

"Enterprise leaders don't just want to explore AI, they need to scale it quickly, cost-effectively and securely," said Ryan Asdourian, Chief Marketing and Strategy Officer at Lumen. "By combining IBM's AI innovation with Lumen's powerful network edge, we're turning vision into action—making it easier for businesses to tap into real-time intelligence wherever their data lives, accelerate innovation, and deliver smarter, faster customer experiences."

Unlocking the Power of GenAI at the Edge

Lumen's edge network offers <5ms latency and direct connectivity to major cloud providers and enterprise locations. When paired with IBM watsonx, the infrastructure has the potential to enable real-time AI processing, which can help mitigate the costs and risks associated with public cloud dependence. IBM Consulting will act as the preferred systems integrator, supporting clients in their efforts to scale deployments, reduce costs, and fully leverage AI capabilities through its deep technology, domain, and industry expertise.

"Our work with Lumen underscores a strong commitment to meeting clients where they are—bringing the power of enterprise-grade AI and hybrid cloud to wherever data lives," said Adam Lawrence, General Manager, Americas Technology, IBM. "Together, we're helping clients accelerate their AI journeys with greater speed, flexibility and security, driving new use cases at the edge ranging from automated customer service to predictive maintenance and intelligent supply chains."
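The announcement doesn't publish routing logic, so the following is only a sketch of the core idea behind edge inferencing with a latency budget: serve from the edge endpoint when its measured round trip fits the budget, and fall back to a regional cloud endpoint otherwise. The endpoint URLs, the 5 ms budget and the crude HTTP probe are all assumed for illustration; this is not a Lumen or IBM API.

```python
# Hypothetical latency-aware routing sketch: prefer the edge endpoint when
# its measured round trip fits the budget, else fall back to the cloud.
# URLs and thresholds are illustrative placeholders.
import time
import urllib.request

EDGE_URL = "https://edge.example.internal/infer"    # hypothetical
CLOUD_URL = "https://cloud.example.internal/infer"  # hypothetical
LATENCY_BUDGET_MS = 5.0

def probe_ms(url: str, timeout: float = 0.25) -> float:
    """Measure one round trip to an endpoint; inf if unreachable."""
    start = time.perf_counter()
    try:
        urllib.request.urlopen(url, timeout=timeout)
    except OSError:
        return float("inf")
    return (time.perf_counter() - start) * 1000

def choose_endpoint() -> str:
    if probe_ms(EDGE_URL) <= LATENCY_BUDGET_MS:
        return EDGE_URL   # real-time path: stay at the edge
    return CLOUD_URL      # degraded path: accept the cloud round trip

if __name__ == "__main__":
    print("Routing inference to:", choose_endpoint())
```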
