Latest news with #DeepThink


The Hindu
23-05-2025
- Business
Google's AI Matryoshka: Rearchitecting the search giant with AI even as privacy concerns loom
Google's annual I/O developer conference in 2025 was less a showcase of disparate product updates and more a systematic unveiling of an AI-centric future. The unspoken theme was that of a Matryoshka doll: at its core, a refined and potent artificial intelligence, with each successive layer representing a product or platform drawing life from this central intelligence. Google is not merely sprinkling AI across its offerings; it is fundamentally rearchitecting its vast ecosystem around it. The result is an increasingly interconnected and agentic experience, one that extends to users, developers, and enterprises alike, prompting a re-evaluation of the firm's responsibilities concerning the data that fuels this transformation.

'More intelligence is available, for everyone, everywhere,' declared Sundar Pichai, CEO of Google and its parent company, Alphabet. 'And the world is responding, adopting AI faster than ever before.' This statement signals a push towards a more intelligent, autonomous, and personalised Google. Yet, as each layer of this AI Matryoshka is peeled back, the data upon which this intelligence is built, the copyrighted material ingested by its models, and the implications for user privacy come into sharper focus, forming a critical, if less trumpeted, narrative. It has been nearly two years since Satya Nadella of Microsoft described Google as an '800-pound gorilla' challenged to perform new AI tricks. Google's response, particularly evident at I/O 2025, suggests the gorilla is learning to pirouette.

At the innermost core of Google's AI strategy lie its foundational models. The keenly awaited Gemini 2.5 Flash and Pro models, now nearing general availability, represent more than incremental improvements; they are a refined engine for AI experiences. The enhanced reasoning mode in Gemini 2.5 Pro, dubbed Deep Think, leverages parallel processing and demonstrates impressive capabilities in complex mathematics and coding, achieving a notable score on the 2025 USAMO, a demanding mathematics benchmark. While Deep Think will initially be available to select testers via the Gemini API, its potential to grapple with highly complex problems signals a significant advancement in AI reasoning.

Workhorse Upgraded

Gemini 2.5 Flash, the workhorse model, has also received substantial upgrades, purportedly becoming 'better in nearly every dimension.' It boasts increased efficiency, using 20-30% fewer tokens (the units of data processed by AI models), and is set to become the default in the Gemini application. These models, enhanced with native audio output for more naturalistic conversational interactions in 2.5 Pro and Flash, and a pioneering multi-speaker text-to-speech function supporting two voices across more than 24 languages, constitute the powerful nucleus from which all other AI functionalities radiate.

This computational prowess is built upon Google's proprietary Tensor Processing Units (TPUs). The seventh-generation TPU, Ironwood, is said to deliver a tenfold performance increase over its predecessor, offering a formidable 42.5 exaFLOPS of compute per pod. Such hardware forms the bedrock for training and deploying these sophisticated AI systems. However, the very power of these generative models, especially Imagen 4 and Veo 3 for visual media, and Lyria 2 for music generation, necessitates a closer look at their training data. The creation of rich, nuanced outputs depends on ingesting colossal datasets.
Persistent industry-wide concerns revolve around the use of copyrighted material without explicit consent or remuneration for original creators. Google highlighted tools such as SynthID, designed to watermark AI-generated content, and a new SynthID Detector for its verification. Yet these are mitigations, not comprehensive solutions, to the intricate and ongoing debate surrounding copyright and fair use in an era increasingly defined by generative AI. The provenance of, and fiduciary responsibility over, this data remain complex issues.

Platform Proliferation

One layer out from the core models are the platforms and APIs that democratise access to this AI. The Gemini API and Vertex AI are pivotal here, serving as the primary conduits for developers and enterprises. Google aims to improve the developer experience by offering 'thought summaries', which provide transparency into the model's reasoning, and by extending 'thinking budgets' to Gemini 2.5 Pro, giving developers more control over computational resources. Critically, native SDK support for the Model Context Protocol (MCP) has been incorporated into the Gemini API. This represents a significant move towards fostering a more interconnected ecosystem of AI agents, enabling them to communicate and collaborate with greater efficacy by sharing contextual information. This inter-agent communication, while powerful, also introduces new vectors for data security considerations, as information flows between potentially diverse systems. Project Mariner, a research tool, is also being integrated into the Gemini API and Vertex AI, allowing users to experiment with its task automation capabilities.

AI Meets the User

The outermost layers of Google's AI Matryoshka are where users most directly encounter AI, often without fully comprehending the sophisticated infrastructure beneath. This is where Google is reimagining search, commerce, coding, and application integration. The 'AI Mode' in Search, scheduled for rollout to users in the United States, will offer enhanced reasoning and multimodal search capabilities, powered by a customised version of Gemini 2.5. A feature within this mode, Deep Search, is designed to generate comprehensive, cited reports. The quality and impartiality of these citations, especially when generated by AI, will be an area for careful scrutiny. Within AI Mode, a novel shopping experience will allow users to virtually try on clothes by uploading their own photographs. Once a product is selected, an 'agentic checkout' feature, initially available in the U.S., promises to complete the purchase. Such a feature inherently requires access to sensitive personal and financial data, raising questions about data minimisation, security, and the potential for profiling.

The All-in-One App

The Gemini application itself is being significantly augmented. The Live feature is now generally available on Android and iOS, and the app incorporates image generation. For subscribers to the new Google AI Ultra tier, the app will feature the latest video generation tool, complete with native audio. A 'Deep Research' function within the app can now draw upon users' private documents and images. While potentially offering powerful personal insights, this feature dives deep into personal data pools, demanding robust privacy safeguards and transparent consent mechanisms. How this data is firewalled, processed, and protected from misuse or overreach will be paramount.
Canvas, the creative workspace within Gemini, has been made more intuitive with the Gemini 2.5 models, facilitating the creation of interactive infographics, quizzes, and even podcast-style Audio Overviews in 45 languages. Furthermore, Gemini is being integrated into the Chrome browser (initially for Pro and Ultra subscribers in the U.S.), enabling users to query and summarise webpage content. For developers, the new asynchronous coding agent, Jules, is now in public beta globally, wherever Gemini models are accessible. It integrates directly with existing code repositories, understanding project context to write tests, build features, and rectify bugs using Gemini 2.5 Pro.

Mr. Pichai's 'new phase of the AI platform shift' is undeniably underway. Google's new Google AI Ultra subscription tier offers users differentiated access to its most advanced AI capabilities. This stratification, however, prompts questions about whether the most robust privacy-enhancing features or responsible AI controls will be universally available, or whether a 'privacy premium' could emerge, with deeper safeguards reserved for paying customers. As Google rearchitects itself around AI, the intricate dance between innovation, utility, and the stewardship of data will define its next chapter. The layers of the Matryoshka are still being revealed, and with each one, the responsibilities grow.
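The multi-speaker text-to-speech capability described in this piece is exposed through the Gemini API. Below is a minimal sketch using the google-genai Python SDK; the preview model id and the prebuilt voice names ('Kore', 'Puck') are assumptions based on Google's published preview naming, not details confirmed by the article.

    # Sketch: two-voice text-to-speech via the google-genai Python SDK.
    # Model id and voice names are assumptions, not confirmed by the article.
    import wave

    from google import genai
    from google.genai import types

    client = genai.Client()  # expects GEMINI_API_KEY in the environment

    prompt = """TTS the following conversation:
    Anna: Did you watch the I/O keynote?
    Ben: I did. The Deep Think demo was the highlight for me."""

    response = client.models.generate_content(
        model="gemini-2.5-flash-preview-tts",  # assumed preview model id
        contents=prompt,
        config=types.GenerateContentConfig(
            response_modalities=["AUDIO"],
            speech_config=types.SpeechConfig(
                multi_speaker_voice_config=types.MultiSpeakerVoiceConfig(
                    speaker_voice_configs=[
                        types.SpeakerVoiceConfig(
                            speaker="Anna",
                            voice_config=types.VoiceConfig(
                                prebuilt_voice_config=types.PrebuiltVoiceConfig(
                                    voice_name="Kore"
                                )
                            ),
                        ),
                        types.SpeakerVoiceConfig(
                            speaker="Ben",
                            voice_config=types.VoiceConfig(
                                prebuilt_voice_config=types.PrebuiltVoiceConfig(
                                    voice_name="Puck"
                                )
                            ),
                        ),
                    ]
                )
            ),
        ),
    )

    # The API returns raw 24 kHz, 16-bit mono PCM; wrap it in a WAV container.
    pcm = response.candidates[0].content.parts[0].inline_data.data
    with wave.open("dialogue.wav", "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)
        f.setframerate(24000)
        f.writeframes(pcm)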


Techday NZ
22-05-2025
- Business
Google unveils Gemini 2.5 upgrades for reasoning & security
Google has announced a series of updates to its Gemini 2.5 model series, with enhancements spanning advanced reasoning, developer capabilities and security safeguards. The company reported that Gemini 2.5 Pro is now the leading model on the WebDev Arena coding leaderboard, holding an Elo score of 1415. It also leads across all leaderboards in LMArena, a platform that measures human preferences along multiple dimensions. Additionally, Gemini 2.5 Pro's 1 million-token context window was highlighted as supporting strong long-context and video understanding performance.

Integration with LearnLM, a family of models developed with educational experts, reportedly makes Gemini 2.5 Pro the foremost model for learning. According to Google, in direct comparisons focusing on pedagogy and effectiveness, Gemini 2.5 Pro was favoured by educators and experts over other models in a wide range of scenarios, outperforming them on the five principles of learning science used in AI system design for education.

Gemini 2.5 Pro introduces an experimental capability called Deep Think, which is being tested to enable enhanced reasoning by allowing the model to consider multiple hypotheses before responding. The company said, "2.5 Pro Deep Think gets an impressive score on 2025 USAMO, currently one of the hardest math benchmarks. It also leads on LiveCodeBench, a difficult benchmark for competition-level coding, and scores 84.0% on MMMU, which tests multimodal reasoning."

Safety and evaluation measures are being emphasised with Deep Think. "Because we're defining the frontier with 2.5 Pro DeepThink, we're taking extra time to conduct more frontier safety evaluations and get further input from safety experts. As part of that, we're going to make it available to trusted testers via the Gemini API to get their feedback before making it widely available," the company reported.

Google announced improvements to 2.5 Flash, describing it as the most efficient model in the series, tailored for speed and cost efficiency. This version now reportedly uses 20-30% fewer tokens in evaluations and delivers improved performance across benchmarks for reasoning, multimodality, code, and long-context tasks. The updated 2.5 Flash is now available for preview in Google AI Studio, Vertex AI, and the Gemini app.

New features have also been added to the Gemini 2.5 series. The Live API now offers a preview version supporting audio-visual input and native audio output, designed to create more natural and expressive conversational experiences. According to Google, "It also allows the user to steer its tone, accent and style of speaking. For example, you can tell the model to use a dramatic voice when telling a story. And it supports tool use, to be able to search on your behalf."

Early features in this update include Affective Dialogue, where the model can detect and respond to emotions in a user's voice; Proactive Audio, which enables the model to ignore background conversations and determine when to respond; and enhanced reasoning in Live API use. Multi-speaker support has also been introduced for text-to-speech capabilities, allowing audio generation with two distinct voices and support for over 24 languages, including seamless transitions between them.

Project Mariner's computer use capabilities are being integrated into the Gemini API and Vertex AI, with multiple enterprises testing the tool.
Google stated, "Companies like Automation Anywhere, UiPath, Browserbase, Autotab, The Interaction Company and Cartwheel are exploring its potential, and we're excited to roll it out more broadly for developers to experiment with this summer."

On the security front, Gemini 2.5 includes advanced safeguards against indirect prompt injections, in which malicious instructions are embedded in retrieved data. According to disclosures, "Our new security approach helped significantly increase Gemini's protection rate against indirect prompt injection attacks during tool use, making Gemini 2.5 our most secure model family to date."

Google is introducing new developer tools, starting with thought summaries in the Gemini API and Vertex AI. These summaries convert the model's raw processing into structured formats with headers and action notes. Google stated, "We hope that with a more structured, streamlined format on the model's thinking process, developers and users will find the interactions with Gemini models easier to understand and debug."

Additional features include thinking budgets for 2.5 Pro, allowing developers to control the model's computation resources to balance quality and speed; developers can also use the budget to disable the model's advanced reasoning entirely if desired (a sketch of both features follows this article). Model Context Protocol (MCP) support has been added for SDK integration, aiming to enable easier development of agentic applications using both open-source and hosted tools.

Google affirmed its intention to sustain research and development efforts as the Gemini 2.5 series evolves, stating, "We're always innovating on new approaches to improve our models and our developer experience, including making them more efficient and performant, and continuing to respond to developer feedback, so please keep it coming! We also continue to double down on the breadth and depth of our fundamental research — pushing the frontiers of Gemini's capabilities. More to come soon."
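As a concrete illustration of the thought summaries and thinking budgets described above, here is a minimal sketch using the google-genai Python SDK. The parameter names follow the Gemini API's published thinking configuration; treat the model id and the budget value as illustrative assumptions rather than recommendations.

    # Sketch: thinking budgets and thought summaries via the google-genai SDK.
    from google import genai
    from google.genai import types

    client = genai.Client()  # expects GEMINI_API_KEY in the environment

    response = client.models.generate_content(
        model="gemini-2.5-pro",
        contents="A bat and a ball cost $1.10 together, and the bat costs "
                 "$1.00 more than the ball. What does the ball cost?",
        config=types.GenerateContentConfig(
            thinking_config=types.ThinkingConfig(
                thinking_budget=1024,   # cap tokens spent on internal reasoning
                include_thoughts=True,  # return structured thought summaries
            )
        ),
    )

    # Thought-summary parts are flagged with part.thought; answer parts are not.
    for part in response.candidates[0].content.parts:
        label = "thought summary" if part.thought else "answer"
        print(f"[{label}] {part.text}")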


Forbes
21-05-2025
- Business
Lessons From Google I/O 2025 To Cut $20B In Healthcare Waste
U.S. health systems waste about $20 billion annually on manual administrative tasks that automation could eliminate. At Google I/O 2025, the headline announcement centered on Search with AI Mode powered by Gemini 2.5, and Google positioned AI as the operating system for everything. This shift presents tangible opportunities for healthcare organizations to enhance operational efficiency, improve the patient experience, and support better clinical outcomes. The following breakdown outlines what's new in Search and how it can enhance healthcare operations.

Google announced that AI Mode is rolling out to all U.S. users, offering a reimagined search experience that uses Gemini 2.5's advanced reasoning, multimodal capabilities, and contextual understanding to handle complex, longer queries (two to three times longer than traditional searches) and follow-up questions. It provides conversational answers instead of traditional link-based results, featuring personalized results based on Gmail data (e.g., tailored Smart Replies) and the ability to create custom charts for sports or finance queries. Gemini 2.5 Pro powers AI Mode with upgraded reasoning, multimodal input, and code generation capabilities, and can also generate graphs or synthesize complex financial and sports data. The vision here is a shift from search as query-response to search as a problem-solving approach.

Google also introduced Deep Think, an experimental feature designed to enhance the model's ability to produce more refined, well-reasoned output by considering multiple answers to a question and applying more complex logic. While still in limited testing, Deep Think reflects Google's broader effort to embed higher-order reasoning into its AI systems. Gemini 2.5 Flash, a faster and more efficient variant, also supports AI Mode and is optimized for performance and responsiveness.

Vertex AI Search integrates with AI Mode, allowing clinicians to query and retrieve relevant information from electronic health records (EHRs), medical white papers, third-party systems such as radiology or pathology, and clinical guidelines in real time (a minimal query sketch appears at the end of this article). It will be interesting to see the partnerships among point solutions and whether they choose to partner with disparate AI solutions.

AI Mode incorporates Deep Search, a feature similar to Gemini's Deep Research. This is the search-side capability where the system breaks down queries, performs web exploration, iteratively browses, and synthesizes information into structured responses or reports, analyzing hundreds of sources in real time to generate comprehensive research reports. Healthcare organizations can use Deep Search to help physicians summarize a patient's medical history across multiple providers, analyze lab results and other tests, and generate a comprehensive overview to support informed clinical decision-making.

AI Mode leverages Project Mariner's agentic capabilities, allowing users to delegate tasks like finding and purchasing event tickets (e.g., 'Find two affordable tickets for this Saturday's Reds game in the lower level') or making reservations, with user approval. Agentic capabilities make Search more action-oriented and proactive. Healthcare organizations can use Project Mariner to automate scheduling for patient appointments, staff shifts, and third-party referrals, such as booking a colonoscopy with a specialist.
A scheduler or even the patient might prompt the system with, 'Book a follow-up for me with Dr. Jones next week and check if Dr. Smith is available for a colonoscopy.' The system can then cross-reference availability across multiple, potentially disparate, EHR systems to coordinate the request. Healthcare organizations can also use Project Mariner to verify patient insurance eligibility or assist patients in obtaining coverage through Medicaid, reducing administrative errors and processing time for staff.

AI Mode marks the next phase of Google Search, focusing on speed, personalization, and conversational search. For Google's healthcare customers, it offers a practical way to enhance access to information and streamline interactions by integrating with tools they already use.
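For readers curious what the Vertex AI Search integration referenced above might look like in practice, here is a hedged sketch using the google-cloud-discoveryengine Python client. The project and data store identifiers are hypothetical placeholders, and a real EHR deployment would add the access controls and de-identification this sketch omits.

    # Sketch: querying a Vertex AI Search data store of clinical documents.
    # PROJECT_ID and DATA_STORE are hypothetical placeholders.
    from google.cloud import discoveryengine_v1 as discoveryengine

    PROJECT_ID = "example-health-project"
    DATA_STORE = "clinical-guidelines-store"

    client = discoveryengine.SearchServiceClient()
    serving_config = (
        f"projects/{PROJECT_ID}/locations/global/collections/default_collection"
        f"/dataStores/{DATA_STORE}/servingConfigs/default_search"
    )

    request = discoveryengine.SearchRequest(
        serving_config=serving_config,
        query="colorectal cancer screening guidelines for average-risk adults",
        page_size=5,
    )

    # The pager yields ranked results; each carries a document reference.
    for result in client.search(request=request):
        print(result.document.id, result.document.name)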


GSM Arena
21-05-2025
- Business
Google I/O 2025 announcements: Gemini 2.5 models, Imagen 4, Veo 3 and Flow
Google I/O 2025 was a big one, with a host of announcements mostly focused on AI. We got updates and new features for the Gemini 2.5 models (Pro and Flash), alongside the more powerful Imagen 4 image generator and the Veo 3 AI video generator. Google also unveiled Flow - a dedicated AI filmmaking tool that combines the Imagen, Veo and Gemini models to create cinematic scenes from simple text prompts.

Gemini 2.5 brings extended language support to over 24 languages with text-to-speech and expressive voices. Google claims improved reasoning, multimodality, coding and long-context capabilities for both the Flash and Pro models. The big new update on the Pro model is the added Deep Think reasoning mode for highly complex math and coding tasks. The feature is still in its 'experimental' phase and will be released to trusted testers soon. Google says it can consider multiple hypotheses before responding.

Gemini 2.5 Pro is now the leading AI model on the WebDev Arena and LMArena benchmarks. It offers enhanced coding and web app building tools and a context window of up to 1 million tokens for long-context understanding. Gemini 2.5 also improves security against indirect prompt injections. Gemini 2.5 Flash is available in preview to all users in the Gemini app, while its general release is coming later in June. Gemini 2.5 Pro's commercial release will follow shortly after.

Imagen 4 can now output images in up to 2K resolution, and Google claims improved text accuracy in generated cards, posters and comics. Imagen 4 is available starting today across the Gemini, Google Workspace, Whisk and Vertex AI apps.

Veo 3 is Google's latest AI video model and features improved text-to-video prompt recognition. It can output video with sound, character dialogue and background noises. Veo 3 is available starting today for Google AI Ultra subscribers in the US and Vertex AI enterprise users. Veo 2 is getting camera movement controls, object addition and removal, image-based style control, and outpainting to extend frames beyond the original borders.

Flow is Google's new AI filmmaking tool, which combines the capabilities of the Veo, Imagen and Gemini models for more detailed cinematic scenes. Google claims Flow can help storytellers create exceptional cinematic clips that excel at physics and realism. Users can control camera motion, angles and perspectives, and can also edit and extend previously generated videos. Google Flow is now available to Google AI Pro and Ultra subscribers in the US.
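As a rough illustration of how developers can reach Imagen 4 programmatically, here is a minimal sketch using the google-genai Python SDK. The model id is an assumption based on Google's naming scheme and may differ from the identifier actually shipped.

    # Sketch: generating an image with Imagen via the google-genai SDK.
    from google import genai
    from google.genai import types

    client = genai.Client()  # expects GEMINI_API_KEY in the environment

    result = client.models.generate_images(
        model="imagen-4.0-generate-001",  # assumed model id
        prompt="A minimalist poster for a film festival, bold legible type",
        config=types.GenerateImagesConfig(number_of_images=1),
    )

    # Write the first generated image to disk as PNG bytes.
    with open("poster.png", "wb") as f:
        f.write(result.generated_images[0].image.image_bytes)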


The National
21-05-2025
- Business
Google takes aim at rivals with hi-tech 3D video calls
Google has launched a host of new artificial intelligence services, including the long-developed video call service that renders participants in 3D, as it attempts to maintain its leading position in the increasingly crowded AI space. The Alphabet-owned company said at its annual I/O conference in California on Tuesday that the latest version of its flagship generative AI model, Gemini 2.5 Pro, will now support native audio generation and computer use, which can be used to develop apps. Google is also adding Deep Think, a new reasoning mode that uses new research techniques to consider "multiple hypotheses" before delivering a response. Chief executive Sundar Pichai teased Deep Think earlier on Tuesday, posting an image of himself and Demis Hassabis, the chief executive of DeepMind, Alphabet's AI unit.

Google also said that it is rolling out its AI Mode tool – the upgraded version of AI Overviews that allows users to make more complex queries compared with traditional search – on its search engine in the US on Tuesday, powered by Gemini AI. It will add more features and release the tool in other markets based on initial feedback, further intensifying the fight between generative AI-based search platforms. AI Mode was previously available only to certain users. The company also added a shopping tool to AI Mode, which provides details on more than 50 billion products, their prices and even the ability to virtually try on items such as clothes.

AI Overviews, which provides summaries of search results, is now available in more than 200 countries and over 40 languages, including Arabic – marking the first time it has been made available in the Middle East and North Africa. Using AI Overviews in Google Search "is one of the most successful launches" of the company in the past decade, driving about 10 per cent of searches in markets like the US and India, said Liz Reid, a vice president and head of Search at Google.

Google also unveiled Veo 3 and Imagen 4, its latest generative AI models for video and images, respectively, and Flow, a new tool for filmmaking. It has also expanded access to the music-focused Lyria 2, and its coding assistant, Jules, has been made available in public beta.

On the hardware front, the company unveiled Google Beam, the product formerly known as Project Starline – first introduced at I/O 2021 – which has evolved into a platform that will render video call participants in "realistic 3D from any perspective". It is now available on Google Meet. A workplace version, which has a monitor and six cameras for rendering, will be introduced in collaboration with HP and will be unveiled at the InfoComm audiovisual exhibition in Florida next month.

Meanwhile, Android XR, Google's operating system for smart glasses and headsets, has been upgraded with new features – focused on messaging, photography and directions, among others – and is slated to be rolled out on Samsung's Project Moohan, which Google confirmed will be released "later this year". Project Moohan, the Google-Samsung collaboration, was teased at the latter's Unpacked event in January. Hyesoon Jeong, an executive vice president at Samsung Electronics, told The National at the time that the company would not rule out the possibility of its release in 2026, but said it will be out "when it's ready".
The wearable augmented reality device was among a series of upcoming hardware from the Suwon-based company on display at Unpacked, alongside the slimmer Galaxy S25 Edge.

Google's moves are meant to cater to a "world [that] is responding and adapting [to AI] faster than ever before", Mr Pichai said. "More intelligence is available for everyone, everywhere … what all this progress means is that we're in a new phase of the AI platform shift, where decades of research are now becoming reality for people, businesses and communities all over the world."

Google, one of the principal drivers of the AI revolution, unveiled its Gemini platform in 2023 and became one of the leading players in generative AI, rivalling the likes of OpenAI's ChatGPT and Microsoft's Copilot. The generative AI race has since intensified, with many players entering to tap into the potential of a technology that has disrupted businesses and individual users alike. Google and other established Big Tech companies are also trying to fend off the challenge of Chinese AI models, which are lower-priced, or even free in some cases, and are reported to be better. The company did not respond to a request for comment from The National on how Chinese companies are shifting the AI landscape, and whether their emergence is having any influence on AI development and costs.

Google also introduced a new Google AI Ultra subscription, priced at $250 a month in the US, which includes services such as Gemini – also integrated into the Chrome browser and the Google suite that includes Gmail, documents and videos – as well as Flow, YouTube Premium and 30TB of cloud storage.