Latest news with #ProjectMariner

Engadget
7 days ago
- Business
- Engadget
Opera's new 'fully agentic' browser can surf the web for you
It was only earlier this year that Norway's Opera released a new browser, and now it's adding yet another offering to an already crowded field. Opera is billing Neon as a "fully agentic browser." It comes with an integrated AI that can chat with users and surf the web on their behalf. Compared to competing agents, the company says Neon is faster and more efficient at navigating the internet on its own because it parses webpages by analyzing their layout data. Building on Opera's recent preview of Browser Operator, Neon can also complete tasks for you, like filling out a form or doing some online shopping. The more you use Neon to write, the more it will learn your personal style and adapt to it. All of this happens locally, in order to ensure user data remains private.

Additionally, Neon can make things for you, including websites, animations and even game prototypes, according to Opera. If you ask Neon to build something particularly complicated or time-consuming, it can continue the task even when you're offline. This part of the browser's feature set depends on a connection to Opera's servers in Europe, where privacy laws are more robust than in North America. "Opera Neon is the first step towards fundamentally re-imagining what a browser can be in the age of intelligent agents," the company says.

If all of this sounds familiar, it's because other companies, including Google and OpenAI, have been working on similar products. In the case of Google, the search giant began previewing Project Mariner, an extension that adds a web-surfing agent to Chrome, last December. OpenAI, similarly, has been working on its own "Operator" mode since the start of the year. Neon, therefore, sees Opera attempting to position itself as an innovator in hopes of claiming market share, but the company has a difficult task ahead. According to data from StatCounter, only about 2.09 percent of internet users use Opera to access the web. Chrome, by contrast, commands a dominant 66.45 percent of the market. That's a hard hill to climb when your competitors are working on similar features.

It's also worth asking whether an agentic browser is something people really want. Opera suggests Neon is smart enough to book a trip for you. That sounds great in theory, but what if the agent makes an error and books the wrong connecting flight? A certain amount of friction ensures users pay attention and check things on their own. If you want to try Neon for yourself, you can join the wait list.
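Opera hasn't detailed what "parsing webpages by analyzing their layout data" involves. As a rough, hypothetical illustration of the general idea, the sketch below uses Playwright to collect the text and bounding boxes of interactive elements on a page, the kind of structured layout data an agent could reason over instead of raw pixels. The URL and selectors are placeholders, not anything Opera has published.

```python
# Hypothetical sketch: gathering page "layout data" for an agent to reason over.
# Requires: pip install playwright && playwright install chromium
from playwright.sync_api import sync_playwright

def collect_layout(url: str) -> list[dict]:
    """Return labels and bounding boxes for clickable elements on a page."""
    elements = []
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url)
        # Links, buttons and submit inputs are the targets an agent most often needs.
        for locator in page.locator("a, button, input[type=submit]").all():
            box = locator.bounding_box()  # None if the element is not visible
            if box:
                elements.append({
                    "text": locator.inner_text().strip()[:80],
                    "x": box["x"], "y": box["y"],
                    "width": box["width"], "height": box["height"],
                })
        browser.close()
    return elements

if __name__ == "__main__":
    for el in collect_layout("https://example.com"):
        print(el)
```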


Digital Trends
7 days ago
- Business
- Digital Trends
Can AI really replace your keyboard and mouse?
'Hey ChatGPT, left-click on the enter password field in the pop-up window appearing in the lower left quadrant of the screen and fill XUS&(#($J, and press Enter.' Fun, eh? No, thanks. I'll just move my cheap mouse and type the 12 characters on my needlessly clicky keyboard, instead of speaking the password out loud in my co-working space.

It's pretty cool to see ChatGPT understand your voice command, book a cheap ticket for eight people to watch a Liverpool match at Anfield, and land you at the checkout screen. But hey, will you trust it with the password? Or won't you just type the password with a physical keyboard? Imagine going all-in on AI, only to realize that the last-mile step, where you REALLY need a keyboard or mouse, is not possible, and you're now stuck. But that's exactly the question many have been asking after seeing flashy AI agents and automation videos from the likes of Google, OpenAI, and Anthropic.

It's a legitimate question

AI was the overarching theme at Google's I/O event earlier this year. By the end of the keynote, I was convinced that Android smartphones are not going to be the same again. And, by extension, neither is any platform where Gemini is going to land — from Workspace apps such as Gmail to navigation on Google Maps while sitting in a car. The most impressive demos were Project Mariner and the next research prototype of Project Astra. Think of the latter as a next-gen conversational assistant that will have you talk and get real stuff done, without ever tapping on the screen or pulling up the keyboard. You can shift your queries from a user manual hosted on a brand's website to instructional YouTube videos, without ever repeating the context. It's almost as if the true concept of memory has arrived for AI. In a web browser, it's going to book you tickets, landing you on the final page where you simply have to confirm that all the details are as requested before you proceed with the payment.

That leads one to wonder whether the keyboard and mouse are dead concepts for digital inputs as voice interactions come to the forefront of AI.

The burden of error

Now, as odd as that sounds, your computer already comes with voice-based control for navigating through the operating system. On Windows PCs and macOS, you can find the voice access tools as part of the accessibility suite. There are a handful of shortcuts available to speed up the process, and you can create your own, as well. With the advent of next-gen AI models, we're talking about ditching the keyboard and mouse for everyone, not just offering it as an assistive technology.

Imagine a combination of Claude Computer Use and the eye-tracked input from Apple's Vision Pro headset coming together. In case you're unfamiliar, Anthropic's Computer Use is a, well, computer use agent. Anthropic says it lets the AI 'use computers the way people do—by looking at a screen, moving a cursor, clicking buttons, and typing text.' Now, think of a scenario where your intent is given as voice to Claude, picked up by the onboard mics, and the task is executed. For whatever final step is required of you, gestures fill the gap. The Vision Pro has demonstrated that eye-tracked controls are possible and work with a high degree of accuracy.

Away from headsets, voice-controlled AI can still work on an average computer. Hume AI, in partnership with Anthropic, is building a system called Empathetic Voice Interface 2 (EVI 2) that turns voice commands into computer input.
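For readers curious what Computer Use looks like from the developer side, here is a minimal sketch of requesting Anthropic's computer-use tool through the Python SDK. The model name, tool type, and beta flag reflect the public beta as documented at launch and may have changed since; treat the exact identifiers as assumptions. The model only proposes input events (screenshots, clicks, keystrokes); your own loop still has to execute them.

```python
# Minimal sketch of Anthropic's computer-use beta (identifiers may have changed).
# Requires: pip install anthropic, with ANTHROPIC_API_KEY set in the environment.
import anthropic

client = anthropic.Anthropic()

response = client.beta.messages.create(
    model="claude-3-5-sonnet-20241022",   # model name as of the beta launch
    max_tokens=1024,
    tools=[{
        "type": "computer_20241022",      # computer-use tool type (beta)
        "name": "computer",
        "display_width_px": 1280,
        "display_height_px": 800,
    }],
    betas=["computer-use-2024-10-22"],    # beta flag from Anthropic's docs
    messages=[{
        "role": "user",
        "content": "Open the ticket page and stop before the payment step.",
    }],
)

# The reply contains tool_use blocks (e.g. screenshot, left_click, type).
# A real agent loop would execute each action locally and send the result back.
for block in response.content:
    print(block.type, getattr(block, "input", None))
```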
Hume's system is almost like talking to Alexa, but instead of ordering broccoli, the AI assistant understands what we are saying and turns it into keyboard or mouse input. All that sounds terrific, but let's think of a few realistic scenarios. You will need a keyboard for fine-tuned media edits. Making minor changes to a coding canvas. Filling cells in a sheet. Imagine saying, 'Hey Gemini, put four thousand eight hundred and ninety-five dollars in cell D5 and label it as air travel expense?' Yeah, I know. I'd just type it, too.

The last mile, not the end

If you go through demos of AI Mode in Search, the Project Mariner agent, and Gemini Live, you will get a glimpse of voice computing. All these AI advancements sound stunningly convenient, until they're not. For example, at what point does it get too irritating to say things like 'Move to the dialog box in the top-left corner and left click on the blue button that says Confirm'? It's too cumbersome, even if all the steps before it were performed autonomously by an AI.

And let's not forget the elephant in the room. AI has a habit of going haywire. 'At this stage, it is still experimental—at times cumbersome and error-prone,' warns Anthropic about Claude Computer Use. The situation is not too dissimilar from OpenAI's Operator agent, or a similar tool of the same name currently in development at Opera, the folks behind a pretty cool web browser.

Removing the keyboard and mouse from an AI-boosted computer is like driving a Tesla with Full Self-Driving (FSD) enabled, except you no longer have the steering wheel and the only controls available are the brake and accelerator pedals. The car is definitely going to take you somewhere, but you need to take over if some unexpected event transpires. In the computing context, think of a troubleshooter, where you MUST be in the driver's seat.

But let's assume that an AI model, driven primarily by voice (and captured by the mic on your preferred computing machine), lands you at the final step where you need to close the workflow, like making a payment. Even with passkeys, you will need to at least confirm your identity by entering a password, opening an authenticator app, or touching a fingerprint sensor. No OS-maker or app developer (especially one dealing with identity verification) would let an AI model have open control over this critical task. It's just too risky to automate with an AI agent, even with conveniences like passkeys coming into the picture. Google often says that Gemini will learn from memory and your own interactions. But it all begins with actually letting it monitor your computer usage, which is fundamentally reliant on keyboard and mouse input. So yeah, we're back to square one.

Go virtual? It's a long wait

When we talk about replacing the computer mouse and keyboard with AI (or any other advancement), we are merely talking about substituting them with a proxy, and then landing at a familiar replacement. There is plenty of research material out there on virtual mice and keyboards, dating back at least a decade, long before the seminal 'transformers' paper was released and pushed the AI industry into the next gear. In 2013, DexType released an app that tapped into the tiny Leap Motion hardware to enable a virtual typing experience in the air. No touch screen required, or any fancy laser projector like the Humane AI Pin. Leap Motion died in 2019, but the idea didn't.
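To make the "voice command becomes a click and a few keystrokes" idea concrete, here is a small, hypothetical sketch of the bridge such a system needs on the desktop side: a parsed intent gets translated into simulated mouse and keyboard events with the pyautogui library. The intent format and coordinates are invented for illustration; nothing here reflects how Hume, Google, or Anthropic actually structure their output.

```python
# Hypothetical sketch: turning a parsed voice intent into mouse/keyboard events.
# Requires: pip install pyautogui
import pyautogui

def execute_intent(intent: dict) -> None:
    """Replay a single parsed voice command as desktop input events."""
    if intent["action"] == "click":
        # Move to the on-screen coordinates and left-click.
        pyautogui.click(x=intent["x"], y=intent["y"])
    elif intent["action"] == "type":
        # Type the text with a small delay between keystrokes.
        pyautogui.write(intent["text"], interval=0.03)
    elif intent["action"] == "press":
        # Press a single key, e.g. "enter" or "tab".
        pyautogui.press(intent["key"])

# "Hey Gemini, put $4,895 in cell D5" might decompose into steps like these
# (coordinates are invented placeholders):
steps = [
    {"action": "click", "x": 640, "y": 420},   # focus the cell
    {"action": "type", "text": "4895"},
    {"action": "press", "key": "enter"},
]
for step in steps:
    execute_intent(step)
```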
Meta is arguably the only company that has a realistic software and hardware stack ready for an alternative form of input-output on computers, an area it calls human-computer interaction (HCI). The company has been working on wrist-worn wearables that enable an entirely different form of gesture-based control. Instead of tracking the spatial movement of fingers and limbs, Meta is using a technique called electromyography (EMG). It turns electrical motor nerve signals generated in the wrist into digital input for controlling devices. And yes, cursor and keyboard input are very much part of the package.

At the same time, Meta also claims that these gestures will be faster than a typical key press, because we are talking about electrical signals traveling from the hand straight to a computer, instead of finger movement. 'It's a much faster way to act on the instructions that you already send to your device when you tap to select a song on your phone, click a mouse or type on a keyboard today,' says Meta.

Fewer replacements, more repackaging

There are two problems with Meta's approach, with or without AI coming into the picture. The concept of a cursor is still very much there, and so is the keyboard, albeit in a digital format. We are just switching from the physical to the virtual. The replacement being pushed by Meta sounds very futuristic, especially with Meta's multimodal Llama AI models coming into the picture. Then there's the existential dilemma. These wearables are still very much in the realm of research labs. And when they come out, they won't be cheap, at least for the first few years. Even barebones third-party apps like WowMouse are bound to subscriptions and held back by OS limitations. I can't imagine ditching my cheap $100 keyboard for an experimental voice or gesture-based input device, let alone expecting it to replace the full keyboard and mouse in my daily workflow. Most importantly, it will take a while before developers embrace natural language-driven input in their apps. That's going to be a long, drawn-out process.

What about alternatives? Well, we already have apps such as WowMouse, which turns your smartwatch into a gesture recognition hub for finger and palm movements. However, it only serves as a replacement for cursor and tap gestures, and not really a full-fledged keyboard experience. But again, letting apps access your keyboard is a risk that OS overlords will protest. Remember keyloggers?

At the end of the day, we are at a point where the conversational capabilities of AI models and their agentic chops are making a huge leap. But they would still require you to go past the finish line with a mouse click or a few key presses, instead of fully replacing them. They're also just too cumbersome when you can hit a keyboard shortcut or reach for the mouse instead of narrating a long chain of voice commands. In a nutshell, AI will reduce our reliance on physical input, but it won't replace it. At least, not for the masses.


Geeky Gadgets
7 days ago
- Business
- Geeky Gadgets
Project Mariner AI Web Browser: First Tests and Impressions
What if your browser could think for itself—retrieving data, navigating websites, and even running code—all without you lifting a finger? That's the bold promise behind Google's Project Mariner, an experimental AI agent designed to tackle browser-based tasks with minimal human intervention. But does it deliver on this vision of autonomy, or does it stumble under the weight of its ambition? In its first five tests, Project Mariner showcased moments of brilliance, such as extracting YouTube metrics with ease, but also revealed critical flaws, particularly when faced with secure platforms or complex interactions. These early trials offer a fascinating glimpse into the future of AI-driven productivity—and the hurdles we'll need to overcome to get there.

All About AI explores the strengths and shortcomings of Project Mariner across five diverse scenarios, from retrieving live stream details to executing Python code. Along the way, you'll discover where this AI agent shines—like its ability to handle basic form interactions—and where it falters, such as its struggles with external AI tools like ChatGPT. Whether you're intrigued by the potential of browser-based automation or curious about the challenges of creating a truly autonomous agent, these insights will leave you pondering just how close we are to a future where AI can seamlessly navigate the digital world on our behalf.

Project Mariner Overview

Task 1: Retrieving Video Metrics

Project Mariner successfully retrieved the view count of a specific YouTube video, showcasing its ability to navigate websites and extract relevant data. This task highlighted the agent's competence in basic web navigation and information retrieval. By efficiently locating the video and extracting the desired metrics, it demonstrated a solid foundation for handling straightforward search tasks. However, its success in this scenario also raises questions about how it might perform when faced with more complex or dynamic web environments.

Task 2: Email and Live Stream Information Retrieval

The agent achieved partial success in gathering details about a live stream event but encountered significant difficulties with email-related tasks. When tasked with logging into Gmail to send an email, Project Mariner struggled to complete the process autonomously. Even with manual login assistance, it was unable to navigate the platform effectively. This limitation highlights its current inability to handle secure platform interactions, which is a critical area for improvement. The challenges faced in this scenario emphasize the need for enhanced capabilities in managing authentication protocols and executing tasks within secure environments.

Task 3: Website Navigation and Form Interaction

In this scenario, Project Mariner navigated to the DeepMind diffusion model page and interacted with a waitlist form. It successfully located the form and modified its fields, demonstrating its capability for basic form interaction. However, certain actions required user input, indicating a reliance on manual intervention for more complex tasks. While its performance in locating and modifying form elements was commendable, the agent's limited autonomy in this area suggests that further development is needed to enable it to handle more intricate interactions independently.
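For a sense of what the form-interaction task involves when scripted by hand rather than delegated to Mariner, here is a minimal Playwright sketch that opens a page, fills a waitlist-style form field, and stops short of submitting. The URL and field selector are placeholders; the actual DeepMind waitlist form was not inspected for this sketch.

```python
# Illustrative sketch only: a manual equivalent of the form-interaction task.
# Requires: pip install playwright && playwright install chromium
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    page = browser.new_page()
    page.goto("https://example.com/waitlist")        # placeholder URL
    # Fill an email field; the selector is hypothetical.
    page.fill("input[type=email]", "reader@example.com")
    # Deliberately stop before submitting, mirroring the "user confirms" step.
    page.screenshot(path="waitlist_filled.png")
    browser.close()
```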
Task 4: Python Code Execution

Project Mariner identified an online platform for executing Python code and successfully ran a simple script. This task underscored its ability to locate suitable platforms and perform basic code execution. However, the agent required additional user instructions to complete the task, suggesting that its problem-solving capabilities in coding environments are still evolving. Despite these limitations, its performance in this area was among the most promising of the five tests, indicating potential for further development in programming-related tasks.

Task 5: Interaction with ChatGPT

When tasked with accessing ChatGPT for a discussion on software engineering, the agent encountered navigation errors and failed to complete the task. This revealed significant challenges in interacting with external AI tools, particularly when navigating complex interfaces or meeting platform-specific requirements. The inability to complete this task underscores a critical gap in Project Mariner's functionality, highlighting the need for improved adaptability and error-handling mechanisms when engaging with external systems.

Key Observations

Project Mariner's performance across the five tests revealed a combination of strengths and weaknesses. These observations provide a clearer understanding of its current capabilities and the areas that require further development.

Strengths:
- The agent demonstrated effectiveness in retrieving information, navigating websites, and executing simple scripts, showcasing its potential for handling straightforward tasks.

Weaknesses:
- It struggled with secure platform interactions, email automation, and navigating external AI tools, highlighting critical gaps in its functionality.
- Its limited autonomy in handling complex tasks often necessitated user intervention, reducing its overall efficiency and independence.
- Occasional errors in task execution, particularly in scenarios involving intricate interfaces or multi-step processes, further emphasized the need for refinement.

Future Prospects and Development Needs

Project Mariner demonstrates significant potential as a browser-based AI agent, particularly for tasks involving basic web navigation and simple code execution. However, its current limitations in handling secure platforms, interacting with external AI tools, and executing autonomous operations indicate that substantial improvements are required. Addressing these challenges will be essential for unlocking its full potential and allowing it to handle more complex and independent tasks effectively. By focusing on enhancing its problem-solving capabilities, adaptability, and error-handling mechanisms, Project Mariner could evolve into a more robust and versatile tool for a wide range of applications.

Media Credit: All About AI
Yahoo
27-05-2025
- Business
- Yahoo
Google Makes AI Agent Prototype Available to US Users
Google has released its AI agent research prototype Project Mariner to users in the US. Jaclyn Konzelmann, director of product management for Google Labs, speaks about human-AI agent interactions with Bloomberg's Jackie Davalos at Google I/O.


Hans India
22-05-2025
- Business
- Hans India
Google's Vision: A Future Where AI Does the Googling for You
At this year's Google I/O, artificial intelligence wasn't just part of the story—it was the story. The tech giant unveiled a vision for the future in which Google's AI doesn't just help you search, it does the searching for you.

The centre of this evolution is AI Mode in Google Search, now rolling out across the U.S. This new mode transforms the traditional search bar into an intelligent, chatbot-like interface capable of understanding complex queries and pulling together comprehensive, curated responses. It's a step away from the familiar list of blue links and a leap toward an AI that actively does your online legwork.

During the keynote, Google demonstrated how AI Mode could plan a weekend getaway to Nashville for friends interested in food, music, and unique experiences. Instead of offering a basic search result, AI Mode created dynamic, themed suggestions such as 'restaurants good for foodies,' 'chill bar atmosphere with live music,' and 'off-the-beaten-path attractions,' complete with a customised map and links to relevant sites.

Behind this capability is what Google calls its 'query fan-out technique,' powered by a custom version of its Gemini model. Liz Reid, head of Google Search, explained: 'Now, under the hood, Search recognizes when a question needs advanced reasoning. It calls on our custom version of Gemini to break the question into different subtopics, and it issues a multitude of queries simultaneously on your behalf... Search pulls together a response and checks its work to make sure it meets our high bar for information quality.'

This means that what used to be several separate searches are now bundled into one—executed and analyzed by Google's AI, which then delivers a comprehensive response. The AI Mode interface even shows users how many searches it's performing in the background.

Later this summer, a new 'Deep Search' feature is slated to arrive within AI Mode. It builds on the same query fan-out approach but scales it significantly. According to Reid, Deep Search can 'issue dozens or even hundreds of searches on your behalf' to produce even more in-depth answers.

Another key player in Google's AI ecosystem is Project Mariner, a behind-the-scenes tool that allows AI to perform complex web tasks. It can juggle up to 10 operations simultaneously and features a 'Teach and Repeat' capability, allowing users to train the system to perform recurring tasks.

Google is extending these functionalities to the Gemini app's new Agent Mode, which also taps into Project Mariner. CEO Sundar Pichai gave a live example of how Agent Mode could help find an apartment in Austin by automatically scanning listings on Zillow and surfacing the best matches.

Additionally, Project Mariner will soon integrate into AI Mode itself. Rajan Patel, VP of Engineering for Search, demonstrated how the tool could locate baseball tickets and provide a direct purchase button—all from within the search interface.

What Google is proposing is a major transformation in how we interact with the web. The company sees AI not just as an assistant but as a replacement for many of the routine search tasks users handle today. As Reid summed it up: 'Google believes AI will be the most powerful engine for discovery that the web has ever seen.'

If these AI tools deliver as promised, users may find themselves spending less time searching—and more time simply receiving the answers they need. In Google's future, your next deep dive into the web might just start and end with one query, handled entirely by AI.
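The query fan-out idea is easy to picture in code: split one question into sub-queries, issue them concurrently, then merge the results. The sketch below is a generic illustration of that pattern using asyncio; the sub-query decomposition and the search function are stand-ins, not anything Google has published about how AI Mode works internally.

```python
# Generic fan-out pattern, not Google's implementation: run sub-queries
# concurrently and combine the results into one answer.
import asyncio

async def search(sub_query: str) -> str:
    """Stand-in for a real search call; replace with an actual backend."""
    await asyncio.sleep(0.1)  # simulate network latency
    return f"results for: {sub_query}"

async def fan_out(question: str, sub_queries: list[str]) -> str:
    # Issue every sub-query at the same time instead of one after another.
    results = await asyncio.gather(*(search(q) for q in sub_queries))
    # A real system would have a model synthesize these; here we just join them.
    return f"{question}\n" + "\n".join(results)

if __name__ == "__main__":
    subs = [
        "restaurants good for foodies in Nashville",
        "bars with live music in Nashville",
        "off-the-beaten-path attractions in Nashville",
    ]
    print(asyncio.run(fan_out("Plan a weekend getaway to Nashville", subs)))
```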