
Latest news with #CopilotVision

How to watch the Microsoft Build 2025 keynote

Engadget • 15-05-2025

Microsoft's annual Build developer conference kicks off on May 19, and as always, it starts with a keynote. You can watch the opening event live starting at 12:05 PM Eastern time on Microsoft's website, though you'll have to register and sign in to access the livestream. Microsoft will also stream the keynote on YouTube. Just like last year, the event will be hosted by Microsoft CEO Satya Nadella, along with the company's chief technology officer, Kevin Scott. According to the keynote page, the executives will share how "Microsoft is creating new opportunity across [its] platforms in this era of AI."

The company has introduced new AI features at Build over the last few years, and that's most likely what's going to happen again this time. We expect Microsoft to add more AI agents to Windows 11 to automate more tasks on the operating system. It could also give an in-depth look at Copilot Vision, a feature that allows the AI assistant to see what you're doing on your computer so it can talk you through various tasks. Microsoft likely won't announce new hardware at the event, however, seeing as it has only recently launched a 12-inch Surface Pro tablet and a 13-inch Surface Laptop.

Microsoft's Build conference will take place from May 19 to May 22. Two other tech events are also taking place around that time: Google's I/O conference from May 20 to 21 and the Computex computer expo in Taiwan from May 20 to 23.

Microsoft Build 2025: Copilot, Windows 11, AI Agents and what more to expect

Hindustan Times • 12-05-2025

Microsoft's largest event of the year, Build 2025, is set to kick off on May 19 at the Seattle Convention Center and will run until May 22. The event opens on May 19 at 9:05 am PT (9:35 pm IST) with a keynote by CEO Satya Nadella and CTO Kevin Scott. This year's focus is expected to remain on artificial intelligence, particularly developments around Copilot and Windows 11.

Microsoft Build has traditionally centred on Azure, but recent years have seen AI gain traction, and this year will likely be no different. AI's role in automating tasks and improving user experiences is expected to dominate the conversation. While the company recently introduced new hardware like the Surface Pro and Surface Laptop, this event will be more software-focused, with a deeper dive into AI and its integration into Windows.

A significant highlight will likely be updates to Windows 11, particularly in the area of AI agents. These agents, which can perform tasks on your behalf, have been a key focus for Microsoft. The company has already announced plans to introduce an AI agent within the Settings app, which will automate system adjustments. The keynote is expected to elaborate on this feature and possibly reveal other agents that can perform specific tasks across different applications. Developers may also learn how to integrate their own AI agents into apps to enhance functionality.

File Explorer and the Start menu are also on the agenda for improvement. Microsoft has shared that new features will allow users to find and manage files without needing to open additional apps. Likewise, updates to the Start menu will simplify app discovery by enabling users to search for and download apps without needing to access the Microsoft Store.

Another notable development will be the expansion of Copilot. Copilot Vision, a feature that allows the AI to see and interact with your desktop or app windows, has already rolled out on mobile. At Build 2025, Microsoft is expected to share further details about its availability on desktops and how it will enhance user productivity. Windows Insiders currently have access to Copilot Vision, but general availability remains unclear.

Copilot's capabilities were also expanded with the introduction of the "Researcher" tool in March 2025. Using OpenAI's o3 reasoning model, the tool can gather data from OneDrive and the web to assist with research tasks. However, it is currently limited to Microsoft 365 subscribers, and it remains to be seen whether it will be made available to free Copilot users.

Although AI will dominate the event, Microsoft is also expected to provide updates on its core platforms, including Azure, .NET, and GitHub, which remain central to the company's business ecosystem. In the coming days, Microsoft will likely reveal more insights into its evolving AI strategy and how it plans to integrate these innovations into its vast product lineup.
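The two write-ups above quote the keynote start in different zones: 12:05 PM Eastern in one, 9:05 am Pacific / 9:35 pm IST in the other. A quick check with Python's standard-library zoneinfo module confirms these all refer to the same moment; the datetime below is taken from the articles, everything else is illustrative:

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # standard library since Python 3.9

# Build 2025 keynote start, as reported: May 19, 2025, 9:05 a.m. Pacific.
keynote = datetime(2025, 5, 19, 9, 5, tzinfo=ZoneInfo("America/Los_Angeles"))

for tz in ("America/New_York", "Asia/Kolkata"):
    local = keynote.astimezone(ZoneInfo(tz))
    print(tz, local.strftime("%I:%M %p on %d %B"))
# America/New_York 12:05 PM on 19 May  (matches Engadget's 12:05 PM Eastern)
# Asia/Kolkata 09:35 PM on 19 May      (matches HT's 9:35 pm IST)
```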

Microsoft Build 2025: What to expect from Copilot, Windows 11 and AI agents

Engadget • 09-05-2025

While the company might be pulling back some of its investments in the infrastructure that makes it run, Microsoft remains, at least publicly, intensely focused on AI and Copilot. The company's annual Microsoft Build developer conference runs from May 19 to 22 and typically touches on all of the company's various platforms, but it seems like AI will once again be the star. Microsoft Build is typically a pretty dry affair — Azure comes up a lot — but in the last few years the company has also used the conference to introduce new AI features that eventually make their way into consumer products. Since Microsoft recently released a new Surface Pro and Surface Laptop, too, the event should be all about software.

You'll be able to watch the opening Build keynote, hosted by Microsoft CEO Satya Nadella and CTO Kevin Scott, on the Build website, or if you want the highlights as they happen, follow along with Engadget's liveblog. In the meantime, we can make some educated guesses as to what Microsoft might touch on.

Microsoft talked up agents — AI that can take action on your behalf — a lot at Microsoft Build 2024, and the ways AI is automating work in Windows will likely come up this year, too. Microsoft has announced plans to introduce an agent into the Settings app that can make adjustments to your computer for you. An in-depth look at that feature, or a tease of other agents coming to Windows 11, seems like an obvious subject for the keynote to touch on. The company will likely get into how third-party developers can build agents into their own apps, too. Microsoft has also shared that it's making changes to File Explorer to let you find and tweak files without jumping into another app, and upgrading the Start menu so you can find and download apps without having to open the Microsoft Store. Both features could be highlighted at Build.

One of the most impressive features Microsoft has demoed for Copilot is the ability for the AI assistant to selectively see what you're doing and talk to you about it. Copilot Vision, as the feature is called, is already available on mobile, and Microsoft has teased an expanded version of the feature that can see your desktop or select app windows. Windows Insiders can already use Copilot Vision, but Microsoft hasn't shared when it'll come to regular users. Detailing the feature in depth and expanding where it works seem like natural topics to come up during Build.

Microsoft introduced a new "Researcher" tool to Copilot in March 2025 that uses OpenAI's o3 reasoning model, but limited the feature to Microsoft 365 subscribers. The tool can perform research on your behalf, compiling information from multiple sources, like data from your OneDrive and web searches. Microsoft didn't announce plans to bring Researcher to the free version of Copilot in Windows, but it could do that at Build 2025.

Microsoft maintains multiple platforms that act as the backbone of the world's other gigantic businesses. AI is more interesting to the average person, but the company will likely have more to share on Azure, .NET and GitHub at Build 2025, too.

I've Been Using Copilot Vision Again, and Now I Have Mixed Feelings

Yahoo • 25-04-2025

Copilot Vision in Edge is unlike anything I've seen in a browser, and it takes the concept of generative AI in a new direction. Instead of creating text or images based on prompts, the browsing companion analyzes everything visual and textual on a web page and verbally converses with you about it, providing context and explanations. Yes, the Google Lens feature in Chrome bears a slight resemblance, letting you highlight objects on a page and get search results in a side panel, but it's not at all conversational. I tested the preview version of Copilot Vision in early 2025, but the release version is now available to all via the desktop version of the Edge browser. Its main features are as impressive as ever, though some changes to how it works on websites with private data give me pause. I'm here to walk you through how to get the most out of Copilot Vision and understand its downsides.

Originally, Copilot Vision was available only to select Copilot Pro subscribers ($20 per month), but now it's free for anyone with the desktop version of the Edge browser. Copilot Vision is entirely opt-in. To get started, first make sure your browser is up to date by choosing the three-dot menu at the top right, then Help & Feedback, and then About Microsoft Edge. The top section on this page checks whether you're running the latest version. You must also sign in to a Microsoft account to use Copilot Vision, so do that next. Finally, open Edge, choose the Copilot icon at the top right to open the AI's side panel, and click the microphone button at the bottom. You then see a dialog box at the bottom center of the browser window where you can give the tool permission to enable the feature. The box explains it as "Get help from Copilot by sharing what's on your screen and talking about it. Copilot can hear your voice and respond like a friend in conversation." Hit the Accept button.

Next, you see Copilot's toolbar across the bottom center of the window, with an eyeglasses icon and a mic icon. Those are for stopping the AI from viewing your screen and muting your mic. Since the feature is speech-based (even though it's called Vision), you need to enable your mic for it. After you tap the microphone icon, you hear a piano lick, and then the lifelike AI speech of Copilot welcomes you. You can choose among four voice personalities: Canyon, Grove, Meadow, or Wave. I picked Wave (which has a British accent), though I use Canyon on my phone. Copilot Vision can describe and give you info on what you're seeing on a site, but it can also just chat about anything.

The interface for Copilot Vision is completely different from the Edge browser's Copilot sidebar, which is a standard ChatGPT-like chatbot interface. The Vision toolbar even disappears when you aren't using it. When I first clicked the Copilot Vision screen-share icon, the borders of the browser window changed color, in my case from blue to tan. The mic and eyeglasses icons turn red when they're active. A friendly voice said, 'Hey Michael, how are you doing today? What's on your mind? Or should I surprise you with something fun?' The main Copilot Vision for Edge page provides sample things to ask. For example, it showed a web page with four cute dogs and suggested I ask it to 'tell me more about these breeds.' The next suggestion was 'Summarize these articles,' which showed that the tool works with both images and text on web pages. Then, it demonstrated that it knows about geography by showing four cityscapes and prompting, 'Which of these cities has the oldest buildings?' Finally, it said, 'Now it's your turn,' and suggested some sites like Amazon, Target, Tripadvisor, and Wikipedia to get started. One of the sites Copilot proposed was GeoGuessr, which has its own World Cup. I told it that I wasn't interested in the soccer World Cup, and it assured me that this wasn't related.

When I stopped speaking with it for a while, I got a message saying, 'Sorry, nodded off for a second! Try reconnecting.' That's actually a good thing, since you don't want it to keep listening if you accidentally leave it on. When I asked if I could provide feedback to Copilot Vision directly with my voice, I was impressed by its reply: 'Your feedback will be passed on to my developers.' If you interrupt Copilot Vision, it politely stops speaking.

Interestingly, Copilot Vision told me that it sees only the part of the browser that I can see, and not anything beyond the visible window. But when I went to a page on Everyday Health, it summarized content far below what I could see. Since the standard Copilot sidebar in Edge can summarize entire web pages, it's possible that Vision was tapping into it for this info. When I navigated to my photos on OneDrive, Copilot Vision described a bird photo there. It treated OneDrive content as private in the preview version, so that could now be a potential privacy concern. It also knew that a highly distorted photo on Flickr was of a spider, which I couldn't recognize at first. It can now see content on Instagram and other social networks, too.

The mute button makes the AI stop listening to you, but I wish there were a button to silence it when it goes on a bit too long about background information on a page or in response to a query. Instead, you can say, 'Quiet!' That will stop it from speaking. It remains active when you tell it to stop watching, though; you can hit the X in its control panel at the bottom of the screen to close it in that case. I tripped up Copilot occasionally, especially when I asked it to do things it couldn't, like turn itself off. In those cases, it usually went back to describing what was on the page and adding more context and background info.

Don't let this section get your hopes up too much: Copilot Vision can be active while you play games on the web, but not as a competitor or partner. Instead, it provides strategy tips or commentary about what's on the screen. When I started playing the browser game Mr. Mine, Copilot Vision knew how to play the game and what the goal was. When I asked how it knew about this little-known game, Copilot said it could read what was on my screen and, 'Yeah, I've got a knack for games.'

I asked Copilot Vision whether it would be watching if I went on a pornography website and got a thoughtful answer saying, 'for safety and privacy, I don't store or share personal info.' Microsoft's documentation states that Copilot Vision doesn't use input for AI training, and that it doesn't see sensitive or legally protected information, including bank account credentials or passwords. When I tested the prerelease version of Copilot Vision in Edge, it stopped functioning as soon as I navigated to a bank website or signed into a OneDrive page. With this released version, it kept on working. When I asked if it shouldn't stop viewing these pages, it gave me the same "I don't store or share" response. I prefer the more cautious approach of the preview version.

Copilot Vision can't open a new web page, which makes it less helpful than it could be. The tool can at least now detect your cursor position, which it couldn't do in my tests of the preview version. Copilot Vision technically can't see video, but it can provide feedback based on still frames from a video. It can't hear and interpret web page audio either. Finally, Copilot Vision can't provide a written transcript of your interactions with it; it would be nice to be able to see its answers. The regular Copilot, whether in the Edge sidebar or a separate app, does this.

Copilot Vision is good at providing detailed spoken descriptions of what's on a web page, alongside rich background and context. It speaks sort of like a friend who has no opinions of their own, something you might appreciate! I also like how Microsoft now integrates Copilot Vision into the existing sidebar-based Copilot. And the new eyeglasses icon makes it clear that the AI is watching your screen. But there are some gaps that limit its usefulness. For example, I wish it could open web pages for you and turn itself off on your voice command. And I would have preferred it if Microsoft had kept the preview version's behavior of not operating at all when you log in to a website, or at least made this an option. I hope the forthcoming version of Copilot Vision for Windows doesn't have these same drawbacks. For more on Copilot, check out our comparison between Copilot and Copilot+ and cool things you can do with a Copilot+ PC.

Google unlocks Veo 2 and smarter Gemini Live, as focus shifts to boosting AI adoption

Hindustan Times • Business • 25-04-2025

Google is unlocking a significant set of new features for Gemini users in India, and has released first-of-its-kind data on AI adoption in the country. It is a two-pronged approach: artificial intelligence (AI) video generation capabilities integrated within Gemini, and an AI agent able to understand worldly context if a user enables access to the phone's camera or shares what's on the phone's screen. This, Google hopes, will widen Gemini's relevance, adding to an arsenal of tools that already includes deep integration within Android phones as well as Google's Workspace, and AI Overviews in Search.

There's the spectrum of competition, too. In just the past few weeks, there has been significant progress in AI models finding new potential capabilities, though a lot of the conversation remains around exactly that: potential, and possible purpose (there is of course an attempt to talk about benchmarks, but those may not translate to the real world). OpenAI's o3 and o4-mini models, xAI adding Studio to Grok, Anthropic adding a Research feature to Claude, and Microsoft adding Copilot Vision to the Edge web browser are some illustrations of this rapid, consumer-focused evolution. The spark was arguably the January release of the Chinese AI model DeepSeek, whose claim to fame was rewriting the rules on how affordably an AI model can be created. 'One exciting development has been the launch of the Gemini 2.5 model, that has really taken the generative AI capabilities to a whole new level,' Manish Gupta, Senior Director at Google DeepMind, points out in a conversation with HT.

The Veo 2 video generation model now finds integration within Gemini, adding the ability to generate detailed and natural-looking videos from a prompt. For now, it creates an eight-second video clip at 720p resolution, delivered as an MP4 file in a 16:9 landscape format. Google insists detailed prompts are key to how good the generated videos look, whether it's a short story, a visual concept, or a specific scene. The video generation capabilities are exclusive to Gemini Advanced subscribers; in India, this costs ₹1,950 per month. 'Going forward, one could see it in a multitude of spaces such as architecture, design and filmmaking. To that extent, therefore, we're just scraping the surface with this, but the quality is unimaginable,' Shekhar Khosla, Vice President, Marketing at Google India, tells us.

Google confirms that Gemini's video outputs will be governed by the same content policies and guardrails that define its wider generative AI usage in terms of safety, preventing outputs depicting violence, child abuse, self-harm and dangerous activities such as drug use. To distinguish generated videos from ones shot by a user in the real world, these generations will have the SynthID digital watermark embedded in each frame, indicating the videos are AI-generated. 'One of the things where we have made some leadership contributions as a company is in the technology called SynthID. It's a powerful technology where different kinds of content, be it video or an image or text, we are able to create a digital signature which identifies that content as AI generated. It is part of our policy to tag any of the AI generated content, and any content generated using the Google tools gets marked with SynthID,' explains Gupta. SynthID is now also available as open source.
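For developers, as opposed to users of the consumer Gemini app described above, Veo 2 is also exposed through Google's Gemini API. Below is a minimal sketch assuming the google-genai Python SDK and the veo-2.0-generate-001 model id; the prompt, polling interval, and output filename are illustrative, and none of this is required for the in-app integration the article covers.

```python
import time
from google import genai
from google.genai import types

client = genai.Client()  # assumes an API key is set in the environment

# Video generation is a long-running operation: submit a job, then poll.
operation = client.models.generate_videos(
    model="veo-2.0-generate-001",  # Veo 2 model id (assumption)
    prompt="A kingfisher diving into a river at sunrise, slow motion",
    config=types.GenerateVideosConfig(
        aspect_ratio="16:9",   # matches the landscape format noted above
        number_of_videos=1,
    ),
)
while not operation.done:
    time.sleep(20)
    operation = client.operations.get(operation)

# The article notes output arrives as an ~8-second 720p MP4 clip.
video = operation.response.generated_videos[0]
client.files.download(file=video.video)
video.video.save("kingfisher.mp4")  # illustrative filename
```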
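On the watermarking side, the component Google has open-sourced is SynthID Text, which biases token sampling during generation so the output text carries a detectable statistical signature. A minimal sketch, assuming the Hugging Face transformers integration (v4.46+) and an illustrative gemma-2-2b-it checkpoint; the key sequence is a placeholder, and in real use it must be kept private, since detection depends on it.

```python
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          SynthIDTextWatermarkingConfig)

model_id = "google/gemma-2-2b-it"  # illustrative model choice
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# The watermark is seeded by a private key sequence; ngram_len controls how
# much preceding context feeds the hash that nudges token probabilities.
wm_config = SynthIDTextWatermarkingConfig(
    keys=[654, 400, 836, 123, 340, 443, 597, 160, 57, 29],  # placeholder keys
    ngram_len=5,
)

inputs = tokenizer("Write two sentences about rivers.", return_tensors="pt")
out = model.generate(**inputs, watermarking_config=wm_config,
                     do_sample=True, max_new_tokens=60)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```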
Alongside, Gemini Live is now arriving on Android phones capable of running the Gemini app (including Google's own Pixel 9 phones and the Samsung Galaxy S25 Ultra), and will be able to understand the context of the world around a user via the phone's camera or by sharing what's on the screen. The context from the camera can help troubleshoot if a physical object around you isn't working properly, or help organise a living space. The ability to share what's on the phone screen with Gemini Live means help with getting started on a project, assistance with calculations or studies, and even shopping advice. A lot of Gemini Live's contextual smarts emerge from the Project Astra prototype, which the company had made available under the Trusted Tester program. The more capable Gemini Live does not require a Gemini Advanced subscription, and is available on all Android phones capable of running the Gemini AI assistant on device. For now, there is no word on when the updated Gemini Live will bring the Apple iPhone into its fold.

The value of Gemini Live's responses may vary for individuals, but Google hopes support for multiple Indian languages helps with relevance. Gemini, at this time, supports Hindi, Bengali, Gujarati, Kannada, Malayalam, Tamil, Telugu and Urdu, among the spectrum of Indian languages. 'We are not happy and we want to do more. The underlying model understands many more languages and we are trying to go well beyond the 22 scheduled languages, which is considered the Holy Grail. There are so many languages spoken in India and we want to make our models understand over 100 Indian languages,' Gupta says of the vision.

A few weeks ago, Google released the Gemini 2.5 model, which Google DeepMind CEO Demis Hassabis calls 'an awesome state-of-the-art model, no.1 on LMArena by a whopping +39 ELO points, with significant improvements across the board in multimodal reasoning, coding & STEM'. Gemini's current model line-up available to users, including the Gemini 2.5 Pro (experimental) reasoning model and Gemini 2.0 Flash, includes a Deep Research feature, wherein the AI can analyse complex topics and generate detailed reports.

A data and relevance question

Artificial intelligence adoption is yet to find momentum in India, particularly among consumers. A first-of-its-kind country-focused survey by Google and analytics firm Kantar India suggests that as many as 60% of respondents aren't familiar with any AI tool or app, and only 31% have experimented with any generative AI. The sample includes 8,000 individuals across 18 Indian cities, and the survey culminated in March. Khosla believes it is also about the relevance of the tools. 'Our models now are multimodal, multilingual and have multiple access points. They're not limited to a few, whether it's a language, visual, voice or text,' he says. There is an expectation that ecosystem partners, including Android phone makers, will help provide even greater visibility, adoption and education for users. 'Bringing meaningful relevance to people's lives is important. You may access it, but if you don't find a difference, you will not come back to it,' Khosla adds.

There is a brighter side to the Google-Kantar report, with suggestions that 75% of respondents are willing to adopt a 'growth collaborator' to help them boost productivity (72%), enhance creativity (77%), and communicate better (73%) in their daily routines at home and at work.
Specific to users of Google's Gemini assistant, underpinned by a family of multimodal large language models developed by Google DeepMind, the study suggests there is relevance for improving productivity (93% of Gemini users indicate as much), helping with creativity (85%) and tackling complexity (80%) with expert guidance or help with decision-making. These numbers underline potential headroom for AI eventually becoming a regular tool for individuals, and stand in stark contrast to enterprise AI adoption in the country. They are two distinct sides of the coin for AI companies: one of lost time, the other of potential in one of the world's biggest markets, even as those companies release new models and functionality at a steady pace. In a report in November last year, the Boston Consulting Group indicated that as many as 30% of Indian enterprises and businesses are leveraging AI in some form, higher than the global average of 26%, with fintech, software and banking leading this momentum. Visual communications platform Canva, in its latest Visual Economy Report, indicates that 9 out of 10 surveyed businesses and enterprises in India are taking first steps toward using AI for content creation and visual communication tasks.
