
Latest news with #Evi3

Voice cloning, celebrity impersonations and the need for safeguarding — Hume's CEO sounds off on the world of AI voice generation

Tom's Guide

10-08-2025



On a Wednesday afternoon, I'm sitting on a video call listening to Ricky Gervais tell me a joke about voice cloning. Then Audrey Hepburn follows up to tell me her opinions on artificial intelligence.

Unsurprisingly, neither of these people was actually on the call. Instead, it's Hume's CEO and chief scientist, Dr Alan Cowen, on the other side. He's showing off the latest update to his company's AI voice creation service, EVI 3.

Given just 30 seconds of audio, the tool can create a near-perfect replica of someone's voice. It captures not just their tone or accent but their mannerisms and personality, too. The cloned Ricky Gervais telling me jokes about voice cloning has the same dry wit and sarcastic tone as the real one. The cloned Audrey Hepburn is wistful and intrigued, speaking in the softer British accent of her era.

But it's not just celebrities. This tool can replicate any voice in the world, all from just one short audio clip. Obviously, a tool like this has the potential to change the world, both for better and for worse.

Cowen sat down with Tom's Guide to explain this new tool, his background, and why his team wants to revolutionize the world of AI voice cloning.

Hume operates in an area of AI that, oddly, doesn't come up as much: voice generation. The company claims to offer 'the world's most realistic voice AI'. Its software has come a long way over the years, now offering text-to-speech with a range of preset voices, as well as the ability to design a voice from a description. Now, with this latest update, the company can also clone any and all voices.

'I think this is the fastest evolving part of the AI space. There are competitors from OpenAI and Google, but what we've done with Evi 3 is take the technology to the next step,' Cowen explained on the call.

'Previous models have relied on mimicking specific people. Then you need loads of data to fine-tune for each person. This model instead replicates exactly what a person sounds like, including their emotions and personality.'

This is achieved by using Hume's large backlog of voice data and reinforcement learning so that they don't have to mimic specific people. Give the model a 30-second clip, and it can recreate the voice from scratch. This allows the model to learn your specific inflections, accent and personality, while the huge backlog of voice data fills in the gaps.

Of course, a model like this works best when given a good sample. A muffled clip of you talking in a monotone voice won't capture much of your personality. It also currently only works for English and Spanish, with plans for more languages in the future.

If, like me, your first thought at hearing all of this is concern, then surprisingly you have something in common with Cowen.

'I think this could be very misused. Early on at Hume, we were so concerned about these risks that we decided not to pursue voice cloning. But we've changed our mind because there are so many people with legitimate use cases for voice cloning that have approached us,' Cowen explained.

'The legitimate use cases include things like live translation, dubbing, making content more accessible, being able to replicate your own voice for scripts, or even celebrities who want to reach fans.'

While these use cases do exist, there are just as many negative ones out there as well. Sam Altman, CEO of OpenAI, recently warned of the risks of AI voice cloning and its ability to be used in scams and to bypass banks' voice authentication. This technology, paired with video and image generation, could be the push deepfakes have needed to become truly problematic.

Cowen explained that he was aware of these concerns and claimed that Hume was approaching them as best it could.

'We are releasing a lot of safeguards with this technology. We analyze every conversation, and we're still improving in this regard. But we can score how likely it is that something is being misused on a variety of dimensions. Whether somebody is being scammed or impersonated without permission,' Cowen said.

'We can obviously shut off access when people aren't using it correctly. In our terms, you have to comply with a bunch of ethical guidelines that we introduced alongside the Hume Initiative. These concerns have been on our mind since we started, and as we continue to unroll these technologies, we are improving our safeguarding too.'

The Hume Initiative is a project set up by the Hume company. Its ethos is that modern technology should, above all, serve our emotional well-being. That is somewhat vague, but the Initiative lists out six principles for empathetic technologies.

Of course, while these are good guidelines to follow, they are subjective, and only beneficial when followed. Cowen assured me that these are beliefs that Hume stands by and that, when it comes to voice cloning, they are well aware of the risks.

'We are at the forefront of this technology and we try to stay ahead of it. I think that there will be people that don't respect the guidelines of this kind of tool. I don't want people to walk away thinking there is no danger here, there is,' Cowen explained.

'People should be concerned about deepfakes on the phone, they should be wary of these types of scams, and it is something that I think we need a cross-industry attempt to address.'

Despite being aware of the risks, Cowen explained that he thought this was a technology that they had to build.

'The AI space moves so fast that I don't doubt that a bad actor in six months will have access to something like this technology. We need to be careful of that,' Cowen said.

Cowen spent a lot of our chat focusing on guidelines and the legitimate concerns around this kind of technology. His background is in psychology, and he strongly believes that this kind of technology will have more of a positive effect on people's wellbeing than a negative one.

'People have been really enjoying cloning their voices with our demo. We've had thousands of conversations already, which is remarkable. People are using it in a really fun way,' Cowen said, after discussing what he thinks people get wrong about this kind of technology.

He believes that it can be used for fun, to help build people's confidence, and even for training purposes or for voice acting in films, as well as dubbing.

Of course, just like with many other areas of AI, the benefits are competing with the harms. Being able to have a generic voice read a script is useful and carries little risk. Being able to accurately recreate any voice in the world comes with a long list of concerns.

For now, Cowen and his team are way ahead in this venture and seem committed to the ethical side of the debate, but we remain early in the life of this kind of technology.

I test AI every day — the best AI voice generator I've ever used just dropped

Tom's Guide

03-06-2025



You've probably heard an AI-generated voice by now. They are usually pretty noticeable, lacking in emotion or styling and still fairly robotic in nature. But with its latest update, Hume is changing that.

Evi 3, the latest version of the company's leading AI voice generator, has just landed. While it has an array of pre-made voices to try out, the thing that really sets this model apart from the rest is its customization options.

Hume gave me access to an early version of the tool, allowing me to try out an array of custom voices and the ability to generate one down to the smallest details. Here's how it went.

Once you're on the Hume system, you get a few options. You can choose from a selection of pre-set voices, but that's not what we're here for. The new feature on offer is the ability to design a voice.

Do this and you'll enter a conversation with an AI, which will ask you questions about the voice you want to generate. You can either take control, pitching the exact style of voice you want, or take a guided approach and let Hume suggest options for you.

I tried a variety of different options here, some pretty simple in nature, some overly complicated and specific. Impressively, most of the voices fit my descriptions pretty closely.

For example, the first voice I asked for was raspy and low in energy. It had a villainous nature, similar to a bad guy you would see in a fantasy film who rules over some evil kingdom.

Incredibly specific, but a few minutes later, I had a voice that fit this exact description chatting away to me. Not only did the tone match, but so did his vocal mannerisms, mocking me and being sarcastic.

Next, I asked for a British game show host, complete with an old-fashioned accent, a lot of energy, and an overly positive nature. Again, both accent and tone matched the request surprisingly well.

I went on to try an array of different voices, ranging from pirate-like to very simple American accents, all of which Hume achieved.

This isn't to say it was perfect throughout. Sometimes the voices would clip, revealing a slightly robotic sound or a slip in the accent. It is also always clear you're talking to an AI, though only slightly so; the voices still sound more human than robot.

You can also only design these specific voices by having a back-and-forth voice conversation with Hume AI. It would be great to be able to use text prompts to generate them, especially when you can't speak. This also means it takes twice as long to generate a voice, since you have to work through a long conversation to get a result. It's a small but noticeable concern.

Compared to text, video, and image generation, AI voice generation hasn't seen the same push. Companies like ElevenLabs have been at the forefront, and the likes of Google and OpenAI have made progress in the field. However, this is the best attempt I've seen in terms of customisation of the voices, not to mention the tone and personality of said voices.

Hume claims that in a blind comparison against OpenAI's GPT-4o, EVI 3 was rated higher, on average, on empathy, expressiveness, naturalness, interruption quality, response speed, and audio quality. The company also claims that the model outperformed GPT-4o, Gemini, and Sesame (a popular AI voice system) in study participants' ratings of how well it acted out a wide range of emotions and styles.

This, for now, puts Hume in a great place in the market. However, AI moves fast. While it currently stands out as a leader in AI voice generation, especially in terms of creative expression, it will have to keep the updates coming to stay ahead.

You can try Hume's Evi 3 now on the Hume AI dashboard.
