Exclusive: Nvidia, Foxconn in talks to deploy humanoid robots at Houston AI server making plant

Reuters · 4 hours ago

TAIPEI, June 20 (Reuters) - Taiwan's Foxconn (2317.TW) and U.S. artificial intelligence chip maker Nvidia (NVDA.O) are in talks to deploy humanoid robots at a new Foxconn factory in Houston that will produce Nvidia AI servers, two sources familiar with the matter said.
This would be the first time that an Nvidia product will be made with the assistance of humanoid robots and would be Foxconn's first AI server factory to use them on a production line, the sources said.
A deployment, expected to be finalised in the coming months, would mark a milestone in the adoption of human-like robots, a technology that promises to transform manufacturing processes.
Foxconn is developing its own humanoid robots with Nvidia and has also trialed humanoids made by China's UBTech (9880.HK). The sources said it was not clear what type of humanoid robots would be used in the Houston factory, what they would look like or how many would be deployed initially.
They said the two companies are aiming to have the humanoid robots at work by the first quarter of next year, when Foxconn's new Houston factory will begin production of Nvidia's GB300 AI servers.
And while it was not clear what exactly the robots will be doing at the factory, Foxconn has been training them to pick and place objects, insert cables and do assembly work, according to a company presentation in May.
Foxconn's Houston factory is ideally suited to deploying humanoid robots because it will be new and have more space than existing AI server manufacturing sites, one of the sources said.
Nvidia and Foxconn declined to comment.
The sources did not wish to be identified as they are not authorised to speak to the media.
Leo Guo, general manager of the robotics business unit at Foxconn Industrial Internet (601138.SS), the Foxconn subsidiary in charge of the group's AI server business, said last month at an industry event in Taipei that Foxconn plans to showcase two versions of humanoid robots it has developed at the company's annual technology event in November.
One will have legs and the other will use a wheeled autonomous mobile robot (AMR) base, which would cost less than the legged version, he said, without disclosing further details.
Nvidia announced in April that it planned to build AI supercomputer manufacturing factories in Texas, partnering with Foxconn in Houston and Wistron (3231.TW) in Dallas. Both sites are expected to ramp up production within 12 to 15 months.
For Nvidia, using humanoid robots to manufacture its AI servers represents a further push into the technology; the company already supplies humanoid makers with a platform they can use to build such robots.
Nvidia CEO Jensen Huang predicted in March that wide use of humanoid robots in manufacturing facilities was less than five years away.
Automakers such as Germany's Mercedes-Benz (MBGn.DE) and BMW (BMWG.DE) have tested the use of humanoids on production lines, while Tesla (TSLA.O) is developing its own. China has also thrown its weight behind humanoids, betting that many factory tasks will eventually be performed by such robots.


Related Articles

Aflac finds suspicious activity on US network that may impact Social Security numbers, other data

The Independent · 22 minutes ago

Aflac says that it has identified suspicious activity on its network in the U.S. that may impact Social Security numbers and other personal information, calling the incident part of a cybercrime campaign against the insurance industry.

The company said Friday that the intrusion was stopped within hours. 'We continue to serve our customers as we respond to this incident and can underwrite policies, review claims, and otherwise service our customers as usual,' Aflac said in a statement.

The company said that it's in the early stages of a review of the incident, and so far is unable to determine the total number of affected individuals. Aflac Inc. said potentially impacted files contain claims information, health information, Social Security numbers, and other personal information related to customers, beneficiaries, employees, agents, and other individuals in its U.S. business. The Columbus, Georgia, company said that it will offer free credit monitoring, identity theft protection and Medical Shield for 24 months to anyone who calls its call center.

Cyberattacks against companies have been rampant for years, but a string of attacks on retail companies has raised awareness of the issue because the breaches can impact customers. United Natural Foods, a wholesale distributor that supplies Whole Foods and other grocers, said earlier this month that a breach of its systems was disrupting its ability to fulfill orders, leaving many stores without certain items. In the U.K., consumers could not order from the website of Marks & Spencer for more than six weeks, and found fewer in-store options after hackers targeted the British clothing, home goods and food retailer. A cyberattack on Co-op, a U.K. grocery chain, also led to empty shelves in some stores.

A security breach detected by Victoria's Secret last month led the popular lingerie seller to shut down its U.S. shopping site for nearly four days, as well as to halt some in-store services. Victoria's Secret later disclosed that its corporate systems were also affected, causing the company to delay the release of its first-quarter earnings.

The North Face said that it discovered a 'small-scale credential stuffing attack' on its website in April. The company reported that no credit card data was compromised and said the incident, which impacted 1,500 consumers, was 'quickly contained.' Adidas disclosed last month that an 'unauthorized external party' obtained some data, mostly contact information, through a third-party customer service provider.

Professional Quality Voice Cloning: Open Source vs ElevenLabs

Geeky Gadgets · an hour ago

What if you could replicate a voice so convincingly that even the closest of listeners couldn't tell the difference? The rise of professional-quality voice cloning has made this a reality, transforming industries from entertainment to customer service. But as this technology becomes more accessible, a pivotal question emerges: should you opt for the polished convenience of a commercial platform like ElevenLabs, or embrace the flexibility and cost-efficiency of open source solutions? The answer isn't as straightforward as it seems. While ElevenLabs promises quick results with minimal effort, open source tools offer a deeper level of customization, if you're willing to invest the time and expertise. This tension between convenience and control lies at the heart of the debate.

In this article, Trelis Research explores the key differences between open source voice cloning models and ElevenLabs, diving into their strengths, limitations, and use cases. From the meticulous process of preparing high-quality audio data to the technical nuances of fine-tuning models like CSM1B and Orpheus, you'll uncover what it takes to achieve truly lifelike voice replication. Along the way, we'll also examine the ethical considerations and potential risks that come with wielding such powerful technology. Whether you're a curious enthusiast or a professional seeking tailored solutions, this exploration will challenge your assumptions and help you make an informed choice. After all, the voice you clone may be more than just a tool; it could be a reflection of your values and priorities.

What Is Voice Cloning?

Voice cloning involves training a model to replicate a specific voice for text-to-speech (TTS) applications. This process requires high-quality audio data and advanced modeling techniques to produce results that are both realistic and expressive. Commercial platforms like ElevenLabs provide fast and efficient solutions, but open source models offer a cost-effective alternative for those willing to invest time in training and customization. By using these tools, you can create highly personalized voice outputs tailored to your specific needs.

Data Preparation: The Foundation of Accurate Voice Cloning

High-quality data is the cornerstone of successful voice cloning. To train a model effectively, you'll need at least three hours of clean, high-resolution audio recordings. The preparation process involves several critical steps that ensure the dataset captures the unique characteristics of a voice:

Audio cleaning: remove background noise and normalize volume levels to ensure clarity and consistency.
Audio chunking: divide recordings into 30-second segments, maintaining sentence boundaries to preserve coherence and context.
Audio transcription: use tools like Whisper to align text with audio, creating precise and synchronized training data.

These steps are essential for capturing the nuances of a voice, including its tone, pitch, and emotional expression, which are critical for producing realistic outputs.
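As a rough illustration of the chunking and transcription steps above, here is a minimal Python sketch using the pydub and openai-whisper packages. The file names, fixed 30-second segment length, and output layout are placeholder assumptions rather than anything prescribed in the article.

```python
# Sketch: split a recording into ~30-second chunks and transcribe them with Whisper.
# Assumes `pydub` (plus ffmpeg) and `openai-whisper` are installed; paths are placeholders.
import os
import whisper
from pydub import AudioSegment

SOURCE_FILE = "speaker_recording.wav"   # placeholder input recording
CHUNK_DIR = "chunks"                    # placeholder output directory
CHUNK_MS = 30_000                       # 30-second segments, as suggested in the article

os.makedirs(CHUNK_DIR, exist_ok=True)
audio = AudioSegment.from_file(SOURCE_FILE)

# Naive fixed-length chunking; a real pipeline would also respect sentence boundaries.
chunk_paths = []
for i, start in enumerate(range(0, len(audio), CHUNK_MS)):
    chunk = audio[start:start + CHUNK_MS]
    path = os.path.join(CHUNK_DIR, f"chunk_{i:04d}.wav")
    chunk.export(path, format="wav")
    chunk_paths.append(path)

# Transcribe each chunk so text and audio are paired for training.
model = whisper.load_model("base")
dataset = []
for path in chunk_paths:
    result = model.transcribe(path)
    dataset.append({"audio": path, "text": result["text"].strip()})

for row in dataset[:3]:
    print(row["audio"], "->", row["text"][:60])
```

In practice you would also filter out silent or noisy chunks at this stage, since the article stresses that clean, consistent audio matters more than sheer quantity.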
Open Source Models: Exploring the Alternatives

Open source voice cloning models provide powerful alternatives to commercial platforms, offering flexibility and customization. Two notable models, CSM1B (Sesame) and Orpheus, stand out for their features and capabilities:

CSM1B (Sesame): employs a hierarchical token-based architecture to represent audio. It supports fine-tuning with LoRA (Low-Rank Adaptation), making it efficient to train on limited hardware while delivering high-quality results.
Orpheus: with 3 billion parameters, Orpheus uses a multi-token approach for detailed audio representation. While it produces highly realistic outputs, its size can lead to slower inference times and increased complexity during tokenization and decoding.

When fine-tuned with sufficient data, these models can rival or even surpass the quality of commercial solutions like ElevenLabs, offering a customizable and cost-effective option for professionals.

Fine-Tuning: Customizing Open Source Models

Fine-tuning is a critical step in adapting pre-trained models to replicate specific voices. By applying techniques like LoRA, you can customize models without requiring extensive computational resources. During this process, it's important to monitor metrics such as training loss and validation loss to ensure the model is learning effectively. Comparing the outputs of fine-tuned models with real recordings helps validate their performance and identify areas for improvement. This iterative approach ensures that the final model delivers accurate and expressive results.
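To make the LoRA step more concrete, here is a heavily hedged sketch of attaching adapters with the Hugging Face peft library. The base checkpoint name, target module names, and hyperparameters are illustrative assumptions, not settings taken from the article or from the CSM1B or Orpheus repositories.

```python
# Sketch: wrap a pre-trained checkpoint with LoRA adapters for voice fine-tuning.
# Assumes `transformers`, `peft`, and `torch` are installed; the model name is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

BASE_MODEL = "your-org/your-tts-base-model"  # hypothetical checkpoint name

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL, torch_dtype=torch.bfloat16)

# LoRA freezes the base weights and trains small low-rank update matrices,
# which is why it fits on limited hardware, as the article notes.
lora_config = LoraConfig(
    r=16,                                 # rank of the low-rank update (illustrative)
    lora_alpha=32,                        # scaling factor (illustrative)
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections; names vary by model
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the full model

# From here, train on the paired audio-token/text data prepared earlier, watching
# training and validation loss as described above, then save only the adapter:
# model.save_pretrained("voice_lora_adapter")
```

Saving just the adapter keeps checkpoints small and lets you swap voices on top of a single base model, which is one reason the article highlights LoRA for limited hardware.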
Open Source vs. ElevenLabs: Key Differences

ElevenLabs offers a streamlined voice cloning solution, delivering high-quality results with minimal input data. Its quick cloning feature allows you to replicate voices using small audio samples, making it an attractive option for users seeking convenience. However, this approach often lacks the precision and customization offered by open source models trained on larger datasets. Open source solutions like CSM1B and Orpheus, when fine-tuned, can match or even exceed the quality of ElevenLabs, providing a more flexible and cost-effective alternative for users with specific requirements.

Generating Audio: Bringing Text to Life

The final step in voice cloning is generating audio from text. Fine-tuned models can produce highly realistic outputs, especially when paired with reference audio samples to enhance voice similarity. However, deploying these models for high-load inference can present challenges due to limited library support and hardware constraints. Careful planning and optimization are essential to ensure smooth deployment and consistent performance, particularly for applications requiring real-time or large-scale audio generation.

Technical Foundations of Voice Cloning

The success of voice cloning relies on advanced technical architectures that enable models to produce realistic and expressive outputs. Key elements include:

Token-based architecture: audio is broken into tokens, capturing features such as pitch, tone, and rhythm for detailed representation.
Hierarchical representations: these allow models to understand complex audio features, enhancing expressiveness and naturalness in the generated outputs.
Decoding strategies: differences in decoding methods between models like CSM1B and Orpheus influence both the speed and quality of the generated audio.

Understanding these technical aspects can help you select the right model and optimize it for your specific use case.

Ethical Considerations in Voice Cloning

Voice cloning technology raises important ethical concerns, particularly regarding potential misuse. The ability to create deepfake audio poses risks to privacy, security, and trust. As a user, it's your responsibility to ensure that your applications adhere to ethical guidelines. Prioritize transparency, verify the authenticity of cloned voices, and use the technology responsibly to avoid contributing to misuse or harm.

Best Practices for Achieving Professional Results

To achieve professional-quality voice cloning, follow these best practices:

Use clean, high-quality audio recordings for training to ensure accuracy and clarity.
Combine fine-tuning with cloning techniques to enhance voice similarity and expressiveness.
Evaluate models on unseen data to test their generalization and reliability before deployment (a small evaluation sketch follows at the end of this article).

These practices will help you maximize the potential of your voice cloning projects while maintaining ethical standards.

Tools and Resources for Voice Cloning

Several tools and platforms can support your voice cloning efforts, streamlining the process and improving results:

Transcription tools: Whisper is a reliable option for aligning text with audio during data preparation.
Libraries and datasets: platforms like Hugging Face and Unsloth provide extensive resources for training and fine-tuning models.
Training environments: services like Google Colab, RunPod, and Vast AI offer cost-effective solutions for model training and experimentation.

By using these resources, you can simplify your workflow and achieve high-quality results in your voice cloning projects.
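Tying into the best-practice point about evaluating on unseen data, below is a small, hedged Python sketch of one possible check: transcribe generated clips with Whisper and compare them to the target text using word error rate from the jiwer package. The file names and the WER threshold are illustrative assumptions, and intelligibility is only one axis of quality.

```python
# Sketch: sanity-check cloned audio on held-out text using Whisper + word error rate.
# Assumes `openai-whisper` and `jiwer` are installed; file names are placeholders.
import whisper
from jiwer import wer

# Held-out sentences and the corresponding clips generated by the fine-tuned model.
eval_pairs = [
    ("The quarterly report ships on Friday.", "generated/clip_000.wav"),
    ("Thanks for calling, how can I help you today?", "generated/clip_001.wav"),
]

asr = whisper.load_model("base")

scores = []
for target_text, audio_path in eval_pairs:
    hypothesis = asr.transcribe(audio_path)["text"]
    error = wer(target_text.lower(), hypothesis.lower().strip())
    scores.append(error)
    print(f"{audio_path}: WER={error:.2f}")

# Rough, arbitrary threshold; speaker similarity and prosody still need
# human or embedding-based review on top of this check.
if sum(scores) / len(scores) > 0.2:
    print("High WER on unseen text; consider more data or further fine-tuning.")
```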
