Latest news with #ModelScope


The Star
01-05-2025
- Business
- The Star
Analysts: Alibaba's Qwen3 AI model family helps narrow tech gap between China and US
Alibaba Group Holding's third-generation Qwen3 family of artificial intelligence (AI) models appears to have narrowed the gap between the United States and China in this field, while cementing the company's leadership position in the global open-source community, according to analysts and reports. Hangzhou-based Alibaba's cloud computing unit on Tuesday unveiled its much-anticipated Qwen3 family, consisting of eight enhanced models that range from 600 million to 235 billion parameters. Alibaba owns the South China Morning Post. In machine learning, parameters are the variables present in an AI system during training, which help establish how data prompts yield the desired output.

Alibaba's latest AI models showed that Chinese companies have significantly closed the gap with US firms, and the pace of innovation is expected to continue in spite of US export restrictions on advanced semiconductors, according to Su Lian Jye, chief analyst at research firm Omdia. Su pointed out that the impact of such sanctions on China's AI development efforts has diminished compared with previous years, as alternative AI chips have become increasingly available from domestic suppliers such as Huawei Technologies and Cambricon Technologies.

Alibaba's latest AI model release reflects Qwen's current position as the world's largest open-source AI ecosystem, surpassing Facebook parent Meta Platforms' Llama community. Open source gives public access to a program's source code, allowing third-party software developers to modify or share its design, fix broken links or scale up its capabilities. Open-source technologies have been a major contributor to China's tech industry over the past few decades. The Qwen3 model family is available on Microsoft's GitHub, the open-source AI community Hugging Face and Alibaba's own AI model hosting service, ModelScope. It has also been integrated into the web-based Qwen chatbot as the default model for user queries.
Qwen3 has quickly become the most popular AI model family on platforms like Hugging Face, as the family's models combine reasoning ability, quick answers and cost-efficient adoption. 'Qwen family is the world's best, the most comprehensive, and the most widely used open-source model,' said Zhou Jingren, chief technology officer for Alibaba Cloud Intelligence and a key figure behind Qwen, in a report by Chinese media outlet LatePost. 'The whole market is pretty much in agreement about this.' Alibaba said Qwen3-235B, the largest variant of its new AI model family, surpassed OpenAI's o3-mini and o1, as well as DeepSeek's R1, in areas such as language understanding, domain knowledge, and maths and coding skills.

Lennart Heim, an analyst at US think tank Rand's technology and security policy centre, wrote in the Substack newsletter ChinaTalk that he expected China to match the US in terms of AI model capabilities, which would raise concerns about America losing some of its technological edge. Still, the US maintains ownership of 'more advanced AI chips' on account of tightened export controls, according to Heim. Meanwhile, AI chip suppliers from Nvidia to Advanced Micro Devices have made support adjustments for Qwen3. Shanghai-based AI chip start-up Biren Technology on Wednesday said its products started to support Qwen3 models 'within hours' of Alibaba's launch.

The current state of China's AI model development marks a big difference from the time OpenAI introduced ChatGPT to the world on November 30, 2022, according to the State of AI: China Report, published by independent analytics firm Artificial Analysis. After DeepSeek reset the narrative with the consecutive releases of its V3 and R1 models in late December and January, several Chinese enterprises – from Big Tech companies Baidu, ByteDance and Tencent Holdings to start-ups Moonshot and MiniMax – have achieved so-called frontier-level model capabilities, according to the report.
No countries other than China and the US have shown similar frontier-class model training, the report said. Meanwhile, Chinese engineers are making notable progress in optimising data and algorithmic techniques to develop AI models that are, arguably, on par with those produced by top US engineers, according to Ray Wang, a Washington-based analyst focused on US-China tech competition as well as the AI and semiconductor industries in Asia. – South China Morning Post
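The article above quotes model sizes from 600 million to 235 billion parameters. As a rough illustration of what that figure counts – the learnable weights established during training – the following sketch tallies the parameters of a small feed-forward network. The layer sizes and function names are made up for illustration and bear no relation to Qwen3's actual architecture.

```python
# Count learnable parameters in a toy feed-forward network.
# Each dense layer mapping n_in inputs to n_out outputs contributes
# n_in * n_out weight values plus n_out bias terms.

def dense_params(n_in: int, n_out: int) -> int:
    """Parameters in one fully connected layer (weights + biases)."""
    return n_in * n_out + n_out

def network_params(layer_sizes: list[int]) -> int:
    """Total parameters across a stack of dense layers."""
    return sum(dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:]))

# A tiny illustrative network: 512 -> 1024 -> 512 -> 10
total = network_params([512, 1024, 512, 10])
print(total)  # 1055242 — about a million; frontier models have billions
```

Scaling the same bookkeeping up to transformer-sized layers is what yields the "7B" or "235B" labels used throughout these articles.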

Mid East Info
10-04-2025
- Business
- Mid East Info
Alibaba Cloud Releases Qwen2.5-Omni-7B: An End-to-end Multimodal AI Model
Alibaba Cloud has launched Qwen2.5-Omni-7B, a unified end-to-end multimodal model in the Qwen series. Uniquely designed for comprehensive multimodal perception, it can process diverse inputs, including text, images, audio, and videos, while generating real-time text and natural speech responses. This sets a new standard for deployable multimodal AI on edge devices like mobile phones and laptops. Despite its compact 7B-parameter design, Qwen2.5-Omni-7B delivers uncompromised performance and powerful multimodal capabilities. This unique combination makes it an ideal foundation for developing agile, cost-effective AI agents that deliver tangible value, especially intelligent voice applications. For example, the model could be leveraged to help visually impaired users navigate environments through real-time audio descriptions, offer step-by-step cooking guidance by analyzing video of ingredients, or power intelligent customer service dialogues that truly understand customer needs. The model is now open-sourced on Hugging Face and GitHub, with additional access via Qwen Chat and Alibaba Cloud's open-source community, ModelScope. Over the past few years, Alibaba Cloud has made over 200 generative AI models open-source.

High Performance Driven by Innovative Architecture: Qwen2.5-Omni-7B delivers remarkable performance across all modalities, rivaling specialized single-modality models of comparable size. Notably, it sets a new benchmark in real-time voice interaction, natural and robust speech generation, and end-to-end speech instruction following.
Its efficiency and high performance stem from its innovative architecture: the Thinker-Talker Architecture, which separates text generation (through Thinker) from speech synthesis (through Talker) to minimize interference among different modalities for high-quality output; TMRoPE (Time-aligned Multimodal RoPE), a position embedding technique that better synchronizes video inputs with audio for coherent content generation; and Block-wise Streaming Processing, which enables low-latency audio responses for seamless voice interactions.

Outstanding Performance Despite Compact Size: Qwen2.5-Omni-7B was pre-trained on a vast, diverse dataset, including image-text, video-text, video-audio, audio-text, and text data, ensuring robust performance across tasks. With its innovative architecture and high-quality pre-training dataset, the model excels at following voice commands, achieving performance levels comparable to pure text input. For tasks that involve integrating multiple modalities, such as those evaluated in OmniBench – a benchmark that assesses models' ability to recognize, interpret, and reason across visual, acoustic, and textual inputs – Qwen2.5-Omni achieves state-of-the-art performance. Qwen2.5-Omni-7B also demonstrates robust speech understanding and generation capabilities through in-context learning (ICL). Additionally, after reinforcement learning (RL) optimization, Qwen2.5-Omni-7B showed significant improvements in generation stability, with marked reductions in attention misalignment, pronunciation errors, and inappropriate pauses during speech responses.


South China Morning Post
27-03-2025
- Business
- South China Morning Post
Alibaba launches AI model that can process images and video on phones and laptops
Alibaba Group Holding has introduced a new multimodal artificial intelligence (AI) model capable of processing text, images, audio and video on smartphones and laptops, as the tech giant moves to solidify its advantages in generative AI. The company launched Qwen2.5-Omni-7B on Thursday as the latest addition to its Qwen family of models. With just 7 billion parameters, it is designed to run on mobile phones, tablets and laptops, making advanced AI capabilities more accessible to everyday users. The model can handle various types of inputs and generate real-time responses as text or audio, Alibaba said in a statement. The company made the model open-source, and it is currently available on Hugging Face, Microsoft's GitHub, and Alibaba's ModelScope. It is also integrated into the company's Qwen Chat. Alibaba owns the Post. The company highlighted potential use cases such as assisting visually impaired users with real-time audio descriptions and providing step-by-step cooking guidance by analysing ingredients. The model's versatility underscores the growing demand for AI systems that go beyond text generation. Alibaba's foundational Qwen models have emerged as popular options for AI developers to build upon, making it one of the few major alternatives to DeepSeek's V3 and R1 models in mainland China. Qwen2.5-Omni-7B has demonstrated strong performance in benchmark tests. It scored 56.1 on OmniBench, surpassing the 42.9 achieved by Google's Gemini-1.5-Pro. It also outperformed Alibaba's earlier Qwen2-Audio model on the CV15 audio benchmark, scoring one point higher with 92.4. For image-related tasks, it achieved 59.2 on the Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark, beating the Qwen2.5-VL vision-language model.

Yahoo
26-02-2025
- Business
- Yahoo
Alibaba offers free access to its AI model that can generate realistic video and images
Alibaba is giving people free access to its generative artificial intelligence models that can produce highly realistic videos and images from both text and image input. The company has announced that four variants of its Wan 2.1 series, the latest version of its generative AI technology, are now open source and can be downloaded and modified by users. Researchers, academics and commercial entities can all get them from Alibaba Cloud's ModelScope and Hugging Face platforms, both of which give people access to open-source AI models. As Reuters reported, the models Alibaba has open sourced are called T2V-1.3B, T2V-14B, I2V-14B-720P and I2V-14B-480P, with '14B' indicating that the model has 14 billion parameters. Last month, Chinese company DeepSeek made its R1 reasoning model free to download and use, creating a clamor for more open-source AI technologies. DeepSeek has since expanded its commitment to the open-source community and is in the process of releasing five code repositories behind its service. Alibaba was one of the companies that joined the fray to develop generative AI tech following the launch of OpenAI's ChatGPT two years ago. Just recently, Alibaba Group's chairman, Joe Tsai, said that the company's generative AI technology will power artificial intelligence features for iPhones sold in the Chinese market. Apple couldn't use its existing AI tech for phones released in China due to strict regulations surrounding AI products, so it had to look for local partners, Alibaba being one of them.

Al Arabiya
26-02-2025
- Business
- Al Arabiya
Alibaba makes AI model for video, image generation publicly available
Chinese e-commerce leader Alibaba said on Wednesday its video- and image-generating artificial intelligence model, Wan 2.1, is now publicly available—or open source—in a move likely to increase its uptake and intensify competition in AI. Alibaba's announcement follows similar action from startup DeepSeek, whose ostensibly low-cost open-source models earlier this year generated excitement among technology investors and surprise in the capital-intensive sector with performance akin to that of more established rivals such as OpenAI. Alibaba said it has released four variants of Wan 2.1—T2V-1.3B, T2V-14B, I2V-14B-720P, and I2V-14B-480P—which generate images and videos from text and image input. The '14B' indicates the variant has 14 billion parameters; models with more parameters can generally capture more complex patterns and yield more accurate results. The models are available globally on Alibaba Cloud's ModelScope and Hugging Face platforms for academic, research, and commercial use. Alibaba introduced the latest version of its video- and image-generating AI model in January—later shortening its name to Wan from Wanx—touting its ability to generate highly realistic visuals. The firm has since highlighted its top ranking on VBench, a leaderboard for video generative models, where it leads in functionality such as multi-object interaction. On Tuesday, Alibaba released a preview of its reasoning model QwQ-Max, which it plans to make open source upon full release. It also announced plans this week to invest at least 380 billion yuan ($52 billion) over the next three years to bolster cloud computing and AI infrastructure.
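The Wan 2.1 variant names follow a regular pattern: a task prefix (T2V for text-to-video, I2V for image-to-video), a parameter count, and an optional output resolution. As a small illustration of reading that scheme mechanically, here is a hypothetical helper; it is not part of any Alibaba tooling.

```python
# Parse Wan 2.1 variant names like "I2V-14B-720P" into their parts:
# task prefix, parameter count in billions, and optional resolution.

def parse_variant(name: str) -> dict:
    parts = name.split("-")
    info = {
        "task": parts[0],                           # "T2V" or "I2V"
        "params_billion": float(parts[1].rstrip("B")),
    }
    if len(parts) > 2:                              # resolution suffix is optional
        info["resolution"] = parts[2]
    return info

print(parse_variant("I2V-14B-720P"))
# {'task': 'I2V', 'params_billion': 14.0, 'resolution': '720P'}
print(parse_variant("T2V-1.3B"))
# {'task': 'T2V', 'params_billion': 1.3}
```

So the four released variants span two tasks, two sizes (1.3B and 14B), and for image-to-video, two output resolutions (480P and 720P).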