Latest news with #Qwen2.5-72B-Instruct


India Today
29-04-2025
Alibaba launches Qwen3 AI, again challenges ChatGPT and Google Gemini
Chinese tech giant Alibaba, the company behind AliExpress, announced on Monday that it has launched a new family of AI models called Qwen3, which it says in some cases outperforms OpenAI's ChatGPT and Google's Gemini models. The company revealed the new models in a long post on X. 'We are excited to announce the release of Qwen3, the latest addition to the Qwen family of large language models. Our flagship model, Qwen3-235B-A22B, achieves competitive results in benchmark evaluations of coding, math, general capabilities, etc., when compared to other top-tier models such as DeepSeek-R1, o1, o3-mini, Grok-3, and Gemini-2.5-Pro,' the company wrote in an official blog post.

Alibaba says the Qwen3 models support 119 languages, including Hindi, Gujarati, Marathi, Chhattisgarhi, Awadhi, Maithili, Bhojpuri, Sindhi, Punjabi, Bengali, Oriya, Magahi and Urdu. Qwen3 comprises eight models ranging from 0.6 billion to 235 billion parameters, spanning both dense and Mixture of Experts (MoE) architectures designed to cater to different performance and efficiency needs. The top-performing model, Qwen3-235B-A22B, according to Alibaba, delivers strong results across key benchmarks such as math, coding and general reasoning. 'The small MoE model, Qwen3-30B-A3B, outcompetes QwQ-32B with 10 times of activated parameters, and even a tiny model like Qwen3-4B can rival the performance of Qwen2.5-72B-Instruct,' the company claims.

(Screenshot: Qwen blog post. The compact Qwen3-4B rivals the much larger Qwen2.5-72B-Instruct.)

'We are open-weighting two MoE models: Qwen3-235B-A22B, a large model with 235 billion total parameters and 22 billion activated parameters, and Qwen3-30B-A3B, a smaller MoE model with 30 billion total parameters and 3 billion activated parameters.
Additionally, six dense models are also open-weighted, including Qwen3-32B, Qwen3-14B, Qwen3-8B, Qwen3-4B, Qwen3-1.7B, and Qwen3-0.6B, under Apache 2.0 license,' the company writes in its blog post. The models are available on Hugging Face, ModelScope, and Kaggle, with both pre-trained and post-trained versions (e.g., Qwen3-30B-A3B and its base variant). For deployment, Alibaba recommends SGLang and vLLM, while local use is supported through tools like Ollama, LMStudio, MLX, and KTransformers.

Alibaba says the Qwen3 models offer scalable performance, meaning they can adjust the quality of a response to the available compute budget, enabling an optimal balance between speed, cost and capability. They are especially well suited to coding tasks and agent-based interactions, with improved multi-step reasoning.

Alibaba says the Qwen3 models also come with something called hybrid thinking. There is a thinking mode, which processes information step by step, taking time to deliberate before delivering a final answer. Then there is a non-thinking mode, which allows the model to generate immediate responses, prioritising speed over depth. This dual-mode system gives users control over the depth of computation depending on the task. 'This flexibility allows users to control how much "thinking" the model performs based on the task at hand,' says Alibaba. 'This design enables users to configure task-specific budgets with greater ease, achieving a more optimal balance between cost efficiency and inference quality.'
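The dual-mode behaviour described above can be pictured as a simple dispatch. The sketch below is a minimal illustration, not Alibaba's API: the `answer` helper is hypothetical, and the `<think>` tag stands in for the model's reasoning trace (Qwen's documentation describes the real toggle as a flag in the model's chat template).

```python
# Illustrative sketch of hybrid thinking: one entry point, two modes.
# `answer` is a hypothetical helper, not part of the Qwen3 API.

def answer(prompt: str, thinking: bool) -> str:
    """Return a reply, optionally preceded by a deliberate reasoning trace."""
    if thinking:
        # Thinking mode: the model deliberates step by step before the
        # final answer (slower, more compute, deeper responses).
        reasoning = f"<think>reasoning about {prompt!r} step by step</think>"
        return f"{reasoning}\nfinal answer"
    # Non-thinking mode: immediate response, prioritising speed over depth.
    return "final answer"

print(answer("What is 2 + 2?", thinking=True))
print(answer("What is 2 + 2?", thinking=False))
```

The point of the design is that the caller, not the model, decides per request how much computation to spend, which is what Alibaba means by configuring task-specific budgets.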


South China Morning Post
25-03-2025
Ant Group's use of China-made GPUs, not Nvidia, cuts AI model training costs by 20%
Ant Group, the fintech affiliate of Alibaba Group Holding, is able to train large language models (LLMs) using locally produced graphics processing units (GPUs), reducing reliance on Nvidia's advanced chips and cutting training costs by 20 per cent, according to a research paper and media reports.

Ant's Ling team, responsible for LLM development, revealed that its Ling-Plus-Base model, a Mixture-of-Experts (MoE) model with 300 billion parameters, can be 'effectively trained on lower-performance devices'. The finding was published in a recent paper on arXiv, an open-access platform for the scientific community. By avoiding high-performance GPUs, the model cuts computing costs by a fifth during pre-training while still achieving performance comparable to other models such as Qwen2.5-72B-Instruct and DeepSeek-V2.5-1210-Chat, according to the paper.

The development positions the Hangzhou-based fintech giant alongside domestic peers like DeepSeek and ByteDance in reducing reliance on advanced Nvidia chips, which are subject to strict US export controls. 'These results demonstrate the feasibility of training state-of-the-art large-scale MoE models on less powerful hardware, enabling a more flexible and cost-effective approach to foundational model development with respect to computing resource selection,' the team wrote in the paper.
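As a back-of-the-envelope illustration of the reported saving: only the 20 per cent reduction figure comes from the Ling team's paper; the baseline cost below is an arbitrary placeholder, since the article gives no absolute figures.

```python
# Hypothetical baseline pre-training cost on high-performance GPUs.
# The absolute number is made up for illustration; only the 20%
# reduction ("a fifth") is reported in the paper.
baseline_cost_usd = 1_000_000

reduction = 0.20  # "cutting training costs by 20 per cent"
cost_on_domestic_gpus = baseline_cost_usd * (1 - reduction)

print(f"Cost on lower-performance GPUs: ${cost_on_domestic_gpus:,.0f}")
```

With a $1,000,000 baseline, the saving is $200,000 per pre-training run, which is why the paper frames hardware choice as a lever for "cost-effective" foundational model development.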