Latest news on #Qwen3-4B


India Today
29-04-2025
- Business
- India Today
Alibaba launches Qwen3 AI, again challenges ChatGPT and Google Gemini
Alibaba, the Chinese tech company behind AliExpress, announced on Monday that it has launched a new family of AI models called Qwen3, which in some cases outperforms OpenAI's ChatGPT and Google's Gemini AI models. The company shared a long post on X revealing its new AI models. 'We are excited to announce the release of Qwen3, the latest addition to the Qwen family of large language models. Our flagship model, Qwen3-235B-A22B, achieves competitive results in benchmark evaluations of coding, math, general capabilities, etc., when compared to other top-tier models such as DeepSeek-R1, o1, o3-mini, Grok-3, and Gemini-2.5-Pro,' the company wrote in an official blog post.

Alibaba says the Qwen3 AI models support 119 languages, including Hindi, Gujarati, Marathi, Chhattisgarhi, Awadhi, Maithili, Bhojpuri, Sindhi, Punjabi, Bengali, Oriya, Magahi and Urdu. Qwen3 features eight models ranging from 0.6B to 235B parameters. These include both dense and Mixture of Experts (MoE) architectures, designed to cater to various performance and efficiency needs. The top-performing model, Qwen3-235B-A22B, according to Alibaba, delivers strong results across key benchmarks such as math, coding, and general reasoning. 'The small MoE model, Qwen3-30B-A3B, outcompetes QwQ-32B with 10 times of activated parameters, and even a tiny model like Qwen3-4B can rival the performance of Qwen2.5-72B-Instruct,' the company claims.

'We are open-weighting two MoE models: Qwen3-235B-A22B, a large model with 235 billion total parameters and 22 billion activated parameters, and Qwen3-30B-A3B, a smaller MoE model with 30 billion total parameters and 3 billion activated parameters. Additionally, six dense models are also open-weighted, including Qwen3-32B, Qwen3-14B, Qwen3-8B, Qwen3-4B, Qwen3-1.7B, and Qwen3-0.6B, under Apache 2.0 license,' the company writes in its blog post. The models are available on Hugging Face, ModelScope, and Kaggle, with both pre-trained and post-trained versions (e.g., Qwen3-30B-A3B and its base variant). For deployment, Alibaba recommends SGLang and vLLM, while local use is supported through tools like Ollama, LMStudio, MLX, and KTransformers.

Alibaba says the Qwen3 models offer scalable performance, meaning they can adjust the quality of a response to the available compute budget, enabling an optimal balance between speed, cost and capability. They are especially well-suited for coding tasks and agent-based interactions, with improved multi-step reasoning.

Alibaba says the Qwen3 models also come with something called hybrid thinking. There is a thinking mode, which processes information step by step, taking the time to deliberate before delivering a final answer. Then there is a non-thinking mode, which allows the model to generate immediate responses, prioritising speed over depth. This dual-mode system gives users control over the depth of computation depending on the task. 'This flexibility allows users to control how much "thinking" the model performs based on the task at hand,' says Alibaba. 'This design enables users to configure task-specific budgets with greater ease, achieving a more optimal balance between cost efficiency and inference quality.'
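The thinking/non-thinking switch described above is exposed as a chat-template flag in the Qwen team's published Hugging Face usage notes. Below is a minimal sketch, assuming the `transformers` library and the Qwen3-4B checkpoint named in the article; the `enable_thinking` flag follows the Qwen3 release notes and should be verified against the model card.

```python
# Minimal sketch of Qwen3's hybrid thinking switch via Hugging Face
# transformers. The model name and the enable_thinking flag follow the
# Qwen team's published usage notes; verify against the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-4B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Is 9.11 larger than 9.9?"}]

# enable_thinking=True lets the model deliberate step by step before
# answering; set it to False for fast, non-thinking responses.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)
inputs = tokenizer([prompt], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=1024)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(
    output_ids[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True
))
```

Toggling the flag per request is what gives the "task-specific budgets" Alibaba describes: deliberate reasoning where depth matters, immediate answers where latency does.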

Business Standard
29-04-2025
- Business
- Business Standard
Alibaba launches Qwen3 AI, claims it's better than DeepSeek R1: Details
Alibaba Group Holding unveiled Qwen3, the third generation of its open-source artificial intelligence (AI) model series, on Tuesday, raising the stakes in an increasingly competitive Chinese and global AI market. The Qwen3 family boasts faster processing speeds and expanded multilingual capabilities compared to other AI models, including DeepSeek-R1 and OpenAI's o1.

What is the Qwen3 series?
The Qwen3 range features eight models, varying from 600 million to 235 billion parameters, each offering performance improvements, according to Alibaba's cloud computing division. Parameters, often seen as a measure of an AI model's complexity and capability, are essential for tasks such as language understanding, coding, and mathematical problem-solving.

How do Qwen3 models compare to rivals?
According to benchmark tests cited by the developers, the Qwen3-235B and Qwen3-4B models either matched or outperformed advanced competitors from both Chinese and international companies, including OpenAI's o1, Google's Gemini, and DeepSeek's R1, particularly in instruction following, coding support, text generation, mathematical problem-solving, and complex reasoning. "Qwen3 represents a significant milestone in our journey towards artificial general intelligence and artificial superintelligence," the Qwen team said, highlighting that enhanced pre-training and reinforcement learning had resulted in a marked leap in the models' intelligence. "Notably, our smaller MoE model, Qwen3-30B-A3B, surpasses QwQ-32B, and even the compact Qwen3-4B rivals the performance of the much larger Qwen2.5-72B-Instruct," the company added in a blog post on the launch.

Qwen3 introduces 'hybrid reasoning' capability
One of the standout features of the Qwen3 series is its hybrid reasoning capability. Users can select between a slower but deeper "thinking" mode for complex tasks and a faster "non-thinking" mode for quicker, simpler responses. This flexibility aims to cater to diverse user needs, from casual interactions to advanced problem-solving. In contrast, DeepSeek-R1 primarily uses Chain-of-Thought (CoT) reasoning, a method in which the model generates a sequence of reasoning steps before providing a final answer. Training for the Qwen3 models involved 36 trillion tokens across 119 languages and dialects, tripling the language scope of its predecessor, Qwen2.5. This expansion is expected to significantly enhance the models' ability to understand and generate multilingual content.

Where and how to use Qwen3?
The new Qwen3 models are available for download on platforms such as Hugging Face, ModelScope, Kaggle, and Microsoft's GitHub. Alibaba recommends deployment using frameworks like SGLang and vLLM, while users who prefer local integration can turn to tools such as Ollama, LMStudio, MLX, and KTransformers (see the sketch below).

Global AI race
The release of Qwen3 arrives at a time when the global AI landscape is witnessing a surge of new developments. Baidu recently unveiled two upgraded models, and DeepSeek's R2 launch is also anticipated soon.
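A common pattern with the server-side frameworks the article names is to expose an OpenAI-compatible endpoint and query it from any client. The sketch below assumes a vLLM server is already running locally; the host, port, and model identifier are assumptions to adjust for a real deployment. It uses the `openai` Python client, which vLLM and SGLang endpoints both accept.

```python
# Hypothetical client call against a locally hosted Qwen3 endpoint.
# Assumes a server was started separately, e.g. with vLLM's
# "vllm serve Qwen/Qwen3-30B-A3B"; host, port, and model name here
# are assumptions -- adjust them to your own deployment.
from openai import OpenAI

# Local OpenAI-compatible servers ignore the API key, but the client
# requires a non-empty string.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen3-30B-A3B",
    messages=[
        {
            "role": "user",
            "content": "Summarise the Qwen3 model lineup in two sentences.",
        }
    ],
)
print(response.choices[0].message.content)
```

Because the endpoint speaks the same wire protocol as hosted APIs, existing tooling can be pointed at a local Qwen3 deployment by changing only the base URL and model name.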


South China Morning Post
29-04-2025
- Business
- South China Morning Post
Alibaba unveils Qwen3 AI models that it says outperform DeepSeek R1
The Qwen3 family consists of eight models, ranging from 600 million parameters to 235 billion, with enhancements across all models, according to the Qwen team at Alibaba's cloud computing unit. Alibaba owns the South China Morning Post. In AI, parameters are a measurement of the variables present during model training. They serve as an indicator of sophistication: larger parameter sizes typically suggest greater capacity.

Benchmark tests cited by Alibaba revealed that models such as Qwen3-235B and Qwen3-4B matched or exceeded the performance of advanced models from both domestic and overseas competitors, including OpenAI's o1, Google's Gemini and DeepSeek's R1, in areas like instruction following, coding assistance, text generation, mathematical skills and complex problem solving.

[Video: How Alibaba is betting on AI to transform e-commerce, 11:13]

The launch of Qwen3, which was anticipated this month as previously reported by the Post, is expected to solidify Alibaba's position as a leading provider of open-source models. With over 100,000 derivative models built upon it, Qwen is currently the world's largest open-source AI ecosystem, surpassing Facebook parent Meta Platforms' Llama community.