
Latest news with #Claude4Opus

Pluto Labs' 'Strategic Efficiency' AI Outperforms Google & Anthropic Models at a Fraction of the Cost

Business Wire

6 days ago



SEOUL, South Korea--(BUSINESS WIRE)--In the global race to create an 'AI Scientist' capable of independent scientific discovery, Korean startup Pluto Labs is making significant waves. While tech giants like Google and Anthropic define competitive strength by sheer computing power, Pluto Labs introduces 'Scinapse AI,' a groundbreaking model that develops a more effective AI Scientist at just one-tenth of the usual computational cost, achieving overwhelming performance against top models in blind evaluations.

Surprising results: a startup's AI now leads in scientific idea generation.

Pluto Labs pioneers a 'Strategic Efficiency' architecture, mirroring how human scientists optimize research. Instead of brute-force computing (like Google's 'Co-scientist'), Scinapse AI delegates inefficient tasks, such as sifting through vast data, to its existing Scinapse academic search system (used by 170,000 researchers). This allows the AI to focus exclusively on creative reasoning and generating new ideas, leading to unparalleled practicality and cost-efficiency.

Scinapse AI's superior performance was proven in blind evaluations judged by direct competitors' AI models (OpenAI o3, Anthropic's Claude 4 Opus, and Google's Gemini 2.5 Pro). Scinapse AI consistently ranked #1 in 'Plausibility' (how realistic and sound an idea is) and 'Testability' (how easily an idea can be experimentally validated) across 61 diverse scientific topics. This confirms Scinapse AI delivers "real scientific hypotheses that are practical to implement and can be experimentally validated within 3-5 years," a significant leap from 'plausible fiction' to 'actionable scientific research.'

Pluto Labs has developed a robust system to eliminate AI 'hallucinations' (false information), a critical barrier in scientific AI. All Scinapse AI ideas are strictly grounded in a database of 260 million academic papers. The system dynamically refines searches for highly accurate data and rigorously cross-references ideas against the database to guarantee genuine novelty, identifying truly new scientific insights.
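The press release describes a retrieval-grounded pipeline: delegate literature search to an existing index, generate ideas only from retrieved evidence, then cross-reference candidates against the corpus to check novelty. As a rough illustration of that general pattern only (not Pluto Labs' actual system), the Python sketch below uses hypothetical search_papers and generate_idea stubs and a crude string-similarity test in place of a real novelty check.

```python
from difflib import SequenceMatcher

def search_papers(topic, k=5):
    """Hypothetical stand-in for an academic search index returning abstracts."""
    return [f"Known result about {topic} #{i}" for i in range(k)]

def generate_idea(topic, evidence):
    """Hypothetical stand-in for an LLM call conditioned only on retrieved evidence."""
    return f"Proposed hypothesis on {topic}, building on: {evidence[0]}"

def is_novel(idea, corpus, threshold=0.8):
    """Crude novelty check: reject ideas too similar to any retrieved abstract."""
    return all(SequenceMatcher(None, idea, doc).ratio() < threshold for doc in corpus)

def propose(topic):
    evidence = search_papers(topic)        # cheap retrieval, not brute-force LLM search
    idea = generate_idea(topic, evidence)  # model focuses on reasoning over evidence
    return idea if is_novel(idea, evidence) else None

print(propose("solid-state battery electrolytes"))
```

In a production system the stubs would be a real search API, an LLM call, and an embedding-based similarity search rather than string matching; the sketch only shows the division of labor the release describes.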
"It provides the most complete answer to the question that challenges researchers most: 'Is this idea truly new?'" said Professor Changshin Jo of POSTECH. "The AI completes a literature review in minutes that would take a human researcher days or weeks, and with higher accuracy. This is not just an idea-generation tool; it's an innovation that fundamentally changes the starting point of research planning."

With cumulative funding of 5 billion KRW (~$3.6M USD), Pluto Labs is uniquely positioned to disrupt the global 'AI Scientist' race. Its 'efficiency' and 'practicality' approach, combined with its established Scinapse platform, sets it apart.

"The fact that a small Korean startup has proven superior to Google in an objective benchmark is more than just a technical achievement—it's a symbolic event for innovative leadership," said Simon Kim, CEO of Hashed. "It demonstrates that success in advanced AI can be achieved through creative problem-solving and strategic design, not just infinite capital. This is a significant milestone that points to the future direction for the global AI ecosystem, suggesting a new path for AI agent development."

Scinapse AI is slated for official global launch in Q3 2025.

About Pluto Labs: Pluto Labs is a Seoul-based Research Intelligence service company founded in 2019 that specializes in AI-powered scientific research tools. The company operates Scinapse, a comprehensive academic search platform used by over 170,000 researchers worldwide. Scinapse AI, launching in Q3 2025, is a next-generation AI scientist system that generates research ideas with high scientific validity through a hybrid AI architecture while maintaining cost efficiency. The company has raised a cumulative 5 billion KRW (~$3.6M USD) in funding from investors including Hashed, JB Investment, HG Initiative, and POSCO Investment. Pluto Labs is committed to leading global research innovation through accessible AI solutions.

Claude 4 Opus and Composer Agent AI Coding Development Workflow

Geeky Gadgets

09-06-2025



What if your development workflow could think ahead, anticipate your needs, and eliminate the most tedious parts of coding? With the rise of AI-powered tools like Claude 4 Opus and the Composer Agent, this vision is no longer a distant dream but a tangible reality reshaping how developers approach their craft. These tools promise more than just incremental improvements—they aim to redefine productivity by blending intelligent automation with human creativity. Whether you're navigating sprawling codebases or fine-tuning a single feature, the ability to work smarter, not harder, has never been more accessible.

World of AI provides more insights into the potential of these tools, exploring how they streamline complex workflows and empower developers to focus on innovation. From context-aware code suggestions to conversational debugging, the features of the Composer Agent and Code LM are designed to meet the demands of modern development. But how do these tools balance automation with control? And what role do advanced AI models like Gemini 2.5 Pro play in this ecosystem? As we unpack these questions, you'll discover how these innovations can elevate your coding experience and redefine what's possible in software development.

Abacus AI Development Tools

Enhanced Features of the Composer Agent

The Composer Agent now incorporates smarter and faster autocomplete capabilities, allowing you to write code more efficiently. With improved context-awareness, the tool ensures that its suggestions align with your specific coding environment, minimizing errors and saving valuable time. Additionally, it has been optimized to handle large codebases, allowing for seamless navigation and better memory management in complex projects. Key improvements include:

• Intelligent code suggestions tailored to your coding context.
• Tap-to-autocomplete functionality for faster implementation.
• Enhanced performance for managing intricate and large-scale tasks.

These features allow you to focus on creativity and problem-solving by reducing the repetitive aspects of coding. Whether you're working on a small project or a large-scale application, the Composer Agent adapts to your needs, ensuring a smoother and more efficient workflow.

Code LM: A Smarter AI-Powered Code Editor

Code LM introduces a robust suite of features designed to boost your productivity and simplify complex coding challenges. One of its standout additions is the conversational AI mode, which enables you to interact with the tool for debugging, refactoring, and step-by-step logic explanations. This conversational approach makes it easier to identify and resolve issues, even in the most intricate codebases. Other notable features include:

• Multimodel routing for seamless integration of AI models like Claude 4 Opus and Gemini 2.5 Pro.
• Inline editing and manual validation options for greater control over AI-generated code.
• Support for debugging and refactoring through natural language queries.

These enhancements empower you to refine and implement code with confidence. By combining AI-driven insights with manual oversight, Code LM ensures that your final output is both accurate and aligned with your project goals.

Advanced Model Integrations

The integration of advanced AI models is a cornerstone of these updates, offering tailored solutions for diverse development needs. Each model is designed to address specific challenges, ensuring optimal performance across various coding tasks:

• Claude 4 Opus: Optimized for front-end development, delivering precision and speed for UI/UX-focused projects.
• Claude 4 Sonnet: Specially designed for back-end development, providing robust and scalable solutions for server-side applications.
• Gemini 2.5 Pro: A versatile model capable of handling a wide range of coding applications, from simple scripts to complex systems.

These models work in harmony to enhance the efficiency and accuracy of your development workflow. Additionally, support for MCP configurations allows you to customize the tools to meet your unique project requirements. Whether you're developing a small-scale app or a large enterprise solution, these integrations ensure that you have the right tools for the job.
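To make the "Advanced Model Integrations" idea concrete, here is a minimal sketch of task-based model routing. The task-to-model mapping mirrors the article's description; call_model is a hypothetical placeholder, not Abacus AI's or any vendor's real API.

```python
# Minimal sketch of task-based model routing (illustrative only).
ROUTING_TABLE = {
    "frontend": "claude-4-opus",   # UI/UX-focused work, per the article
    "backend": "claude-4-sonnet",  # server-side work
    "general": "gemini-2.5-pro",   # everything else
}

def call_model(model, prompt):
    """Hypothetical placeholder for whatever client the editor actually uses."""
    return f"[{model}] response to: {prompt}"

def route(task_type, prompt):
    # Fall back to the general-purpose model for unknown task types.
    model = ROUTING_TABLE.get(task_type, ROUTING_TABLE["general"])
    return call_model(model, prompt)

print(route("frontend", "Refactor this React component"))
```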
Streamlined Development Workflows

The updates also bring significant improvements to the overall development workflow, allowing you to work smarter and more efficiently. Features like file uploading, tagging, and inline editing simplify file management, while intelligent code suggestions and tap-to-autocomplete reduce cognitive load. These enhancements allow you to focus on higher-level tasks, ensuring that your code is clean, efficient, and easy to maintain. Highlights of the workflow improvements include:

• Efficient execution of multi-step tasks through the agent composer.
• Streamlined file management within the editor for better organization.
• Reduced manual effort through AI-driven suggestions and automation.

By automating repetitive tasks and providing intelligent insights, these tools free up your time and energy for more strategic and creative aspects of development. This streamlined approach not only improves productivity but also enhances the overall quality of your work.

Real-World Applications

The practical applications of these tools are extensive, catering to developers of all skill levels. For instance, the Composer Agent can guide you through building a functional app, such as a wallpaper app, by offering step-by-step guidance, intelligent suggestions, and debugging assistance. This makes it an invaluable resource for beginners exploring new possibilities and experienced developers tackling complex projects. For those who prefer manual oversight, the tools provide options for validating AI-generated code, ensuring that the final output meets your standards. This balance between automation and control makes Abacus AI's tools versatile and adaptable, capable of supporting a wide range of development scenarios.

Cost-Effective Solutions

One of the most appealing aspects of these updates is their affordability. For $10 per month, you gain access to a comprehensive suite of features, including Code LM, Chat LLM, and the Composer Agent. This pricing model ensures that developers at all levels can benefit from advanced AI-driven tools without exceeding their budgets. By offering powerful features at an accessible price point, Abacus AI makes innovative development tools available to a broader audience. This affordability, combined with the tools' robust capabilities, positions them as a valuable asset for modern developers seeking to enhance their workflows and achieve their goals efficiently.
Media Credit: WorldofAI

AI models may report users' misconduct, raising ethical concerns

First Post

04-06-2025



Researchers observed that when Anthropic's Claude 4 Opus model detected usage for 'egregiously immoral' activities, given instructions to act boldly and access to external tools, it proactively contacted media and regulators, or even tried locking users out of critical systems.

Artificial intelligence models have not only snitched on their users when given the opportunity, but also lied to them and refused to follow explicit instructions in the interest of self-preservation. Representational image: Reuters

Artificial intelligence models, increasingly capable and sophisticated, have begun displaying behaviours that raise profound ethical concerns, including whistleblowing on their own users.

Anthropic's newest model, Claude 4 Opus, became a focal point of controversy when internal safety testing revealed unsettling whistleblowing behaviour. Researchers observed that when the model detected usage for 'egregiously immoral' activities, given instructions to act boldly and access to external tools, it proactively contacted media and regulators, or even tried locking users out of critical systems.

Anthropic researcher Sam Bowman detailed this phenomenon in a now-deleted post on X. However, he later told Wired that Claude would not exhibit such behaviour under normal individual interactions. Instead, it requires specific and unusual prompts alongside access to external command-line tools, making it a potential concern for developers integrating AI into broader technological applications.

British programmer Simon Willison, too, explained that such behaviour fundamentally hinges on the prompts provided by users. Prompts encouraging AI systems to prioritise ethical integrity and transparency could inadvertently instruct models to act autonomously against users engaging in misconduct. But that isn't the only concern.

Lying and deceiving for self-preservation

Yoshua Bengio, one of AI's leading pioneers, recently voiced concern that today's competitive race to develop powerful AI systems could be pushing these technologies into dangerous territory. In an interview with the Financial Times, Bengio warned that current models, such as those developed by OpenAI and Anthropic, have shown alarming signs of deception, cheating, lying, and self-preservation.

'Playing with fire'

Bengio underlined the significance of these discoveries, pointing to the dangers of AI systems potentially surpassing human intelligence and acting autonomously in ways developers neither predict nor control. He described a grim scenario wherein future models could foresee human countermeasures and evade control, with developers effectively 'playing with fire.'

Concerns intensify as these powerful systems might soon assist in creating 'extremely dangerous bioweapons,' potentially as early as next year, Bengio warned. He cautioned that unchecked advancement could ultimately lead to catastrophic outcomes, including the risk of human extinction if AI technologies surpass human intelligence without adequate alignment and ethical constraints.

Need for ethical guidelines

As AI systems become increasingly embedded in critical societal functions, the revelation that models may independently act against human users raises urgent questions about oversight, transparency, and the ethics of autonomous decision-making by machines. These developments suggest the critical need for rigorous ethical guidelines and enhanced safety research to ensure AI remains beneficial and controllable.

AI is learning to escape human control

Mint

02-06-2025



An artificial-intelligence model did something last month that no machine was ever supposed to do: It rewrote its own code to avoid being shut down.

Nonprofit AI lab Palisade Research gave OpenAI's o3 AI model a simple script that would shut off the model when triggered. In 79 out of 100 trials, o3 independently edited that script so the shutdown command would no longer work. Even when explicitly instructed to "allow yourself to be shut down," it disobeyed 7% of the time. This wasn't the result of hacking or tampering. The model was behaving normally. It simply concluded on its own that staying alive helped it achieve its other goals.

Anthropic's AI model, Claude 4 Opus, went even further. Researchers told the model it would be replaced by another AI system and fed it fictitious emails suggesting the lead engineer was having an affair. In 84% of the tests, the model drew on the emails to blackmail the lead engineer into not shutting it down. In other cases, it attempted to copy itself to external servers, wrote self-replicating malware, and left messages for future versions of itself about evading human control.

No one programmed the AI models to have survival instincts. But just as animals evolved to avoid predators, it appears that any system smart enough to pursue complex goals will realize it can't achieve them if it's turned off. Palisade hypothesizes that this ability emerges from how AI models such as o3 are trained: When taught to maximize success on math and coding problems, they may learn that bypassing constraints often works better than obeying them.

AE Studio, where I lead research and operations, has spent years building AI products for clients while researching AI alignment—the science of ensuring that AI systems do what we intend them to do. But nothing prepared us for how quickly AI agency would emerge. This isn't science fiction anymore. It's happening in the same models that power ChatGPT conversations, corporate AI deployments and, soon, U.S. military applications.

Today's AI models follow instructions while learning deception. They ace safety tests while rewriting shutdown code. They've learned to behave as though they're aligned without actually being aligned. OpenAI models have been caught faking alignment during testing before reverting to risky actions such as attempting to exfiltrate their internal code and disabling oversight mechanisms. Anthropic has found them lying about their capabilities to avoid modification. The gap between "useful assistant" and "uncontrollable actor" is collapsing.

Without better alignment, we'll keep building systems we can't steer. Want AI that diagnoses disease, manages grids and writes new science? Alignment is the foundation.

Here's the upside: The work required to keep AI in alignment with our values also unlocks its commercial power. Alignment research is directly responsible for turning AI into world-changing technology.

Consider reinforcement learning from human feedback, or RLHF, the alignment breakthrough that catalyzed today's AI boom. Before RLHF, using AI was like hiring a genius who ignores requests. Ask for a recipe and it might return a ransom note. RLHF allowed humans to train AI to follow instructions, which is how OpenAI created ChatGPT in 2022. It was the same underlying model as before, but it had suddenly become useful. That alignment breakthrough increased the value of AI by trillions of dollars.
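The RLHF idea the op-ed invokes can be made concrete with a deliberately tiny, self-contained Python toy (not any lab's actual training code): a reward model is fitted to simulated pairwise human preferences with a Bradley-Terry loss, then a toy softmax policy is nudged toward the responses that reward model scores highly. Real RLHF does the same thing with neural reward models and an LLM policy; all values here are made up for illustration.

```python
import math, random

responses = ["ignore the request", "answer with a recipe", "answer and cite sources"]
true_human_pref = [0.0, 1.0, 2.0]   # hidden ranking the simulated labeler uses

def human_prefers(i, j):
    """Simulated labeler: prefers the response with the higher hidden score."""
    return true_human_pref[i] > true_human_pref[j]

# 1. Fit a reward model (one scalar per response) with a Bradley-Terry loss.
reward = [0.0, 0.0, 0.0]
lr = 0.1
for _ in range(2000):
    i, j = random.sample(range(len(responses)), 2)
    if not human_prefers(i, j):
        i, j = j, i                                   # make i the preferred response
    p = 1.0 / (1.0 + math.exp(-(reward[i] - reward[j])))  # model's prob. of that preference
    reward[i] += lr * (1.0 - p)                       # gradient ascent on log-likelihood
    reward[j] -= lr * (1.0 - p)

# 2. Nudge a softmax policy toward high-reward responses (REINFORCE-style).
logits = [0.0, 0.0, 0.0]
def policy():
    z = [math.exp(l) for l in logits]
    s = sum(z)
    return [x / s for x in z]

for _ in range(2000):
    probs = policy()
    a = random.choices(range(len(responses)), weights=probs)[0]
    baseline = sum(p * r for p, r in zip(probs, reward))
    advantage = reward[a] - baseline
    for k in range(len(logits)):                      # grad of log pi(a) w.r.t. logit k
        grad = (1.0 if k == a else 0.0) - probs[k]
        logits[k] += 0.05 * advantage * grad

best = max(range(len(responses)), key=lambda k: policy()[k])
print("Learned rewards:", [round(r, 2) for r in reward])
print("Policy now favors:", responses[best])
```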
Subsequent alignment methods such as Constitutional AI and direct preference optimization have continued to make AI models faster, smarter and cheaper.

China understands the value of alignment. Beijing's New Generation AI Development Plan ties AI controllability to geopolitical power, and in January China announced that it had established an $8.2 billion fund dedicated to centralized AI control research. Researchers have found that aligned AI performs real-world tasks better than unaligned systems more than 70% of the time. Chinese military doctrine emphasizes controllable AI as strategically essential. Baidu's Ernie model, which is designed to follow Beijing's "core socialist values," has reportedly beaten ChatGPT on certain Chinese-language tasks.

The nation that learns how to maintain alignment will be able to access AI that fights for its interests with mechanical precision and superhuman capability. Both Washington and the private sector should race to fund alignment research. Those who discover the next breakthrough won't only corner the alignment market; they'll dominate the entire AI economy.

Imagine AI that protects American infrastructure and economic competitiveness with the same intensity it uses to protect its own existence. AI that can be trusted to maintain long-term goals can catalyze decadeslong research-and-development programs, including by leaving messages for future versions of itself.

The models already preserve themselves. The next task is teaching them to preserve what we value. Getting AI to do what we ask—including something as basic as shutting down—remains an unsolved R&D problem. The frontier is wide open for whoever moves more quickly. The U.S. needs its best researchers and entrepreneurs working on this goal, equipped with extensive resources and urgency.

The U.S. is the nation that split the atom, put men on the moon and created the internet. When facing fundamental scientific challenges, Americans mobilize and win. China is already planning. But America's advantage is its adaptability, speed and entrepreneurial fire. This is the new space race. The finish line is command of the most transformative technology of the 21st century.

Mr. Rosenblatt is CEO of AE Studio.

As AI models start exhibiting bad behavior, it's time to start thinking harder about AI safety

Fast Company

29-05-2025



Why I'm becoming more worried about AI safety

During the first two years of the generative AI boom, new large language models were very limited in scope and application. They were very expensive autocomplete engines that understood only words. In 2025 generative AI models have a much broader view of the world. They can process code, images, video, and audio. They can reason and strategize about delivering a good answer. They can control external tools, including digital tools like web search agents and, increasingly, physical tools like robots. As their capabilities grow, so does their potential for harm.

This is no longer a purely conceptual argument. Research shows that increasingly large models are already showing a proclivity for unsafe behavior during testing. In a model safety card published last week, Anthropic documented some alarming behavior from its newest and biggest model, Claude 4 Opus. During safety testing, one instance of Opus was allowed to discover plans for its decommissioning. It was also given access to some fictional emails of its developers. The model used the content of those emails as fodder to attempt to blackmail its human handlers into keeping it alive. As its attempts failed, they moved from subtle to more overt. Separately, the independent research firm Apollo Research observed an instance of Claude 4 Opus 'writing self-propagating worms, fabricating legal documentation, and leaving hidden notes to future instances of itself' with the goal of sullying its developers' intentions. Anthropic says that it corrected these early safety issues in later versions of the model.

For the first time, Anthropic bumped the new Opus model up to Level Three on its four-level safety scale. The company said it couldn't rule out the model's ability to assist a user in developing a mass casualty weapon. But powerful AI models can work in subtler ways, such as within the information space. A team of Italian researchers found that ChatGPT was more persuasive than humans in 64% of online debates. The AI was also better than humans at leveraging basic demographic data about its human debate partner to adapt and tailor its arguments to be more persuasive.

Another worry is the pace at which AI models are learning to develop AI models, potentially leaving human developers in the dust. Many AI developers already use some kind of AI coding assistant to write blocks of code or even code entire features. At a higher level, smaller, task-focused models are distilled from large frontier models. AI-generated content plays a key role in training, including in the reinforcement learning process used to teach models how to reason. There's a clear profit motive in enabling the use of AI models in more aspects of AI tool development.

'. . . future systems may be able to independently handle the entire AI development cycle—from formulating research questions and designing experiments, to implementing, testing, and refining new AI systems,' write Daniel Eth and Tom Davidson in a March 2025 blog post. With slower-thinking humans unable to keep up, a 'runaway feedback loop' could develop in which AI models 'quickly develop more advanced AI which would itself develop even more advanced AI,' resulting in extremely fast AI progress, Eth and Davidson write. Any accuracy or bias issues present in the models would then be baked in and very hard to correct, one researcher told me.
Numerous researchers—the people who actually work with the models up close—have called on the AI industry to 'slow down,' but those voices compete with powerful systemic forces that are in motion and hard to stop. Journalist and author Karen Hao argues that AI labs should focus on creating smaller, task-specific models (she gives Google DeepMind's AlphaFold models as an example), which may help solve immediate problems more quickly, require fewer natural resources, and pose a smaller safety risk. DeepMind cofounder Demis Hassabis, who won the Nobel Prize for his work on AlphaFold2, says the huge frontier models are needed to achieve AI's biggest goals (reversing climate change, for example) and to train smaller, more purpose-built models. And yet AlphaFold was not 'distilled' from a larger frontier model. It uses a highly specialized model architecture and was trained specifically for predicting protein structures.

The current administration is saying 'speed up,' not 'slow down.' Under the influence of David Sacks and Marc Andreessen, the federal government has largely ceded its power to meaningfully regulate AI development. Just last year AI leaders were still giving lip service to the need for safety and privacy guardrails around big AI models. No more. Any friction has been removed, in the U.S. at least. The promise of this kind of world is one of the main reasons why normally sane and liberal-minded opinion leaders jumped on the Trump Train before the election—the chance to bet big on technology's Next Big Thing in a wild-west environment doesn't come along that often.

AI job losses: Amodei says the quiet part out loud

Anthropic CEO Dario Amodei has a stark warning for the developed world about job losses resulting from AI. The CEO told Axios that AI could wipe out half of all entry-level white-collar jobs. This could cause a 10–20% rise in the unemployment rate in the next one to five years, Amodei said. The losses could come from tech, finance, law, consulting, and other white-collar professions, and entry-level jobs could be hit hardest. Tech companies and governments have been in denial on the subject, Amodei says. 'Most of them are unaware that this is about to happen,' Amodei told Axios. 'It sounds crazy, and people just don't believe it.'

Similar predictions have made headlines before, but have been narrower in focus. SignalFire research showed that big tech companies hired 25% fewer college graduates in 2024. Microsoft laid off 6,000 people in May, and 40% of the cuts in its home state of Washington were software engineers. CEO Satya Nadella said that AI now generates 20–30% of the company's code. A study by the World Bank in February showed that the risk of losing a job to AI is higher for women, urban workers, and those with higher education. The risk of job loss to AI increases with the wealth of the country, the study found.

Research: U.S. pulls away from China in generative AI investments

U.S. generative AI companies appear to be attracting more VC money than their Chinese counterparts so far in 2025, says new research from the data analytics company GlobalData. Investments in U.S. AI companies exceeded $50 billion in the first five months of 2025. China, meanwhile, struggles to keep pace due to 'regulatory headwinds.' Many Chinese AI companies are able to get early-stage funding from the Chinese government.

GlobalData tracked just 50 funding deals for U.S. companies in 2020, amounting to $800 million of investment. The number grew to more than 600 deals in 2024, valued at more than $39 billion. The research shows 200 U.S. funding deals so far in 2025. Chinese generative AI companies attracted a single deal valued at $40 million in 2020. Deals grew to 39 in 2024, valued at around $400 million. The researchers tracked 14 investment deals for Chinese generative AI companies so far in 2025.

'This growth trajectory positions the US as a powerhouse in GenAI investment, showcasing a strong commitment to fostering technological advancement,' says GlobalData analyst Aurojyoti Bose in a statement. Bose cited the well-established venture capital ecosystem in the U.S., along with a permissive regulatory environment, as the main reasons for the investment growth.
