
Latest news with #Opus

Materialists Trailer: Love Gets Complicated With Chris Evans And Dakota Johnson

News18

a day ago



Materialists stars Chris Evans, Pedro Pascal and Dakota Johnson in a love triangle balancing stability and chaos between her past and present.

Romance takes a modern, complicated turn in Materialists, a new romantic comedy set in the heart of New York City. Starring Chris Evans, Pedro Pascal and Dakota Johnson, the film explores a stylish love triangle under the direction of Celine Song, whose debut feature Past Lives received critical acclaim and an Oscar nomination.

In Materialists, Dakota Johnson steps into the role of Lucy, a driven and high-powered matchmaker who has built a thriving career connecting others, but whose own romantic life is far less orderly. Her world begins to spiral when, during a chic Manhattan soiree, she crosses paths with Harry (Pedro Pascal), a charming and wealthy tech billionaire with all the makings of a perfect partner. The same night throws her for an emotional loop when she unexpectedly bumps into her ex-boyfriend John (Chris Evans), a passionate but struggling actor-slash-waiter whose imperfections and past heartbreaks still linger in her mind.

According to the film's official synopsis, Lucy must navigate the emotional tug-of-war between her stable, aspirational present and the messy, unresolved past she thought she had left behind. The story cleverly weaves humour, vulnerability and sharp introspection as Lucy contemplates whether true love lies in what looks good on paper or in something far more chaotic and unpredictable.

The newly released trailer offers glimpses into this tangled love triangle, filled with biting one-liners, lingering stares and moments of romantic tension. One pivotal scene shows John confronting Lucy about her relationship with Harry, asking whether she truly envisions a life with him, adding layers of doubt to what initially seemed like a fairy-tale match. With its sparkling dialogue and stylish visuals, Materialists is shaping up to be one of the most buzzworthy rom-coms of the year.
The film also boasts a stellar supporting cast, including Marin Ireland, Louisa Jacobson, Zoe Winters, Sawyer Spielberg and Dasha Nekrasova, each bringing depth to a vibrant ensemble. In her second directorial outing, Celine Song blends the emotional resonance of Past Lives with a lighter, more comedic tone, proving her storytelling versatility while still exploring themes of identity, love and self-worth.

Distributed by A24, Materialists joins a slate of exciting new releases from the indie powerhouse. Other anticipated titles on their calendar include On Becoming a Guinea Fowl (released March 7), the surreal drama Opus starring Ayo Edebiri and John Malkovich (March 14), and the fantastical Death of a Unicorn featuring Jenna Ortega and Paul Rudd (March 28).

Set to hit theatres on June 13, Materialists is primed to be a summer hit, blending aesthetics, smart storytelling and star power into a must-watch cinematic romance.

Anthropic CEO claims AI will cause mass unemployment in the next 5 years — here's why

Tom's Guide

a day ago



In recent months, multiple companies have taken strong stances on choosing AI over new employees, signalling a major change in the job market. And, according to one of AI's biggest CEOs, things are only going to get worse.

In an interview with CNN's Anderson Cooper, Anthropic CEO Dario Amodei said, 'AI is starting to get better than humans at almost all intellectual tasks, and we're going to collectively, as a society, grapple with it.' 'AI is going to get better at what everyone does, including what I do, including what other CEOs do.'

Anthropic is the company behind Claude, one of the biggest and most popular AI models in the world right now. The company recently launched the latest version of the system, known as Claude 4 Sonnet and Opus. Our own testing (and comparisons against ChatGPT) convinced us Anthropic's newest model is one of the best AI systems to date.

In a separate interview with Axios, Amodei explained his belief that AI tools could eliminate half of entry-level white-collar jobs and push unemployment as high as 20% within the next five years. Experts and researchers have been telling us this for years now, so why is this any different? As the CEO of Anthropic, Amodei is right in the eye of the storm.

While AI has already proved its abilities in creative formats like writing, as well as image and video generation, it's the next frontier that is concerning. Meta CEO Mark Zuckerberg has stated that he wants AI to do half of Meta's coding by 2026, and Microsoft CEO Satya Nadella said as much as 30% of his company's code is currently being completed by AI.

This is all part of AI's latest party trick. Across all of the major AI models, the ability to deal with code has grown exponentially. Not only can these models code purely from prompts, but for those more experienced in programming, they can check through work, drop in pre-made blocks and take on time-intensive tasks like debugging. This could render a large number of jobs in the coding industry obsolete, but it also shows AI moving into complicated thought patterns, able to complete multiple steps in a task. That limitation had previously held these systems back from taking on more jobs, confining them to tasks within a chatbot or generator.

During his interview, Amodei said Anthropic tracks the number of people who say they use its AI models to build on human jobs versus those entirely automating those jobs. Currently, about 60% of people use AI for augmentation and 40% for automation. However, the automation share is growing, a trend being seen at some of the largest companies, like Shopify and Duolingo.

With artificial intelligence tools expanding faster than regulators can move, it's highly likely this will become an ever-increasing topic for society to grapple with. In the midst of it all, Amodei's advice for the average person is what you'd expect: learn to use AI.

7 AI Coding Models Tested Using the Same Prompt: Winners, Losers and Surprises

Geeky Gadgets

3 days ago



What if a single prompt could reveal the true capabilities of today's leading coding large language models (LLMs)? Imagine asking seven advanced AI systems to tackle the same complex task, building a functional web app that synthesizes real-time data into a structured dashboard, and comparing their performance side by side. The results might surprise you. From unexpected strengths to glaring weaknesses, these models don't just code; they reveal how far AI has come and where it still stumbles. With costs ranging from $15 to $75 per million tokens, the stakes are high for developers choosing the right tool for their workflows. So, which models shine, and which falter under pressure?

In the video below, Prompt Engineering shows how seven prominent LLMs, like Opus 4, Gemini 2.5 Pro and Sonnet 3.7, stacked up when tested with identical prompts. You'll discover which models excelled at handling multi-step processes and which struggled with accuracy and hallucination issues. Whether you're a developer seeking cost-efficient solutions or a technical lead evaluating tools for complex projects, these findings offer actionable insights to help you make informed decisions. By the end, you might rethink how you approach AI-driven coding and whether a single model can truly meet all your needs, or if the future lies in combining their strengths.

Tested Models and Evaluation Criteria

The study examined the performance of seven models: Sonnet 4, Sonnet 3.7, Opus 4, Gemini 2.5 Pro, Qwen 2.5 Max, DeepSeek R1 and O3. Each model was tasked with creating a functional web app while demonstrating effective tool usage and avoiding hallucinated outputs. Grok 3 was excluded from the evaluation due to incompatibility with the prompt. The evaluation focused on four critical areas:

  • Information Synthesis: the ability to gather and integrate data from web searches.
  • Dashboard Accuracy: the precision in rendering structured dashboards.
  • Sequential Tool Usage: effectiveness in managing multi-step processes.
  • Error Minimization: reducing inaccuracies, such as hallucinated data or incorrect outputs.

Performance Insights

The models demonstrated varying levels of success, with some excelling in specific areas while others faced significant challenges:

  • Opus 4: excelled at multi-step processes and agentic tasks, making it highly effective for complex workflows. However, its slower execution speed and high token cost of $75 per million tokens were notable drawbacks.
  • Sonnet models: Sonnet 3.7 outperformed Sonnet 4 in accuracy and tool usage, making it a more reliable choice for precision tasks. Sonnet 4, while less consistent, offered a budget-friendly alternative at $15 per million tokens.
  • Gemini 2.5 Pro: the most cost-efficient model at $15 per million tokens, with additional discounts for lower usage. It handled simpler tasks effectively but struggled with sequential tool usage and complex data synthesis.
  • O3: performed well in sequential tool calls but was inconsistent in synthesizing and structuring information. Its token cost of $40 per million tokens balanced affordability and performance.
  • Qwen 2.5 Max: accuracy issues, particularly with benchmarks and release-date information, limited its reliability for tasks requiring precision.
  • DeepSeek R1: underperformed in rendering dashboards and maintaining accuracy, making it less suitable for tasks requiring visual outputs or structured data.

Key Observations

Several patterns emerged during the evaluation, shedding light on the strengths and weaknesses of the tested models:

  • Sequential Tool Usage: models like Opus 4 demonstrated exceptional capabilities in managing multi-step tasks, a critical feature for complex workflows.
  • Hallucination Issues: incorrect data generation, such as inaccurate release dates or benchmark scores, was a recurring problem, particularly for Qwen 2.5 Max and DeepSeek R1.
  • Dashboard Rendering: while most models successfully rendered dashboards, DeepSeek R1 struggled significantly in this area, highlighting its limitations for tasks requiring visual outputs.
  • Cost Variability: token costs varied widely; Gemini 2.5 Pro emerged as the most affordable option for simpler tasks, while Opus 4's high cost limited its accessibility despite its strong performance.

Cost Analysis

The cost of using these models played a pivotal role in determining their overall value:

  • Opus 4: $75 per million tokens, the highest among the models tested, reflecting its advanced capabilities but limiting its cost-efficiency.
  • Sonnet 4: $15 per million tokens, a low-cost alternative with moderate performance for budget-conscious users.
  • Gemini 2.5 Pro: $15 per million tokens, with discounts available for lower usage, making it ideal for simpler tasks.
  • O3: $40 per million tokens, a middle ground between cost and performance.

Strategic Model Selection

No single model emerged as the definitive leader across all tasks. Instead, the findings emphasize selecting models based on specific project requirements:

  • Complex tasks: Opus 4 proved the most capable for multi-agent tasks requiring sequential tool usage, despite its higher cost.
  • Cost-efficiency: Gemini 2.5 Pro offered the best value for simpler tasks with limited tool usage, making it a practical choice for budget-conscious projects.
  • Budget-friendly options: Sonnet 3.7 outperformed Sonnet 4 in accuracy, but both models remain viable for users prioritizing affordability.

For highly complex projects, combining models may yield better results by using their individual strengths while mitigating weaknesses. Regardless of the model chosen, verifying outputs remains essential to ensure accuracy and reliability in your projects.

Media Credit: Prompt Engineering

Disclosure: Some of our articles include affiliate links. If you buy something through one of these links, Geeky Gadgets may earn an affiliate commission. Learn about our Disclosure Policy.
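The per-million-token prices quoted in the Cost Analysis above translate directly into per-run cost estimates. A minimal sketch (the prices are the ones reported in this article; the token counts are hypothetical examples, not measurements from the test):

```python
# Token prices in dollars per million tokens, as reported in the article.
PRICE_PER_MILLION = {
    "Opus 4": 75.0,
    "Sonnet 4": 15.0,
    "Gemini 2.5 Pro": 15.0,
    "O3": 40.0,
}

def run_cost(model: str, tokens: int) -> float:
    """Estimated dollar cost of a run that consumes `tokens` tokens."""
    return PRICE_PER_MILLION[model] * tokens / 1_000_000

# A hypothetical multi-step web-app build consuming 2M tokens:
for model in PRICE_PER_MILLION:
    print(f"{model}: ${run_cost(model, 2_000_000):.2f}")
```

At that hypothetical volume, the spread is stark: the same job costs $150 on Opus 4 but $30 on Sonnet 4 or Gemini 2.5 Pro, which is why the article stresses matching the model to the task.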

Claude 4 Sonnet & Opus Tested to Their Limits: Which AI Model Reigns Supreme?

Geeky Gadgets

4 days ago



What happens when an AI model is pushed to its very edge? With the release of Claude 4, Anthropic has unveiled one of the most ambitious advancements in artificial intelligence to date. Promising unparalleled capabilities in coding, reasoning and document analysis, the Claude 4 lineup is designed to cater to everyone, from developers tackling intricate algorithms to everyday users seeking smarter solutions. But bold claims often invite scrutiny. Can Claude 4 truly deliver on its promise of redefining AI performance, or does it falter under the weight of its own aspirations? This breakdown takes a closer look at where Claude 4 shines, and where it stumbles, when tested to its limits.

Skill Leap AI shows how Claude 4's two models, Opus and Sonnet, stack up against competitors like ChatGPT and Gemini 2.5 Pro. From its ability to process 1 million tokens to its integration with developer tools and web search functionality, Claude 4 offers a glimpse into the future of AI-driven workflows. Yet it's not without flaws: occasional lapses in nuanced logic and a steep price tag may leave some users questioning its value. Whether you're a professional seeking innovative tools or simply curious about the boundaries of modern AI, this exploration will reveal the strengths, challenges and real-world potential of Claude 4. After all, innovation isn't just about what's possible; it's about how far we're willing to push the limits.

Comprehensive Overview of Claude 4 Models

The new Claude lineup introduces two distinct models, each designed to address specific user requirements:

  • Claude Opus 4: a premium model optimized for complex tasks such as advanced coding, in-depth reasoning and extended problem-solving. It is particularly suited for software engineering, data analysis and other technical domains.
  • Claude Sonnet 4: a free, default option that offers improved precision and reasoning compared to earlier versions, making it ideal for general-purpose tasks.

Both models feature a large context window capable of processing up to 1 million tokens. This capability enables them to analyze lengthy documents, engage in extended conversations and handle complex workflows with ease, making them versatile tools for professionals and casual users alike.

Performance and Practical Applications

Claude Opus 4 demonstrates exceptional performance across several key areas, making it a valuable asset for technical and professional use cases:

  • Coding and Debugging: the model excels at generating code, debugging errors and optimizing algorithms, offering significant utility for software engineers and developers.
  • Advanced Reasoning: it handles complex problem-solving tasks with notable accuracy, though it occasionally struggles with intricate logic, such as custom chess game coding or highly specialized workflows.
  • Document Analysis: the large context window allows efficient extraction and summarization of information from extensive files, such as legal contracts, financial reports or research papers.

Despite these strengths, the models face limitations in areas requiring nuanced logic or highly specialized domain expertise. These challenges highlight the need for further refinement to enhance their overall reliability.
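A context window of this size changes how much pre-processing a document pipeline needs. As a rough sketch, using the common heuristic of about four characters per token (the heuristic and the chunking approach are illustrative assumptions, not details from this article), you can check whether a file fits in one request and split it if not:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters/token heuristic."""
    return max(1, len(text) // 4)

def split_for_context(text: str, max_tokens: int = 1_000_000) -> list[str]:
    """Split `text` into chunks that each fit within the context window."""
    max_chars = max_tokens * 4
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)] or [""]

# A hypothetical ~10M-character file (~2.5M tokens) needs 3 chunks
# at a 1M-token window, but would need ~13 at a 200K-token window.
doc = "x" * 10_000_000
print(len(split_for_context(doc)))                      # 3
print(len(split_for_context(doc, max_tokens=200_000)))  # 13
```

The practical point: the larger the window, the more often a contract, report or codebase fits in a single request, avoiding the summarize-then-stitch workflows smaller windows force.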
Enhanced Features and Tool Integration

The new Claude models introduce several advancements in tool integration, significantly enhancing their versatility and practical utility:

  • Web Search Functionality: web search capabilities allow the models to deliver more accurate and context-aware responses, particularly for research and fact-checking tasks.
  • Developer Tools Integration: seamless compatibility with platforms like GitHub and APIs makes Claude 4 an efficient choice for coding, project management and collaborative workflows.
  • Hybrid Problem-Solving: by combining instant answers with advanced reasoning, Claude 4 provides a balanced approach to both simple and complex queries.

These features make the models adaptable to a wide range of professional, technical and creative applications, further solidifying their position in the competitive AI landscape.

Comparison with Competitors

Compared with other leading AI models like Gemini 2.5 Pro and ChatGPT, Claude 4 exhibits several strengths and some notable limitations:

  • Strengths: Claude 4 outperforms its competitors in coding and reasoning tasks, offering superior accuracy and functionality for technical applications.
  • Weaknesses: unlike Gemini 2.5 Pro, Claude 4 lacks multimodal capabilities, which limits its ability to process both text and visual data, a significant drawback for users requiring a more comprehensive AI solution.
  • Cost Considerations: the premium pricing of Claude Opus 4, particularly for API usage, makes it less accessible for budget-conscious users. In contrast, ChatGPT offers a more affordable alternative for general tasks, albeit with less advanced reasoning capabilities.

These comparisons highlight Claude 4's niche appeal for users who prioritize high-level performance and advanced features over cost and multimodal functionality.

Real-World Use Cases and Pricing

Claude 4 models are designed to address a variety of practical use cases across different industries and user needs:

  • Document Analysis: extract and summarize critical information from large files, particularly useful for legal, financial and academic applications.
  • Data Visualization: transform raw analytics data into shareable dashboards, streamlining reporting processes for businesses and organizations.
  • Personal Assistance: provide tailored recommendations, summarize reviews and assist with general queries, enhancing productivity for individual users.

However, the models face limitations in agentic workflows, such as autonomously completing multi-step tasks or booking appointments, which may affect their utility in certain scenarios.

The pricing structure reflects the premium positioning of Claude 4:

  • Claude Opus 4: starts at $20 per month for a basic plan with usage limits; the Max Plan, priced at $100 per month, offers extended usage for power users who require advanced capabilities.
  • API Costs: higher than those of competitors, potentially deterring developers and businesses from adopting it for large-scale projects.

While the pricing aligns with the advanced features offered, it may limit accessibility for users with tighter budgets or less demanding requirements.

Insights from Testing

Testing of the Claude 4 models revealed both impressive strengths and areas for improvement:

  • Strengths: significant advancements in coding and reasoning, particularly in handling complex tasks with precision and efficiency.
  • Limitations: occasional errors in intricate workflows and nuanced logic highlighted the need for further refinement to enhance reliability.
  • Extended Thinking: available only in paid plans, this feature improves response quality by considering broader contexts, making it particularly useful for in-depth analysis.
  • Web Search Integration: proved valuable for delivering up-to-date and accurate information, enhancing the models' utility for research and fact-checking.

These findings underscore the potential of Claude 4 while pointing to areas that require further development to maximize its effectiveness.
Balancing Innovation and Accessibility

Claude 4 represents a significant advancement in AI technology, offering innovative capabilities in coding, reasoning and document analysis. However, its premium pricing and limitations in multimodal capabilities and agentic workflows may restrict its appeal to specific user groups. For developers and professionals seeking high-level performance, Claude Opus 4 is a compelling choice, while Claude Sonnet 4 provides a reliable, cost-free option for general users who value precision and reasoning. As the AI landscape continues to evolve, Claude 4 sets a high standard for innovation, with its ultimate success hinging on its ability to balance performance, accessibility and affordability in an increasingly competitive market.

Media Credit: Skill Leap AI
