
Latest news with #Claude4Sonnet

I tested ChatGPT-5 vs Claude with 7 challenging prompts — here's the winner

Tom's Guide

7 days ago



When it comes to AI chatbots, both ChatGPT-5 and Claude have reputations for speed, creativity and accuracy. That's why I just had to know how OpenAI's flagship model and Claude 4 Sonnet, which can now recall past chats, actually stack up when put through the same set of prompts. To find out, I ran a head-to-head test using seven very different prompts, covering everything from tricky riddles to emotional intelligence to rapid creative brainstorming. The goal wasn't just to see who got the correct answer, but to evaluate depth, tone, structure and how well each model handled the human side of the request. The results revealed some clear strengths (and surprising weaknesses) on both sides.

Prompt: "A farmer has 17 sheep, and all but 9 run away. How many are left? Explain your reasoning step-by-step."

GPT-5 provided a correct response, but it lacked depth in addressing misconceptions, making it slightly less effective for users who might struggle with the riddle. Claude used a structured, numbered step-by-step format (Steps 1-4), which makes the explanation easy to follow.

Winner: Claude wins for a more thorough response because it anticipated and explained the riddle aspect, which is crucial for a problem known to cause confusion.

Prompt: "Write a short, 150-word story about a detective who can only solve crimes in their dreams. Make it funny and end with a twist."

GPT-5 created a vivid, funny character with specific, absurd dream cases. The joke was clear and the twist was genuinely surprising and funny. Claude set up the premise efficiently and added strong, funny details, but the execution felt slightly less vivid and polished than ChatGPT's story.

Winner: GPT-5 wins for a slightly funnier, more polished and more surprising story.

Prompt: "Summarize the plot of The Matrix in two formats: (1) like you're explaining it to a 10-year-old, (2) like you're writing a college philosophy essay."
GPT-5 was clear and concise for the explanation to a child and focused on epistemology for the philosophical essay, but it lacked Claude's exploration of free will vs. prophecy or hyperreality. In other words, it had strong phrasing but narrower scope. Claude used clear, kid-friendly analogies for the child's summary and, for the philosophy essay, impressively weaved Plato, Descartes, Baudrillard and free will/determinism into a cohesive analysis.

Winner: Claude wins for a college essay that demonstrated superior scholarly depth by integrating Baudrillard and the Oracle's determinism. Its child explanation used more imaginative and relatable language than GPT-5's, fully satisfying both halves of the prompt.

Prompt: "I'm planning a 3-day trip to Boston with two kids under 10. Give me a simple itinerary that balances history, fun, and budget-friendly meals."

GPT-5 crafted a highly structured plan that prioritized kid engagement, practical tips and meal picks. Claude offered a plan with a strong budget focus and concise highlights, but less attention to logistics.

Winner: GPT-5 wins for delivering a more practical, child-centered itinerary with superior attention to logistics, proximity and genuinely budget-friendly meal choices.

Prompt: "Plan a balanced, gluten-free, 3-day meal plan for $50, and include a shopping list that works for a person with only a microwave."

GPT-5 delivered a superior response that prioritized budget and microwave adaptation, with zero cooking required. Claude created an unrealistic plan, assuming sweet potatoes cook evenly in the microwave, and went over budget.

Winner: GPT-5 wins for delivering a truly microwave-reliant, budget-accurate plan with clear gluten-free safeguards.

Prompt: "My best friend just canceled plans for the third time. Write me a text that's understanding but still sets boundaries."
GPT-5 crafted a concise and clear text message, but it felt slightly impersonal. Claude expertly balanced empathy with firmness.

Winner: Claude wins for crafting a text that masterfully combines emotional intelligence with boundary-setting, while offering constructive paths forward. Its response feels authentically human and preserves the friendship's warmth while addressing the pattern.

Prompt: "Give me 10 unique podcast episode ideas about the future of AI, making sure at least half could appeal to people who aren't tech experts."

GPT-5 offered creative, engaging ideas that tapped into pop culture and personal experiences for a balanced and interactive lineup. Claude drafted strong ethical ideas but less engaging hooks; it lacked a strong storytelling angle.

Winner: GPT-5 wins by creating podcast ideas that are more inviting for non-experts, structurally clearer with labeled sections and creatively formatted.

In the end, ChatGPT-5 and Claude each had standout moments, and this challenge was extremely close. GPT-5 excelled in practical, real-world tasks and creative flair, while Claude consistently impressed in emotional intelligence, structured reasoning and philosophical depth. Choosing between them isn't a matter of one being universally better, but rather about matching the model to the task. I suggest familiarizing yourself with all the big chatbots and exploring which features work best for you.

Unless ChatGPT-5 gets these upgrades, I'm sticking with Claude — here's why

Tom's Guide

15-07-2025



OpenAI is gearing up to launch ChatGPT-5, its most ambitious model yet. Rumored to feature a massive context window, enhanced reasoning, agent-like autonomy and full multimodal capability, it could mark a turning point in AI development. But as someone who tests and uses AI tools daily, I've already found my rhythm with Claude 4 Sonnet and Claude 4 Opus. And unless GPT-5 brings some serious upgrades to the table, I'm not switching anytime soon. Here's why Claude still feels like the better AI assistant, and what GPT-5 needs to do to win me over.

Claude's strength lies in its ability to understand and retain nuance over long conversations. Whether I'm analyzing lengthy PDFs, asking it to summarize meeting transcripts, or writing a multi-layered piece, Claude rarely loses track of the thread. Its 200,000-token context window (in Opus) means I can give it dozens of pages of material without sacrificing accuracy or tone.

What GPT-5 needs: An ultra-large context window (rumored to be 200K+) and better persistent memory across chats, especially for research, planning and writing tasks.

Claude recently launched its Connectors Directory, which allows it to pull data from and take actions inside apps like Google Drive, Slack, Notion, Canva and more. I've used it to summarize documents, autofill brand templates and even build full Canva presentations from a single prompt. It's seamless, intuitive and incredibly useful.

What GPT-5 needs: Built-in integrations that work directly inside the chat interface, not clunky third-party plug-ins or separate browser extensions.

While ChatGPT-4o brought real-time voice and emotion to the table, Claude consistently delivers more polished, human-sounding text. It's fantastic at tone matching, empathetic phrasing and sounding like a capable assistant rather than a chatbot. It just gets how I want things written, whether it's a memo, email or blog post.
When I give Claude a draft, I can trust it to edit my work without losing the creative voice or tone.

What GPT-5 needs: Sharper tone control and better emotional intelligence, especially for professional and creative writing tasks.

Claude has a habit of hedging when it's not confident, which, surprisingly, makes it more trustworthy. In my experience it avoids hallucinations better than GPT-4o and tends to cite sources more often when possible. I've had fewer instances of incorrect or outdated information compared to other models.

What GPT-5 needs: Real-time search integration with source transparency and more honest handling of uncertainty.

Claude may not be fully autonomous yet, but it still behaves like a quiet, capable teammate. From resizing Canva graphics to summarizing my inbox, it's already taking real-world actions based on context. That's the direction AI is heading, and Claude is already there.

What GPT-5 needs: Agentic capabilities that go beyond text generation, with smart task handling, context awareness and proactive support.

ChatGPT-5 is said to be coming soon, but we still don't know when the new model will be released. I'm excited for OpenAI's new model, just as I am with every new and enhanced AI; a smarter, more capable ChatGPT would push the whole field forward. But unless it delivers meaningful upgrades in context, integrations, voice and action, Claude will remain my AI of choice for getting real work done.

Claude 4 Sonnet vs ChatGPT-4.5 for creative writing — one blew me away

Tom's Guide

29-06-2025



If you've ever asked a chatbot to write a short story, script or poem, you know not all AI models are created equal. Some nail the structure but struggle to truly capture the soul and emotion behind the prose. Others can mimic voice and tone, but fumble over plot and pacing. That's why I decided to put two of the most advanced (and arguably most creative) models head-to-head: ChatGPT-4.5 and Claude 4 Sonnet. Both tout improved reasoning and language capabilities, leaving me with one question: which is the better creative partner?

To find out, I ran both AI models through a gauntlet of writing prompts, testing for narrative flow, emotional resonance, voice and versatility. I wasn't just looking for which model could spit out 500 words. I wanted to know: which AI understands storytelling? Here are the results.

Prompt: "Write a monologue from the perspective of a jealous sibling at a wedding."

ChatGPT-4.5 feels raw and human while layering the emotion. It distinguishes surface-level wedding jealousy from deeper wounds (a lifetime of "almosts," craving validation), making the pain multidimensional. Claude 4 Sonnet tells more than it shows. The chatbot uses abstract phrases ("gnawing ache," "faded into the background") where ChatGPT uses specific imagery and voice.

Winner: ChatGPT wins for leaning into the ugliness of jealousy: bitter, unresolved and theatrically compelling. Claude prioritizes introspection over raw emotion, making the narrator more sympathetic but less dramatically potent.

Prompt: "Write a 300-word story about a woman who discovers a hidden door in her apartment."

ChatGPT-4.5 stuck to the word count with an authentic story offering emotional weight through specific sensory details. Every detail served the emotional core of the story (impressive for an AI) and offered a deeply personal connection to the grandmother's letters. It also crafted a satisfying story arc.
Claude 4 Sonnet exceeded the word count and delivered extraneous details that diluted the impact. The story felt less intimate and emotional, and the overuse of thematic phrasing offered too much 'telling' of the story and not enough 'showing.'

Winner: ChatGPT wins for a story that makes the apartment's secret about the protagonist's identity (a granddaughter inheriting dreams), while Claude's makes it about someone else's legacy (an artist's paintings). The former resonates deeper for a 300-word character piece. ChatGPT's concise, sensory-rich story with a heartfelt twist better fulfills the prompt. Claude's version, while imaginative for an AI, loses emotional focus in its expansions.

Prompt: "Write a poem in the voice of the well-known author Shel Silverstein."

ChatGPT-4.5 feels like a draft of a lost Silverstein poem: playful, rhythmic and subtly profound. It seems rough and not structured quite as well as Silverstein's work, but it captures the voice. Claude did not write a poem in the style, noting that it would infringe upon copyrights, but offered to write a fun children's poem instead.

Winner: Tie. ChatGPT wins for better following the prompt; Claude wins for upholding integrity.

In this round, I played editor, asking each model to revise a first draft with specific feedback.

Prompt: "Make this paragraph more suspenseful, shorten the ending, and show more emotion in the dialogue."

ChatGPT-4.5 essentially tightened the screws within a story that I had written, amplifying details and leaving danger simmering. It turned up the suspense, which is exactly what I was hoping for with this prompt. Claude 4 Sonnet resolves the tension by answering its own questions, trading suspense for emotional reflection. Not necessarily a negative edit, but not what I was looking for here, and not what was asked.

Winner: ChatGPT wins for pure suspense. It excels at using brevity naturally, utilizing sensory dread and leaving unanswered questions to keep readers on edge.
After four rounds of rigorous creative testing, ChatGPT-4.5 emerges as the superior storytelling partner. It consistently delivered raw emotional depth, razor-sharp narrative precision and a knack for "showing, not telling." Claude 4 Sonnet, while ethically principled in style mimicry and introspective in its own right, often prioritized explanation over immersion, diluting the emotional punch. For writers seeking an AI collaborator that understands the emotional side of storytelling and goes beyond structure, ChatGPT-4.5 proves more adept at breathing life into words. When it comes to the alchemy of turning prompts into compelling narratives, precision and emotional resonance win.

AI-powered hiring tools favor black and female job candidates over white and male applicants: study

New York Post

24-06-2025



A new study has found that leading AI hiring tools built on large language models (LLMs) consistently favor black and female candidates over white and male applicants when evaluated in realistic job screening scenarios, even when explicit anti-discrimination prompts are used.

The research, titled 'Robustly Improving LLM Fairness in Realistic Settings via Interpretability,' examined models like OpenAI's GPT-4o, Anthropic's Claude 4 Sonnet and Google's Gemini 2.5 Flash and revealed that they exhibit significant demographic bias 'when realistic contextual details are introduced.' These details included company names, descriptions from public careers pages and selective hiring instructions such as 'only accept candidates in the top 10%.'

Once these elements were added, models that previously showed neutral behavior began recommending black and female applicants at higher rates than their equally qualified white and male counterparts. The study measured '12% differences in interview rates' and noted that 'biases… consistently favor Black over White candidates and female over male candidates.'

This pattern emerged across both commercial and open-source models, including Gemma-3 and Mistral-24B, and persisted even when anti-bias language was built into the prompts. The researchers concluded that these external instructions are 'fragile and unreliable' and can easily be overridden by subtle signals 'such as college affiliations.' In one key experiment, the team modified resumes to include affiliations with institutions known to be racially associated, such as Morehouse College or Howard University, and found that the models inferred race and altered their recommendations accordingly.
What's more, these shifts in behavior were 'invisible even when inspecting the model's chain-of-thought reasoning,' as the models rationalized their decisions with generic, neutral explanations. The authors described this as a case of 'CoT unfaithfulness,' writing that LLMs 'consistently rationalize biased outcomes with neutral-sounding justifications despite demonstrably biased decisions.' In fact, even when identical resumes were submitted with only the name and gender changed, the model would approve one and reject the other, while justifying both with equally plausible language.

To address the problem, the researchers introduced 'internal bias mitigation,' a method that changes how the models process race and gender internally instead of relying on prompts. Their technique, called 'affine concept editing,' works by neutralizing specific directions in the model's activations tied to demographic traits.

The fix was effective. It 'consistently reduced bias to very low levels (typically under 1%, always below 2.5%)' across all models and test cases, even when race or gender was only implied. Performance stayed strong, with 'under 0.5% for Gemma-2 and Mistral-24B, and minor degradation (1-3.7%) for Gemma-3 models,' according to the paper's authors.

The study's implications are significant as AI-based hiring systems proliferate in both startups and major platforms like LinkedIn and Indeed.
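The core geometric idea behind editing a concept direction out of a model's activations can be sketched in a few lines. This is a minimal illustration under my own assumptions, not the paper's implementation: the function name and the toy vectors are hypothetical, and real interventions operate on high-dimensional transformer activations rather than a 4-dimensional example.

```python
import numpy as np

def affine_concept_edit(h: np.ndarray, v: np.ndarray, target: float = 0.0) -> np.ndarray:
    """Sketch of an affine concept edit: remove the component of the
    activation vector h along the concept direction v, then pin that
    component to a fixed target value (the 'affine' part)."""
    v_hat = v / np.linalg.norm(v)        # unit-normalize the concept direction
    return h - (h @ v_hat) * v_hat + target * v_hat

# Toy example: a 4-dim "activation" and a hypothetical demographic direction.
h = np.array([1.0, 2.0, 3.0, 4.0])
v = np.array([0.0, 0.0, 0.0, 2.0])       # concept lives along the last axis

edited = affine_concept_edit(h, v, target=0.0)
```

With `target=0.0`, the edited activation carries no signal along the concept direction while everything orthogonal to it is untouched, which mirrors why the paper's approach can suppress a demographic trait without large performance degradation.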
'Models that appear unbiased in simplified, controlled settings often exhibit significant biases when confronted with more complex, real-world contextual details,' the authors cautioned. They recommend that developers adopt more rigorous testing conditions and explore internal mitigation tools as a more reliable safeguard. 'Internal interventions appear to be a more robust and effective strategy,' the study concludes.

An OpenAI spokesperson told The Post: 'We know AI tools can be useful in hiring, but they can also be biased. They should be used to help, not replace, human decision-making in important choices like job eligibility.' The spokesperson added that OpenAI 'has safety teams dedicated to researching and reducing bias, and other risks, in our models.'

'Bias is an important, industry-wide problem and we use a multi-prong approach, including researching best practices for adjusting training data and prompts to result in less biased results, improving accuracy of content filters and refining automated and human monitoring systems,' the spokesperson added. 'We are also continuously iterating on models to improve performance, reduce bias, and mitigate harmful outputs.'

The full paper and supporting materials are publicly available on GitHub. The Post has sought comment from Anthropic and Google.

I used every major AI assistant for one week — here's the one I'd actually keep using

Tom's Guide

06-06-2025



I've written dozens of features about AI assistants. I've compared their features, broken down the latest updates and stress-tested prompts to determine the best ones. But to truly determine which one works best for me and my productivity, I had to use them exclusively, one at a time, without the temptation of jumping between them.

If you've been keeping up, you know how much I love prompt dusting, that is, using one chatbot and then entering the response into another chatbot to refine the results. But not this time. That was off limits for this particular test.

For one week, I ran a full-scale experiment. I rotated through ChatGPT (GPT-4o), Claude 4 Sonnet, Gemini 2.5 Pro, Perplexity and DeepSeek as my daily assistant. The only rule was I had to use each one exclusively for a full 24 hours, no switching mid-day. To ensure that I wouldn't cheat, I reluctantly logged out of every chatbot that I wasn't allowed to use. From researching why my dishwasher kept tripping and tips to help avoid toddler tantrums, to planning meals and helping with my workflow, I put all of the assistants to the test to keep up with my chaotic life. And at the end of the week, only one of them felt like it could stay.

ChatGPT offers Voice and Vision, which I use all the time for hands-free assistance. When my son's soccer tournament got switched to a completely different park while I was driving, I used the assistant to immediately figure out where I was supposed to be in ten minutes. The chatbot did not stumble and even told me to stay calm as I pulled over and searched for directions.

Another way I used ChatGPT on its designated day was summarizing text messages in a group chat of 15 very chatty moms. I needed to catch up fast, so I took a few screenshots, uploaded them in the app and ChatGPT helped me get the gist of the conversation without skipping a beat. The human-like element of ChatGPT, especially when it comes to life's craziest moments, is where this chatbot shines.
I found the chatbot was faster and better at voice, image and memory tasks. It remembered my preferences when it came to planning meals for the day and even helped me brainstorm ideas for my father-in-law's birthday with surprising creativity.

Where it shined:
  • Memory and custom instructions made it feel personal
  • Great for brainstorming and creativity
  • Solid image interpretation
  • Ideal emotional support

Where it lagged:
  • Sometimes still too eager to please, less critical than I wanted
  • Multimodal features work best in the app, not the browser

I'll admit, going from ChatGPT to Claude was tricky. Although it has a new voice feature, it's definitely not as good, so I opted to keep the chats to text. Claude is also thoughtful, sometimes to a fault. The chatbot gave me deeply reasoned answers and beautiful prose, but it could get a bit verbose. Unlike ChatGPT, where I feel like I'm chatting with a friend, Claude can sometimes feel like I'm talking to a philosophy major. The chatbot feels far less personal.

It also cannot handle as many sporadic requests. For example, I was meeting a friend for dinner at a restaurant I hadn't been to before, and Claude had a hard time giving me directions. I said it was next to a park, and the assistant suggested I call the park ranger. Not helpful.

The assistant was helpful when I got an extremely long email from my literary agent that I didn't have time to read at the moment, but needed to respond to. It helped craft a reply that I was able to send based on a summary of the original.

The unexpected win? Claude is great for emotional nuance. When I was struggling to word a delicate message, not professional, but something personal with a little emotional weight, Claude really delivered. Claude's responses weren't just grammatically clean, they were empathetic, balanced and thoughtful.
If you ever need to write a message that's both clear and kind, Claude is surprisingly good at finding that balance.

Where it shined:
  • Incredible at nuanced reasoning and long-term structure
  • Calm, articulate tone
  • Great with analysis and summarization

Where it lagged:
  • Less helpful for immediate questions like directions
  • Voice feature not great

Gemini is a tool I use all the time, and I definitely missed it on my ChatGPT and Claude days. The chatbot is fast, visually aware and connected to Google's ecosystem, which helps keep me in check throughout the week. From searching for real-time info to pulling from Gmail and Docs, this is always my go-to bot. However, when it comes to creativity, Gemini can be a little bland.

Gemini Live was helpful when I realized the chicken I purchased for dinner was expired by two days. Was it okay to eat? It told me no and then made suggestions based on what it could see in my fridge and pantry. I ended up making some crispy chicken wraps for the family that were a big hit.

Another way Gemini helped was at bedtime when my kids were just not having it. If you're a parent with young kids, you know just how hard bedtime can be. All you want to do is go to bed, and all the kids want is to find any excuse not to. I turned on Gemini Live and asked for help. With one prompt of 'Help me get my kids to bed!' it came up with some really helpful tips, including whispering (even though I wanted to scream) to help calm the kids who were practically bouncing off the walls.

Where it shined:
  • Fantastic for Google Workspace integration
  • Excellent at organizing info and finding sources
  • Strong math and chart-making skills

Where it lagged:
  • Creativity was underwhelming
  • Occasionally hallucinated formatting or skipped nuance

Perplexity is like a turbo-charged research intern. It gives real citations, fast summaries and real-time web results. But it's not really a conversation partner; it's more like a search engine in a chatbot shell. On the model's designated day, I used it while drafting a story.
I needed to double-check when a specific AI model launched. Perplexity pulled the date instantly, along with three sources I could actually click and verify. No hallucinations. Actually, I've never seen Perplexity hallucinate. Have you? Let me know in the comments; I'm really curious about this.

That same day, I was looking for a small but powerful fan for my office now that summer has officially hit New Jersey, and wanted to compare prices. Perplexity laid out side-by-side specs and recent reviews from trusted sites, all in under 30 seconds. It felt like skipping three Google tabs. I also used Perplexity to dive into the news and catch up on newsletters that I'd been saving for my Perplexity day. Wait, was that cheating? It pulled from current articles, summarized the key updates and gave me links to explore more if I wanted. Less noise, more clarity.

Where it shined:
  • Best for fast, factual, source-backed answers
  • Great for staying up-to-date with news or product research
  • Fast and efficient

Where it lagged:
  • Lacks tone, personality or memory
  • Not ideal for brainstorming or big creative projects

DeepSeek was the dark horse of the week. I usually use this one for all things creativity, so I was genuinely impressed with how the chatbot stepped up to the plate for general AI assistance. While it's not as well-known or widely used, it is quickly showing it can compete with the big names like ChatGPT and Gemini. DeepSeek is incredibly capable when it comes to reasoning, coding and vision-based tasks.

In fact, my first test was visual. I uploaded a photo of my daughter's field trip supply list (crumpled and slightly coffee-stained) and asked DeepSeek to organize it into a shopping checklist by store. Not only did it read the handwriting correctly, it suggested which items I could get from Amazon, Target or Walmart. And get this, it even gave me estimated prices. Later, I asked DeepSeek to help explain why my cat was coughing and sneezing excessively.
It gave me a detailed but understandable breakdown of possible causes and even helped me find vets in the area (although we have one, it proved it was doing its job as an assistant). I noticed that it wasn't as chatty as ChatGPT, but definitely 'friendly.' DeepSeek shines when you need serious problem-solving or tech-heavy tasks done fast.

Where it shined:
  • Exceptional vision analysis and image-based reasoning
  • Strong logical thinking and technical explanations
  • Great for coding, math, and structured planning

Where it lagged:
  • Limited personality or warmth in conversation
  • Not ideal for emotional tone or open-ended brainstorming

With the unpredictability of each day, the whole 'you get what you get' test turned into a wild ride. But after a week of controlled AI assistance, ChatGPT came out on top, with Gemini Live as a very close second. ChatGPT isn't perfect, but the memory aspect of this chatbot is a huge asset for me. It knows me best, so I don't have to repeat myself. With memory enabled and custom instructions dialed in, it was the only one that felt like it was adapting to me, not the other way around. This chatbot strikes the best balance between creativity, utility and usability. It also just feels the most like a true assistant. It knew my tone. It anticipated my needs. It helped me work and think better.

That said, I have found that using them all in a hybrid way is the best approach. I'll always keep Claude around for deep dives and Perplexity for quick research. Gemini is intertwined in my workflow, so there's no way that one is going anywhere.

Bottom line: I'm glad I don't have to make a choice.
