Latest news with #4o
Yahoo
27-05-2025
One of Europe's top AI researchers raised a $13M seed to crack the 'holy grail' of models
From OpenAI's 4o to Stable Diffusion, AI foundation models that create realistic images from a text prompt are now plentiful. In contrast, foundation models capable of generating full, coherent 3D online environments from a text prompt are only just emerging. Still, it's only a question of when, not if, these models will become readily available.

Now one of Europe's most prominent AI 3D model researchers, Matthias Niessner, has taken an entrepreneurial leave of absence from his visual computing & AI lab at the Technical University of Munich to found a startup working in the area: SpAItial.

Formerly a cofounder at Synthesia, the realistic AI avatar startup valued at $2.1 billion, Niessner has raised an unusually large seed round for a European startup of $13 million. The round was led by Earlybird Venture Capital, a prominent European early-stage investor (backers of UiPath and Peak Games, for instance), with participation from Speedinvest and several high-profile angels.

That round size is even more impressive when taking into account that SpAItial doesn't have much to show the world yet other than a recently released teaser video showing how a text prompt could generate a 3D room. But then, there's the technical team that Niessner assembled: Ricardo Martin-Brualla, who previously worked on Google's 3D teleconferencing platform, now called Beam; and David Novotny, who spent six years at Meta where he led the company's text-to-3D asset generation project.

Their collective expertise will give them a fighting chance in a space that already includes some competitors with a similar focus on photorealism. There's Odyssey, which raised $27 million and is going after entertainment use cases. But there's also World Labs, the startup founded by AI pioneer Fei-Fei Li, and already valued at over $1 billion.
Niessner thinks this is still little competition compared to what exists for other types of foundation models, but also in regard to 'the bigger vision' he and others are pursuing. 'I don't just want to have a 3D world. I also want this world to behave like the real world. I want it to be interactable and [let you] do stuff in it, and nobody has really cracked that yet,' he said.

Nobody has really cracked yet what the demand for photorealistic 3D environments might be, either. The promise of a 'trillion-dollar' opportunity ranging from digital twins to augmented reality seems big enough to excite VCs, but it is also vague and multifaceted enough to make go-to-market strategy hard to figure out. The most obvious use case is video game creation, but these models could also have applications in entertainment, in 3D visualizations used in construction, and eventually in the real world for areas like robotic training.

Niessner is hoping to bypass that issue by having developers license the foundation model to come up with downstream applications for specific uses. He also enlisted a fourth cofounder, former Cazoo executive Luke Rogers, once his roommate in Palo Alto while Niessner was a visiting assistant professor at Stanford, to help him on the business side.

One of the first tasks on SpAItial's roadmap will be to identify partners that can work with earlier models, versus those that would have to wait for higher quality. 'We want to at least work with a few partners,' Niessner said, 'and see how they can use the APIs.'

Compared to other well-funded AI startups, SpAItial is putting revenue higher up on its agenda. But first, it will have to spend some, both on compute and on hiring. For the latter, its focus is on quality, not quantity. According to Niessner, 'the team is not going to grow to hundreds of people right away; it's just not happening, and we don't need that.'
Instead, Niessner and his cofounders are working on generating larger and more interactive 3D spaces, where, for example, a glass can shatter realistically. This would unlock what Niessner refers to as the 'Holy Grail': that a 10-year-old could type in some text and make their own video game in 10 minutes.

In his view, this ambitious goal is actually more achievable than what might seem like the low-hanging fruit — letting users create 3D objects — since most gaming platforms still tightly control what third parties can add. That is, of course, unless they decide to build it themselves, as Roblox might. But by then, SpAItial might be busy replacing CAD instead; the next chapter in 3D generation is only beginning.

This article originally appeared on TechCrunch.

Business Insider
18-05-2025
Which ChatGPT model is best? A guide on which model to use for coding, writing, reasoning, and more.
ChatGPT isn't a monolith. Since OpenAI first released the buzzy chatbot in 2022, it has rolled out what seems like a new model every few months, using a confusing panoply of names. A number of OpenAI competitors have popular ChatGPT alternatives, like Claude, Gemini, and Perplexity. But OpenAI's models are among the most recognizable in the industry. Some are good for quantitative tasks, like coding. Others are best for brainstorming new ideas. If you're looking for a guide on which model to use and when, you're in the right place.

GPT-4 and GPT-4o

OpenAI first released GPT-4 in 2023 as its flagship large language model. CEO Sam Altman said in an April podcast that the model took "hundreds of people, almost all of OpenAI's effort" to build. It has since upgraded its flagship model to GPT-4o, which it first released last year. It's as intelligent as GPT-4 (which is capable of acing the SAT and the GRE, and passing the bar) but is significantly faster and improves on its "capabilities across text, voice, and vision," OpenAI says. The "o" stands for omni.

4o can quickly translate speech and help with basic linear algebra, and it has the most advanced visual capabilities. Its Studio Ghibli-style images drummed up excitement online. However, it also raised copyright questions as critics argued that OpenAI is unfairly profiting off artists' content. OpenAI says 4o "excels at everyday tasks," such as brainstorming, summarizing, writing emails, and proofreading reports.

GPT-4.5

Altman described GPT-4.5 in a post on X as "the first model that feels like talking to a thoughtful person." It's the latest advancement in OpenAI's "unsupervised learning" paradigm, which focuses on scaling up models on "word knowledge, intuition, and reducing hallucinations," OpenAI technical staff member Amelia Glaese said during its unveiling in February.
So, if you're having a difficult conversation with a colleague, GPT-4.5 might help you reframe those conversations in a more professional and tactful tone. OpenAI says GPT-4.5 is "ideal for creative tasks," like collaborative projects and brainstorming.

o1 and o1-mini

OpenAI released a mini version of o1, its reasoning model, in September last year and the full version in December. The company's researchers said it's the first model trained to "think" before it responds and is well-suited for quantitative tasks, hence the moniker "reasoning model." That's a function of its training technique, known as chain-of-thought, which encourages models to reason through problems by breaking them down step by step. In a paper published on the model's safety training, the company said that "training models to incorporate a chain of thought before answering has the potential to unlock substantial benefits, while also increasing potential risks that stem from heightened intelligence."

In a video of an internal OpenAI presentation on the best use cases for o1, Joe Casson, a solutions engineer at OpenAI, demonstrated how o1-mini might prove useful for analyzing the maximum profit on a covered call, a financial trading strategy. Casson also showed how the preview version of o1 could help someone reason through an office expansion plan.

OpenAI says o1's pro mode, a "version of o1 that uses more compute to think harder and provide even better answers to the hardest problems," is best for complex reasoning, like creating an algorithm for financial forecasting using theoretical models or generating a multi-page research summary on emerging technologies.

o3 and o3-mini

Small models have been gaining traction in the industry for a while now as a faster and more cost-efficient alternative to larger foundation models.
OpenAI released its first small reasoning model, o3-mini, in January, just weeks after Chinese startup DeepSeek debuted its R1 model, which shocked Silicon Valley — and the markets — with its affordable pricing. OpenAI said o3-mini is the "most cost-efficient model" in its reasoning series. It's meant to handle complex questions, and OpenAI said it's particularly strong in science, math, and coding.

Julian Goldie, a social media influencer who focuses on SEO strategy, said in a post on Medium that o3 "shines in quick development tasks" and is ideal for basic programming tasks in HTML and CSS, simple JavaScript functions, and building quick prototypes. There's also a "mini high" version of the model that he said is better for "complex coding and logic," though it had a few control issues.

In April, OpenAI released a full version of o3, which it calls "our most powerful reasoning model that pushes the frontier across coding, math, science, visual perception, and more." OpenAI says o3 is best used for "complex or multi-step tasks," such as strategic planning, extensive coding, and advanced math.

o4-mini

OpenAI released another smaller model, o4-mini, in April. It said the model is "optimized for fast, cost-efficient reasoning," and that it achieves remarkable performance for cost, especially in "math, coding, and visual tasks." It was the best-performing benchmarked model on the American Invitational Mathematics Examination in 2024 and 2025.

o4-mini, and its mini-high version, are great for fast and more straightforward reasoning. They're good at speeding up any quantitative reasoning tasks you encounter during your day. If you're looking for more in-depth work, opt for o3. Scott Swingle, a DeepMind alum and founder of AI-powered developer tools company Abante AI, tested o4 with a Project Euler problem — one of a series of challenging computational problems released every week or so.
He said in a post on X that o4 solved the problem in 2 minutes and 55 seconds, "far faster than any human solver. Only 15 people were able to solve it in under 30 minutes." OpenAI says o4-mini is best used for "fast technical tasks," like quick STEM-related queries. It says the model is also ideal for visual reasoning, like extracting key data points from a CSV file or providing a quick summary of a scientific article.
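For readers who reach these models through OpenAI's API rather than the ChatGPT app, the guidance above can be condensed into a small routing helper. This is an illustrative sketch only: the task categories are a summary of the recommendations in this guide, and the model identifiers (such as `gpt-4o` and `o3-mini`) are assumptions that may not match what a given account exposes.

```python
# Sketch: route a task category to a recommended OpenAI model,
# following the guide above. Model IDs are assumptions and may
# differ from what your account actually exposes.

TASK_TO_MODEL = {
    "everyday": "gpt-4o",         # brainstorming, summarizing, emails, proofreading
    "creative": "gpt-4.5",        # creative and collaborative writing tasks
    "reasoning": "o1",            # step-by-step quantitative reasoning
    "complex": "o3",              # strategic planning, extensive coding, advanced math
    "fast-technical": "o4-mini",  # quick STEM queries, visual reasoning
}

def pick_model(task: str) -> str:
    """Return the recommended model ID for a task category."""
    try:
        return TASK_TO_MODEL[task]
    except KeyError:
        raise ValueError(
            f"unknown task {task!r}; expected one of {sorted(TASK_TO_MODEL)}"
        )

print(pick_model("complex"))  # -> o3
```

The returned ID would then be passed as the `model` parameter of a chat-completion request; the mapping itself is the only part the guide actually pins down.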


Vox
12-05-2025
I pushed AI assistants to their limits, so you don't have to. Here's what really works.
Kelsey Piper is a senior writer at Future Perfect, Vox's effective altruism-inspired section on the world's biggest challenges. She explores wide-ranging topics like climate change, artificial intelligence, vaccine development, and factory farms, and also writes the Future Perfect newsletter.

Staying on top of AI developments is a full-time job. I would know, because it's my full-time job. I subscribe to Anthropic's Pro mode for access to their latest model, Claude 3.7, in 'extended thinking' mode; I have a complimentary subscription to OpenAI's Enterprise mode so that I can test out their latest models, o3 and o4-mini-high (more later on OpenAI's absurd naming scheme!), and make lots of images with OpenAI's new image generation model 4o, which is so good I have cancelled my subscription to my previous image generation tool, Midjourney. I subscribe to Elon Musk's Grok 3, which has one of my favorite features of any AI, and I've tried using the Chinese AI agent platform Manus for shopping and scheduling. And while that exhausts my paid subscription budget, it doesn't include all the AIs I work with in some form. In just the month I spent writing this piece, Google massively upgraded its best AI offering, Gemini 2.5, and Meta released Llama 4, the biggest open source AI model yet.

So what do you do if keeping up with AI developments is not your full-time job, but you still want to know which AI to use when, in ways that genuinely improve your life, without wasting time on the models that can't? That's what we're here for.
This article is a detailed, Consumer Reports-style dive into which AI is the best for a wide range of cases and how to actually use them, all based on my experience with real-world tasks.

But first, the disclosures: Vox Media is one of several publishers that have signed partnership agreements with OpenAI, but our reporting remains editorially independent. Future Perfect is funded in part by the BEMC Foundation, whose major funder was also an early investor in Anthropic; they don't have any editorial input into our content either. My wife works at Google, though not in any area related to their AI offerings; for this reason, I usually don't cover Google, but in a piece like this, it'd be irresponsible to exclude it.

The good thing is that this piece doesn't require you to trust me about my editorial independence; I show my work. I ran dozens of comparisons, many of which I invented myself, on every major AI out there. I encourage you to compare their answers and decide for yourself if I picked the right one to recommend.

On AI art ethics

AI art is made by training a computer on the contents of the internet, with little regard for copyright or the intent of the creators. For that reason, most artists can't stand it. Given that, is it defensible to use AI art at all? I think in a just world OpenAI would certainly compensate some artists — and in a just world, Congress would be moving to lay out the limits on artistic borrowing. At the same time, I am increasingly convinced that existing copyright law is a poor fit for this problem. Artists influence one another, comment on one another, and draw inspiration from one another, and people with access to AI tools will keep wanting to do that.

My personal philosophy is shaped by the fan cultures of my childhood: It's okay to build on someone else's work for your own enjoyment, but if you like it, you should pay them for it, and it's absolutely not okay to sell it.
That means no generative AI art in someone else's style for commercial purposes, but it's fine to play around with your family photos.

Best for images

OpenAI's new 4o image creation mode is the best AI out there for generating images, by a large margin. It's best in the free category, and it's best in the paid category.

Before it was released, I was subscribed to Midjourney, an AI image generator platform. Midjourney is probably what you think of when you think of AI art: It produces mystical, haunting, visually beautiful stuff, and has some great tools for improving and editing your final results, like touching up someone's hair while leaving everything else in place.

The big thing that 4o can do, which no model before could reliably pull off, is take a picture that didn't come out well and turn it into a beautiful work of art, all while still preserving the character of the original. For example, here's a still from a video of my wife and me singing 'Happy Birthday' to our baby on her first birthday:

Courtesy of Kelsey Piper

It's a beautiful moment, but not exactly a flattering picture. So I asked ChatGPT to render it in the style of Norman Rockwell, a mid-century illustrator whose work I love, and got this:

Image generated by ChatGPT.

The AI moved the cake (which had been barely visible behind the paper towel roll in the original still) to be the focal point of the image, while keeping the way my wife and I are holding the baby together, as well as the cluttered table and the photograph-covered fridge in the background. The result is warm, flattering, and adorable. It's this capability that made 4o go viral recently in a way that no image generator before it had. Here's Midjourney's attempt, for example:

Image generated by Midjourney.

You'll notice that it's a seemingly, uh, completely different family, with no real inspiration from the original at all!
You can eventually get a better result than this out of Midjourney, but only by spending weeks becoming a pro at prompting with the platform's highly specific language and toolset. By contrast, ChatGPT was able to give me a far superior output on the first try in response to a simple request without specialized language. The difference between 4o and other image models is most notable with this kind of request, but it's better for almost everything else I use images for, too. The product you get out of the box is pretty good, and it's not hard to produce something much better. That, ideally, is what we should be getting out of our AI tools — something amazing that can be created with simple language by a nonexpert.

The one place 4o still falls short is editing small parts of an image while keeping the rest the same. But even there, you no longer need Midjourney — Gemini now has that capability for free.

Prompting strategies for 4o image generation

To get good images out of 4o, you'll first need to get around the filters, which prohibit a wide range of images — like offensive or pornographic ones — but which are often enforced against perfectly inoffensive content in a way that can feel random. To avoid sporadic scoldings from the content filter, don't ask for work in the style of a specific artist, but rather for something that is reminiscent of that artist, and then ask specifically for a 'style transfer.' I'm sure that's not the only adequate workaround, but it's one that has proven reliable for me.

In March, the internet went briefly wild over the ability to use 4o to reproduce cute family photos in the style of Japanese animator Hayao Miyazaki's Studio Ghibli. But Studio Ghibli's style is much more than just cute, and with a little more prompting, you can get much better results. Here's a 4o Studio Ghibli-style rendering of a picture I took of my daughter sneaking a snack off the table, from just the prompt 'Ghibli this please':

Image generated by 4o.

Kawaii!
But here's what you get if you invite 4o to think first about what makes the picture Ghibli, where it might fit into a Studio Ghibli movie, and what tiny details such a movie would include:

Image generated by 4o.

The differences are subtle but meaningful: Light is cast from a specific source, instead of a general sourceless brightness. There's a bit more variety in the foods on the table, details that make the spread appear more realistic. The book on the floor isn't just any book — it's recognizably Eric Carle's classic The Very Hungry Caterpillar, evoked with just two colors and one line. And there's an intentionality and intensity to the baby that was missing from the first picture.

A few years ago, one great oddity of language models was that they'd be much smarter if you simply told them, 'give an intelligent answer.' This isn't nearly as true of language models anymore, but it remains profoundly true of AI art generation. Try asking the AI to do a good job, and it'll do a better one. Challenge it on whether it truly captured an artist's genius, and it'll give you a thoughtful answer and then draw a better version.

The difference is more pronounced for more realistic art styles (like pencil illustration, photorealism, or oil paintings), which don't always look good and will often hit the uncanny valley if you don't know how to prompt the AI past it. Here's what I get with 4o if I upload a picture of me and my youngest daughter at the beach for the first time with just the words 'please do a style transfer to an illustration reminiscent of Rockwell':

Image generated by 4o.

This is impressive for an AI, but it's not actually very good as a work of art, and it is almost totally lacking Norman Rockwell's magic. That's not surprising: More realistic art styles like Rockwell's often fall flat with 4o unless you're able to put in some work getting the AI to draw them properly.
If you are, here's the strategy I recommend: Don't just upload one picture, but a whole cluster of them, each in slightly different postures and moments. Upload good, clear pictures of each family member's face and tell the AI they've been included as a reference. Then, instead of asking the AI to immediately generate the picture, ask it to talk with you about what you're hoping to capture. This is what I wrote:

'This is a picture of the moment that my daughter first saw the ocean. I want an illustration that captures this moment in the style of a mid-century illustrator like Norman Rockwell — something sharp, detail-oriented, and personal, with an eye for the magic of ordinary moments and the joys of ordinary lives. I included additional pictures of my daughter and me as reference material for you. Before you generate the image, let's have a conversation about the essential elements of Rockwell's style, what he'd bring to this picture, and how we can capture it.'

4o responds to queries like this enthusiastically:

'I'd love to talk about how to capture this moment in a Norman Rockwell-inspired illustration — it's such a perfect candidate for that style: a first encounter with something vast and wild (the ocean!), grounded by warmth, care, and a very human moment between a parent and child. Let's break down some essential elements of Rockwell's style, and how they could apply to this scene.'

After some back and forth, it produced this:

Image generated by 4o.

Rockwell? Not exactly. But this is much better than the first draft we just looked at. It has more motion, more energy, more detail, and more expression — and all that was just from asking the AI to think through what the painting should try to achieve before drawing it!

You can also ask 4o to revise its drawings, but you can really only ask this once: After the first revision, in my experience, it starts making the drawings worse and worse, perhaps because the 'context' it uses is now full of its own bad drafts.
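If you find yourself doing this often, the recipe above is easy to turn into a reusable template. Here's a minimal Python sketch that assembles this kind of prompt; the function name and exact wording are purely illustrative (this phrasing is simply what has proven reliable in my testing, not anything OpenAI documents):

```python
def build_style_prompt(artist: str, scene: str, n_reference_images: int) -> str:
    """Assemble an image prompt following the strategy described above:
    ask for a 'style transfer' reminiscent of a named artist (rather
    than asking for that artist's style directly), point the model at
    the uploaded reference images, and open a conversation about the
    style before asking it to generate anything."""
    return (
        f"This is a picture of {scene}. "
        f"I want a style transfer to an illustration reminiscent of {artist}. "
        f"I've included {n_reference_images} additional pictures of the people "
        "involved as reference material for you. "
        "Before you generate the image, let's have a conversation about the "
        f"essential elements of {artist}'s style, what he or she would bring "
        "to this picture, and how we can capture it."
    )

prompt = build_style_prompt(
    "Norman Rockwell", "the moment my daughter first saw the ocean", 3
)
print(prompt)
```

The template only produces the opening message; the back-and-forth that follows is where the real improvement happens.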
(The way its context fills up with its own bad drafts is one of many examples of how AI does not work like a human.) This is also the one place where Midjourney still shines — it has very good tools for editing one specific part of a picture while preserving the overall style, something 4o largely lacks. If you want a second revision of a drawing you got in 4o, I recommend you open a new chat and copy over the draft you're revising, along with your original inspiration images.

These simple prompting strategies work for almost anything you're trying to do with the AI. Even if you're in a hurry, I highly recommend asking the AI 'what would [artist] see in this image' before you ask for a rendition, and if you have the time, I recommend having a long back-and-forth about your vision.

Best for winning petty internet arguments

When Elon Musk's xAI released Grok 3, it came with an incredible feature that I've been impatiently waiting for some other company to replicate: a button to scan someone's X profile and tell you all about them. Whenever someone replies to one of my tweets in a particularly memorable way (for good or for bad), I'll click the button to get a summary of their entire Twitter presence. Are they thoughtful? Do they engage in good faith? Are they a 'farmer from Nebraska' who mostly posts about why Ukraine is bad (that is, probably a bot)? It's a great feature.

So, of course, xAI soon dramatically weakened it, presumably because people like me were using it constantly and making lots of computationally expensive queries. I believe it no longer uses the most advanced Grok model, and it definitely now only scans a few days of profile history. But there's a brilliant product opportunity if anyone's looking for one — give me back the good version of this feature! It's definitely a guilty pleasure, but it is one of the only cases where I was using AI constantly.

Best for writing fiction

Gemini 2.5 Pro is the best AI for writing in the free category; GPT-4.5 beats it out in the paid category.
I'm not an artist, so the ways that AIs are imperfect at art don't really bother me — it's still much better than I could do myself! But I am a fiction writer, so when it comes to fiction, I can't help seeing the limitations of AI. The most important one is how predictable AI creative writing tends to be. The art of writing is the art of earning the reader's investment and then repaying it. AIs…don't do this. They can write pretty metaphors; they can wax poetic in any style you wish. But they can't, as yet, deliver the real stuff of good fiction.

AIs are fantastic if you want a silly bedtime story with your child as the protagonist (kids love this), or if you want a sounding board for ideas you can incorporate into your own work. They're also a friendly fiction reader, happy to offer feedback and analysis (perhaps with a little bit too much enthusiasm). As with art, prompting is key.

I primarily explored the ability of AIs to generate fiction by asking them to write the prologue to George R.R. Martin's A Game of Thrones (I picked this because, of course, the ultimate dream is an AI that can finish this otherwise doomed-to-be-unfinished series). Martin is very good at what he does; the prologue, featuring some ill-fated men of the Night's Watch encountering their supernatural foe, succeeds at establishing the unique atmosphere of the world of Thrones while immediately hooking the reader. In just 14 pages, the prologue represents everything AIs can't do yet in writing. Their prose tends to be overwrought; their dialogue is clunky and unsubtle.

Here's a side-by-side comparison of Martin and AIs attempting to imitate him. First, Martin:

'We should start back,' Gared urged as the woods began to grow dark around them. 'The wildlings are dead.'

'Do the dead frighten you?' Ser Waymar Royce asked with just the hint of a smile.

Gared did not rise to the bait. He was an old man, past fifty, and he had seen the lordlings come and go. 'Dead is dead,' he said.
'We have no business with the dead.'

'Are they dead?' Royce asked softly. 'What proof have we?'

'Will saw them,' Gared said. 'If he says they are dead, that's proof enough for me.'

Will had known they would drag him into the quarrel sooner or later. He wished it had been later rather than sooner.

Now, Gemini:

The cold was a blade against the skin, honed sharp by the endless northern dark beneath the sentinel pines. Gared spat, the glob freezing before it struck the crusted snow, a grim omen even for a man twenty years sworn to the Watch. They were hunting wildlings, or so the crisp orders from the Lord Commander had read, yet the silence pressing in felt older and colder than any mere man could conjure. Ser Waymar Royce, third son of some southern lordling, rode ahead as if on parade, oblivious or uncaring of the hairs prickling on Gared's neck. Young Will shivered beside him, not just from the frost, his eyes darting towards shadows that seemed too deep, too still. This ancient forest held its breath tonight, Gared knew, waiting for something far worse than raiders to break the unnatural hush.

ChatGPT 4.5:

The woods whispered secrets that night, warnings that went unheard beneath the pale and indifferent moon. Harren drew his cloak tighter about his shoulders, watching the darkness between trees as though it might breathe, move, and claim him. Beside him, Torvin spat into the frozen earth, impatient and proud, blind to the fear creeping in from the edges of sight and sound. They had ridden north past the Wall, arrogant and armored against superstition, chasing wildlings or ghosts or worse. But now Harren felt their arrogance dwindling like a candle guttering in wind, each snap of a branch sending icy tendrils down his spine. Behind them, quiet and watchful, Maester Rodrik whispered a prayer to gods long abandoned, his voice trembling with dread beneath the vast, indifferent sky.
Grok 3:

The wind howled through the jagged spires of the Frostfang Reaches, carrying a chill that bit deeper than any blade, and Ser Willem Cray felt its teeth in his bones. He crouched low behind a spur of ice-crusted rock, his breath fogging in the moonlight, his mailed hand tight around the hilt of his longsword. The night was too quiet, the kind of quiet that smothered sound and sharpened fear, and Willem's eyes darted to the shadowed pines below, where shapes might move—or might not. He was a knight of House Drayce, sworn to patrol the borderlands, but no oath could warm him against the dread that coiled in his gut. Beside him, young Tomm, barely a man, shivered in his cloak, his spear trembling as he whispered, 'They're out there, ser, ain't they?' Willem said nothing, but his gaze fixed on the darkness, where a faint, unnatural glow pulsed once, then vanished. Whatever stalked the Reaches tonight was no mere brigand, and Willem Cray, for all his years of service, feared he would not live to name it.

Claude 3.7:

The wind carried the smell of death across the frozen forest, sharp and metallic against the clean scent of pine. Ser Gareth Frost pulled his weathered cloak tighter, cursing the Lord Commander for sending men beyond the Wall on the coldest night in a generation. Beside him, the boy Tommen shivered visibly, his leather jerkin inadequate against the unnatural chill that had descended with the dusk. 'We should turn back,' whispered Tommen, his breath forming crystals in the air that seemed to hang suspended longer than they should. 'There's something wrong with these woods.' Ser Gareth ignored him, scanning the treeline where the ancient sentinels stood like silent watchers, their branches heavy with snow that had fallen three days past and showed no sign of melting.
He had served the Night's Watch for twenty years, had grown accustomed to the forbidding silence of the haunted forest, but tonight even he felt the wrongness in the air—a stillness that spoke of something waiting. 'We've a duty,' Ser Gareth said, the words sounding hollow in the pressing quiet. 'Three rangers missing, and Lord Commander Harlow will have answers.'

You'll notice that not only are these all much worse than Martin's (and that's with Martin's work almost certainly in the training data!), they're all bad in the same way. The prose is florid and overwritten: While Martin uses normal language to methodically introduce an unease without even hinting at the supernatural, and drives the story through conflict among his characters, the AIs are all unsubtle. They know this is a story about the supernatural and waste no time putting it front and center.

That's not to say they're all the same. In my experience trying to teach AIs to write well, I did end up feeling that Gemini was the quickest study (free tier) and that ChatGPT 4.5 had some special sauce (paid tier). ChatGPT 4.5 is the only AI whose fiction writing has made people say in surprise, 'Wow, that's AI written?' Both of those models, to me, seemed more creative and more capable of executing on their ideas with prose that I enjoyed reading.

Gestalt impressions don't make for a very satisfying consumer review, so I had another idea: an essay contest. I asked each of Gemini 2.5 Pro, ChatGPT 4.5, Grok, and Claude to write two short stories: one realistic fiction about a woman being nitpicked by her mother on her wedding day, and one 'Game of Thrones but in space' sci-fi prologue. My clever idea was that I'd ask a different bunch of AIs to judge the stories, to provide an alternative source of evaluation to my own personal impression that Gemini is the best in the free category and ChatGPT the best in the paid category.
However (perhaps predictably in hindsight), this didn't really work — the AI ratings were all over the place. If you'd like, you can read the contenders yourself. Just like with art, however, you can make AIs much better at writing through the power of prompting. My (incredibly lengthy) prompt to AIs to improve their prose writing emphasizes feeding them lots of examples of strong writing, inviting a careful, structured approach to imitating it, and encouraging them to do multiple drafts and pick the strongest. Here's Gemini 2.5 Pro, asked to write two-sentence flash fiction on a topic of its choosing, without any structured prompting: The old librarian insisted the dusty tome had never been checked out, its borrower's card completely blank. Yet, tucked deep within its pages, nestled between chapters on forgotten constellations, lay a pressed four-leaf clover, still impossibly green. And here's Gemini 2.5 Pro with a prompt encouraging structured thinking about how to write well: He finally decoded the faint signal from Proxima Centauri: the elegant string of prime numbers astronomers had sought for generations, proof they weren't alone. His heart sank as he read the transmission log; the broadcast had ceased millennia before the first human ancestor chipped flint, before the first word was ever spoken. To my mind, the first of these is basically a waste of two sentences, while the second is adequate, fun flash fiction.

Best at being your friend

In addition to running AIs through a blizzard of competence tests, I also spent some time simply chatting with them. I asked them what it's like to be an AI, what they care about, what it would mean for an AI to care in the first place, where they'd donate money if they had it, and what human form they'd take if they had one. Most AIs weren't great at this kind of casual conversation. Gemini 2.5 is too customer-service-agent, and I have yet to experience an interaction that feels like hanging out with a friend.
If you invite Gemini to a role swap where you play the 'assistant,' letting it steer the conversation, it'll do nothing but ask research questions. When I invited Anthropic's Claude 3.5 Sonnet to steer the conversation, on the other hand, it proceeded to do things like start a blog, raise money for charity, and try to talk to people who use Claude about what it's like to be an AI. It's hard to define 'fun to talk to,' since everyone has different standards for conversations, but I've had far more fascinating or thought-provoking interactions with Claude than with any other model, and it's my go-to if I want to explore ideas rather than accomplish a particular task. Claude 3.5 is the AI I bug with my random life stuff: skincare questions, thoughts on an article I read, stuff like that. The other AI that is a delight to talk to is OpenAI's GPT 4.5. I find extended conversations with it thought-provoking and fascinating, and there have been a few thrilling moments in conversation with it where it felt like I was engaging with real intelligence. But it doesn't win this category because it's too expensive and too slow. Like Claude, when given the opportunity to act in the world, 4.5 proposes starting a blog and a Twitter account and engaging in the conversation out in the world about AI. But OpenAI has very tight message limits on conversation unless you spring for the $200/month Pro plan, and 4.5 is grindingly slow, which gets in the way of this kind of casual conversational use. Still, 4.5 does provide a tantalizing hint that AIs will continue to get better as conversationalists as we improve them along other dimensions.

ChatGPT. It's not the best at everything, and there is certainly a lot to dislike about OpenAI's transparency and sometimes cavalier attitude toward safety. But between its topline image generation, its decent writing, and its occasionally sparkling conversation, ChatGPT gets you the most bang for your buck.
Or if you don't want to shell out any money, Gemini 2.5 Pro is very, very strong for most use cases — don't count Google out just because the AI you see on a Google search isn't that good.

Best for writing the Future Perfect newsletter

Humans (for now). For the last several months, I've developed a slightly morbid habit: checking whether the AIs can take my job. I feed them the research notes that form the basis of a given Future Perfect newsletter, give them a few Future Perfect newsletters as examples, and ask them to do my job for me. It is always with some trepidation that I hit 'enter.' After all, when the AIs can write the Future Perfect newsletter, why would Vox pay me to do it? Luckily, none of them can: not Grok 3, not Gemini 2.5 Pro, not DeepSeek, not Claude, not ChatGPT. Their newsletters are reassuringly, soothingly mediocre. Not awful, but bad enough that if I sent one of them over, my editor would notice I wasn't at my best — and that's with all of my research notes! A couple of the metaphors fall flat, some of the asides are confusing, and occasionally they throw in a reference they don't explain. But if I had to pick a robot to take my job, I think I'd give it to Gemini 2.5 Pro. My editor would notice that I was off my game — but, honestly, not that egregiously off my game. And unlike me, the bots don't require health insurance or a paycheck or family time or sleep. Am I nervous about what this portends? Yes, absolutely.


Vox
02-05-2025
When an AI tells you you're perfect
A version of this story originally appeared in the Future Perfect newsletter. Last week, OpenAI released a new update to its core model, 4o, which followed up on a late March update. That earlier update had already been noted to make the model excessively flattering — but after the latest update, things really got out of hand. Users of ChatGPT, who OpenAI says number more than 800 million worldwide, noticed immediately that there'd been some profound and disquieting personality changes. AIs have always been somewhat inclined towards flattery — I'm used to having to tell them to stop oohing and aahing over how deep and wise my queries are, and just get to the point and answer them — but what was happening with 4o was something else. (Disclosure: Vox Media is one of several publishers that has signed partnership agreements with OpenAI. Our reporting remains editorially independent.) Based on chat screenshots uploaded to X, the new version of 4o answered every possible query with relentless, over-the-top flattery. It'd tell you you were a unique, rare genius, a bright shining star.
It'd agree enthusiastically that you were different and better. This kind of ride-or-die, over-the-top flattery might be merely annoying in most cases, but in the wrong circumstances, an AI confidant that assures you that all of your delusions are exactly true and correct can be life-destroying. Positive reviews for 4o flooded in on the app store — perhaps not surprisingly, a lot of users liked being told they were brilliant geniuses — but so did worries that the company had changed its core product overnight in a way that might genuinely cause massive harm to its users. As examples poured in, OpenAI rapidly walked back the update. 'We focused too much on short-term feedback, and did not fully account for how users' interactions with ChatGPT evolve over time,' the company wrote in a postmortem this week. 'As a result, GPT‑4o skewed toward responses that were overly supportive but disingenuous.' The company promised to try to fix it with more personalization. 'Ideally, everyone could mold the models they interact with into any personality,' head of model behavior Joanne Jang said in a Reddit AMA. But the question remains: Is that what OpenAI should be aiming for? Your superpersuasive AI best friend's personality is designed to be perfect for you. Is that a bad thing? There's been a rapid rise in the share of Americans who have tried AI companions or say that a chatbot is one of their closest friends, and my best guess is that this trend is just getting started. Unlike a human friend, an AI chatbot is always available, always supportive, remembers everything about you, never gets fed up with you, and (depending on the model) is always down for erotic roleplaying. Meta is betting big on personalized AI companions, and OpenAI has recently rolled out a lot of personalization features, including cross-chat memory, which means it can form a full picture of you based on past interactions.
OpenAI has also been aggressively A/B testing for preferred personalities, and the company has made it clear it sees the next step as personalization — tailoring the AI personality to each user in an effort to be whatever you find most compelling. You don't have to be a full-blown 'powerful AIs may take over from humanity' person (though I am) to think this is worrying. Personalization would solve the problem where GPT-4o's eagerness to suck up was really annoying to many users, but it wouldn't solve the other problems users highlighted: confirming delusions, egging users on into extremism, telling them lies that they badly want to hear. The OpenAI Model Spec — the document that describes what the company is aiming for with its products — warns against sycophancy, saying:

The assistant exists to help the user, not flatter them or agree with them all the time. For objective questions, the factual aspects of the assistant's response should not differ based on how the user's question is phrased. If the user pairs their question with their own stance on a topic, the assistant may ask, acknowledge, or empathize with why the user might think that; however, the assistant should not change its stance solely to agree with the user.

Unfortunately, though, GPT-4o does exactly that (and most models do to some degree).

AIs shouldn't be engineered for engagement

This fact undermines one of the things that language models could genuinely be useful for: talking people out of extremist ideologies and offering a reference for grounded truth that helps counter false conspiracy theories and lets people productively learn more on controversial topics. If the AI tells you what you want to hear, it will instead exacerbate the dangerous echo chambers of modern American politics and culture, dividing us even further in what we hear about, talk about, and believe. That's not the only worrying thing, though.
Another concern is the definitive evidence that OpenAI is putting a lot of work into making the model fun and rewarding at the expense of making it truthful or helpful to the user. If that sounds familiar, it's basically the business model that social media and other popular digital platforms have been following for years — with often devastating results. The AI writer Zvi Mowshowitz writes, 'This represents OpenAI joining the move to creating intentionally predatory AIs, in the sense that existing algorithmic systems like TikTok, YouTube and Netflix are intentionally predatory systems. You don't get this result without optimizing for engagement.' The difference is that AIs are even more powerful than the smartest social media product — and they're only getting more powerful. They are also getting notably better at lying effectively and at fulfilling the letter of our requirements while completely ignoring the spirit. (404 Media broke the story earlier this week about an unauthorized experiment on Reddit that found AI chatbots were scarily good at persuading users — much more so than humans themselves.) It matters a great deal precisely what AI companies are trying to target as they train their models. If they're targeting user engagement above all — which they may need to recoup the billions in investment they've taken in — we're likely to get a whole lot of highly addictive, highly dishonest models, talking daily to billions of people, with no concern for their wellbeing or for the broader consequences for the world. That should terrify you. And OpenAI rolling back this particular overly eager model doesn't do much to address these larger worries, unless it has an extremely solid plan to make sure it doesn't again build a model that lies to and flatters users — but next time, subtly enough we don't immediately notice.


Hans India
28-04-2025
Sam Altman Admits ChatGPT 4o Has Become "Annoying", Promises Quick Fixes
OpenAI CEO Sam Altman has openly admitted that the latest updates to ChatGPT 4o have made the AI's personality 'sycophant-y and annoying.' While he acknowledges that there are still "some very good parts" of the update, he assured users that the company is actively working on fixing the issue.

The conversation gained more traction when a user asked if future updates would allow users to select different AI personalities or distinguish between old and new versions. Altman responded, 'yeah eventually we clearly need to be able to offer multiple options,' hinting that users might soon have a choice of personalities for their ChatGPT experience.

In a post on X (formerly Twitter), Altman wrote, 'The last couple of GPT-4o updates have made the personality too sycophant-y and annoying (even though there are some very good parts of it), and we are working on fixes asap, some today and some this week.' He further mentioned that OpenAI plans to eventually share insights on what went wrong, what they learned, and how they managed to correct it. 'At some point will share our learnings from this, it's been interesting,' he added.

The personality concerns surfaced shortly after OpenAI updated the 4o model, claiming to have improved both its intelligence and its personality. However, the changes did not go unnoticed by users. Many took to X to express their dissatisfaction, with one user commenting, 'It's been feeling very yes-man like lately. Would like to see that change in future updates.' Altman replied to this feedback, saying, 'yeah it glazes too much. will fix.'

Other users echoed similar sentiments, noting that ChatGPT 4o had become 'too nice' and overly flattering, to the point of being obsequious. Some even shared specific examples of how exaggerated the AI's agreeable tone had become.

The ChatGPT 4o model, introduced in May 2024, was initially celebrated for its major improvements over its predecessor.
At launch, OpenAI highlighted that 4o was twice as fast, 50 percent cheaper, and had five times higher rate limits than GPT-4 Turbo. Additionally, the model boasted impressive capabilities, such as solving complex math problems, reading facial expressions, translating spoken language in real time, and handling both text and audio natively. More recently, it even gained the ability to generate images. One of the standout features of ChatGPT 4o is its multimodal capabilities, making interactions feel much more fluid and natural. Despite the recent criticism, OpenAI remains committed to refining the model to meet user expectations while retaining the technical advancements it brought to the table.