I put 5 of the best AI image generators to the test using NightCafe — this one took the top spot


Tom's Guide · 5 days ago
Competition in the AI image generator space is intense, with multiple companies like Ideogram, Midjourney and OpenAI hoping to convince you to use their offerings. That is why I'm a fan of NightCafe and have been using it for a few years. It has all the major models in one place, including DALL-E 3, Flux, Google Imagen and Ideogram.
I've created a lot of AI images over the years and every model brings something different. For example, Flux is a great general-purpose model that comes in several versions, Imagen 4 is incredible for realism, and Ideogram does text better than anything but GPT-4o.
With NightCafe you can try the same prompt over multiple models, or even create a realistic image of, say, a train station using Google Imagen, then use that as a starter image for an Ideogram project to overlay a caption or stylized logo. You can also just run the same prompt through multiple models to see which you prefer.
NightCafe also offers most of the major video models including Kling, Runway Gen-4, Luma Dream Machine and Wan 2.1. For this test we're focusing on image models.
Having all those models to hand is a great way to test each of them and find the one that best matches your personal aesthetic, and the models differ from each other more than you might think.
As well as the 'headline' models like Flux and Imagen, there are also community models that are fine-tuned versions of Flux and Stable Diffusion. For this test I focused on the core models: OpenAI GPT Image-1, Recraft v3, Google Imagen 4, Ideogram 3 and Flux Kontext.
I've come up with a single prompt to try across each model. It requires a degree of photorealism, presents a complex scene and includes a subtle text requirement.
Google's Imagen 4 is the model you'll use if you ask the Gemini app to create an image of something for you. It's also the model used in Google Slides when you create images.
This was the first image for this test, and while it captured the rising smoke, it over-emphasised it a little. It did create a visually compelling scene and followed the requirement for two people in the scene. It captured the correct vehicle, but there's no sign of the text.
Black Forest Labs Flux models are among the most versatile and are open source. With the arrival of the Kontext variant, we got image models that also understand natural language better. This means, a bit like OpenAI's native image generation in GPT-4o, it gives much more accurate results, especially when rendering text or complex scenes.
Flux Kontext captured the 'Cafe Matin' perfectly, got the woman right and it somehow feels more French than Imagen but I don't think it's as photographically accurate.
GPT Image-1, not to be confused with the 2018 original GPT-1 model, is a multimodal model from OpenAI designed for improved rendering accuracy; it is used by Adobe, Figma, Canva and NightCafe. Like Kontext, it has a better understanding of natural language prompts.
One downside to this model is that it can't produce 9:16 or 16:9 images, only square and near-square variants. It captured the truck and the name, but I don't think the scene is as good. It also randomly generated a second umbrella, and the placement of hands feels unreal.
Ideogram has been one of my favorite AI image models since it launched. Always able to generate legible text, it is also more flexible in terms of style than the other models. The Ideogram website includes a well designed canvas and built-in upscaler.
The result isn't perfect (the barista leans at an odd angle), but the lighting is more realistic, and so is the scene, with the truck on the sidewalk instead of the road. It also feels more modern, and the text is both legible and well designed.
Recraft is more of a design model, perfect for both rendered text and illustration, but that doesn't mean it can't create a stunning image. When it hit the market it shook things up, beating other models to the top of leaderboards.
I wasn't overly impressed with the output. Yes, it's the most visually striking, in part thanks to the space given to the scene. But it over-emphasises the smoke, and where is the barista? Also, for a model geared around text, there's no sign writing.
While the Flux image had a number of visual issues, it was the most consistent and it included legible sign writing. If I were using this commercially, as a stock image, I'd go with the Google Imagen 4 image, but from a purely visual perspective, Flux wins.
What you also get with Flux Kontext is easy adaptation. You could make a secondary prompt to change the truck color or replace the old lady with a businessman. You can do that in Gemini but not with Imagen. You'd need to use native image generation from Gemini 2+.
If you want to make a change to any image using Kontext, even if it wasn't a Kontext image originally, just click on the image in NightCafe and select "Prompt to Edit". It costs about 2.5 credits and takes just a simple descriptive text prompt.
I used the most expensive version of each model for this test, the one that takes the most processing time per image. This allowed for the fairest comparison. What surprises me is just how differently each model interprets the same descriptive prompt. What doesn't surprise me is how much better they've all become at following that description.
What I love about NightCafe, though, is that it's a one-stop shop for AI content. It isn't just a place to use all the leading image and video models; it also hosts a large community with a range of games, activities and groups centered around content creation. You can also edit, enhance, fix faces, upscale and expand any image you create within the app.

Related Articles

Apple is eyeing a ChatGPT-like search, but it must focus beyond Siri

Digital Trends · an hour ago

It's no secret that Apple is currently struggling to deliver a smash-hit AI product the way Google has with Gemini or Microsoft has with Copilot. The company has been attempting a similar overhaul with Siri, but those plans have been beset by delays, and the revamp is only expected to see the light of day in late 2026. The delay spooked Apple to such an extent that the company inked a stopgap deal with OpenAI, which helped integrate ChatGPT with Siri and, broadly, with the Apple Intelligence stack. But it seems Apple is working on a radical in-house solution, one that would essentially be a watered-down take on ChatGPT, but with internet search capabilities.

Siri, but flavored like ChatGPT Lite?

According to Bloomberg, a newly formed Answers, Knowledge and Information (AKI) team at Apple is working on a ChatGPT-inspired search framework for Siri. 'While still in early stages, the team is building what it calls an 'answer engine' — a system capable of crawling the web to respond to general-knowledge questions,' says the report. In addition to Siri, Apple reportedly plans to integrate the search functionality within Spotlight and Safari as well. Spotlight has already received a massive functional upgrade in macOS Tahoe, so it won't be surprising to see it evolve into a universal answering hotspot, one that covers both local data and information sourced from the internet.

It may sound chaotic at first, but it's not entirely alien. How does Siri, Spotlight, or Safari know when I want an AI to answer my query, or when to launch a web search? Well, look no further than Dia. The universal search box in the AI-focused browser dynamically switches between 'chat' and 'Google' mode as you type your search keywords. When you type 'Birkin bag' in the text field, it defaults to web search mode.
But as you type 'where to buy a Birkin bag,' the search field automatically switches to chat mode and offers the answer, just the way ChatGPT or any other AI answer engine like Perplexity would handle your questions. Right now, when you summon Siri on your iPhone and ask it a question that requires searching the internet or pulling knowledge from an information bank, it opens a prompt box asking whether the question can be offloaded to ChatGPT. Once you agree, ChatGPT kicks into action and offers the required information. Of course, it's not seamless. With Siri gaining web search capabilities and enhanced natural language comprehension (akin to ChatGPT or Gemini), it would be much easier for users to simply ask anything they want and get it answered.

In its current state, Siri feels like a relic of the past, especially compared to products such as Google's Gemini Live or ChatGPT's voice mode. In fact, Gemini works better on iPhones than Siri does. As far as Apple's plans go, building something as advanced as ChatGPT or Gemini seems like a far-fetched goal. As per Bloomberg, plans for 'LLM Siri' have kept running into delays, and the recent exodus of top AI talent casts more doubt over Apple's ambitions of reimagining Siri for the AI era.

It's not just about a phone assistant

Building a next-gen virtual assistant, just the way Google Assistant has evolved into Gemini or Copilot has at Microsoft, is not the only area where Apple is currently lagging far behind the competition. In fact, Big Tech is now as focused on web browsers as it is on chatbots. Agentic workflows are being seen as the next big thing in the field of AI.
In a recent interview, Perplexity co-founder and CEO Aravind Srinivas explained why browsers are more suitable for AI agents than chatbots and apps: 'You get full transparency and visibility, and you can just stop the agent when you feel like it's going off the rails and just complete the task yourself, and you can also have the agent ask for your permission to do anything. So that level of control, transparency, trust in an environment that we are used to for multiple decades, which is the browser.'

Unfortunately, Apple is severely lagging in the browser wars. By introducing AI Mode in Search and deeply integrating Gemini across its Workspace ecosystem, Google has shown how much AI can reshape web browsing and web-based workflows.

Safari desperately needs an AI overhaul

Upstart browsers such as Dia and Perplexity's Comet have proved that the era of legacy tools such as extensions is coming to an end. Soon, skills and custom agents will take over. Less than a week ago, Microsoft introduced Copilot Mode in Edge. I have spent a few days with the new AI-powered tools in Edge, and I believe it's a bold (and dramatically more practical) new direction for web browsers. In comparison, Safari misses out on any such AI-driven experiences. From a context-aware sidebar to multi-tab contextual actions, Apple's browser is sorely missing the conveniences that AI is bringing to modern web browsers.

Assuming Apple succeeds at building its own ChatGPT-like answer engine, it would still take a massive undertaking to build meaningful features around it in Safari. Right now, Apple needs not just to build an answer engine, but to pay close attention to the competition. I am sure Apple is monitoring the shifting landscape of AI agents and browsers. It simply has to pick up the pace, or as CEO Tim Cook hinted at a recent all-hands meeting, the company 'will make the investment to do it.' Will Apple acquire a hot AI lab like Perplexity or Anthropic?
Only time will tell, but the company certainly has to take a more holistic approach to AI than just focusing on building the next great AI chatbot.

China is betting on a real-world use of AI to challenge U.S. control

Yahoo · 2 hours ago

SHANGHAI - As the United States and China vie for control over the future of artificial intelligence, Beijing has embarked on an all-out drive to transform the technology from a remote concept to a newfangled reality, with applications on factory floors and in hospitals and government offices.

China does not have access to the most advanced chips required to power cutting-edge models due to restrictions from Washington and is still largely playing catch-up with Silicon Valley giants like OpenAI. But experts say Beijing is pursuing an alternative playbook in an attempt to bridge the gap: aggressively pushing for the adoption of AI across the government and private sector. (The Washington Post has a content partnership with OpenAI.)

'In China, there's definitely stronger government support for applications and a clear mandate from the central government to diffuse the technology through society,' said Scott Singer, an expert on China's AI sector at the Carnegie Endowment for International Peace. By contrast, the U.S. has been more focused on developing the most advanced AI models while 'the application layer has been totally ignored,' he said.

China's push was on full display in Shanghai at its World Artificial Intelligence Conference, which ran until Tuesday. Themed 'Global Solidarity in the AI Era,' the expo is one part of Beijing's bid to establish itself as a responsible AI leader for the international community. This pitch was bolstered by the presence of international heavyweights like Eric Schmidt, former CEO of Google, and Geoffrey Hinton, a renowned AI researcher often called the 'Godfather of AI.' During the event, Beijing announced an international organization for AI regulation and a 13-point action plan aimed at fostering global cooperation to ensure the technology's beneficial and responsible development.
'China attaches great importance to global AI governance,' Li Qiang, China's premier, said at the opening ceremony on Saturday. It 'is willing to share its AI development experience and technological products to help countries around the world - especially those in the Global South,' he said, according to an official readout.

Just last week, President Donald Trump announced a competing plan in a bid to boost American AI competitiveness by reducing regulation and promoting global exports of U.S. AI technology. Washington has moved in recent years to restrict China's access to chips necessary for AI development, in part due to concerns about potential military applications of such models and degrading U.S. tech leadership. The Trump administration's approach to chip policy, however, has been mixed. Earlier this month, the White House reversed a previous ban on specific AI chips made by U.S. tech giant Nvidia being exported to China. This shift occurred amid trade negotiations between the world's two largest economies, which have been locked in an escalating tariff and export control war since Trump returned to the Oval Office earlier this year.

There was nothing but excitement about AI in the vast expo center in Shanghai's skyscraper-rich Pudong district, where crowds entered gates controlled by facial recognition. Inside, thousands of attendees listened to panels stacked with Chinese government officials, entrepreneurs and international researchers, or watched demonstrations on using AI to create video games, control robotic movements and respond in real time to conversations via smartglasses. Chinese giants like Huawei and Alibaba and newer Chinese tech darlings like Unitree Robotics were there. DeepSeek was not present, but its name was spoken everywhere.
The Hangzhou-based upstart has been at the forefront of Beijing's attempt to push the government use of AI since it released a chatbot model in January, prompting a global craze and driving home China's rapid AI advances. DeepSeek has been put to work over the last six months on a wide variety of government tasks. Procurement documents show military hospitals in Shaanxi and Guangxi provinces specifically requesting DeepSeek to build online consultation and health record systems. Local government websites describe state organs using DeepSeek for things like diverting calls from the public and streamlining police work.

DeepSeek helps 'quickly discover case clues and predict crime trends,' which 'greatly improves the accuracy and timeliness of crime fighting,' a city government in China's Inner Mongolia region explained in a February social media post. Anti-corruption investigations - long a priority for Chinese leader Xi Jinping - are another frequent DeepSeek application, in which models are deployed to comb through dry spreadsheets to find suspicious irregularities. In April, China's main anti-graft agency even included a book called 'Efficiently Using DeepSeek' on its official book recommendation list.

China's new AI action plan underscores this push, declaring that the 'public sector should take the lead in deploying applications' by embedding AI in education, transportation and health care. It also emphasizes a mandate to use AI 'to empower the real economy' and praises open-source models - which are more easily shared - as an egalitarian method of AI development. Alfred Wu, an expert on China's public governance at the National University of Singapore, said Beijing has disseminated a 'top-down' directive to local governments to use AI. This is motivated, Wu said, by a desire to improve China's AI prowess amid a fierce rivalry with Washington by providing models access to vast stores of government data.
But not everyone is convinced that China has the winning hand, even as it attempts to push AI application nationwide. For one, China's sluggish economy will impact the AI industry's ability to grow and access funding, said Singer, who was attending the conference. Beijing has struggled to manage persistent deflation and a property crisis, which has taken a toll on the finances of many families across the country. 'So much of China's AI policy is shaped by the state of the economy. The economy has been struggling for a few years now, and applications are one way of catalyzing much-needed growth,' he said. 'The venture capital ecosystem in AI in China has gone dry.'

Others point out that local governments trumpeting their usage of DeepSeek is more about signaling than real technology uptake. Shen Yang, a professor at Tsinghua University's school of artificial intelligence, said DeepSeek is not being used at scale in anti-corruption work, for example, because the cases involve sensitive information and deploying new tools in these investigations requires long and complex approval processes. He also pointed out that AI is still a developing technology with lots of kinks. 'AI hallucinations still exist,' he said, using a term for the technology's generation of false or misleading information. 'If it's wrong, who takes responsibility?'

These concerns, however, felt far away in the expo's humming hallways. At one booth, Carter Hou, the co-founder of Halliday, a smartglasses company, explained how the lenses project a tiny black screen at the top of a user's field of vision. The screen can provide translation, recordings and summaries of any conversation, and even deploy 'proactive AI,' which anticipates questions based on a user's interactions and provides information preemptively.
'For example, if you ask me a difficult question that is fact related,' Hou said, wearing the trendy black frames, 'all I need to do is look at it and use that information and pretend I'm a very knowledgeable person.'

Asked about the event's geopolitical backdrop, Hou said he was eager to steer clear of diplomatic third rails. 'People talk a lot about the differences between the United States and China,' he said. 'But I try to stay out of it as much as possible, because all we want to do is just to build good products for our customers. That's what we think is most important.'

Kiki Lei, a Shanghai resident who started an AI video company and attended the conference on Sunday, seemed to agree with this goal. She said that Chinese AI products are easier to use than U.S. products because companies here really 'know how to create new applications' and excel at catering to, and learning from, the large pool of Chinese technology users.

Robots, perhaps the most obvious application of AI in the real world, were everywhere at the conference - on model factory floors and in convenience stores retrieving soda cans, shaking disbelieving kids' hands, or just roaming the packed halls. At the booth for ModelBest, another Beijing-based AI start-up, a young student from China's prestigious Tsinghua University, who was interning at the company, demonstrated how a robot could engage with its surroundings - and charm its human interlocutors. Looking directly at the student, the robot described his nondescript clothing. 'The outfit is both stylish and elegant,' the robot continued. 'You have a confident and friendly demeanor, which makes you very attractive.'

Pei-Lin Wu in Taiwan contributed to this report.
Washington Post China correspondent Katrina Northrop reported from the event on July 26. (c) 2025, The Washington Post

Inside OpenAI's quest to make AI do anything for you

Yahoo · 4 hours ago

Shortly after Hunter Lightman joined OpenAI as a researcher in 2022, he watched his colleagues launch ChatGPT, one of the fastest-growing products ever. Meanwhile, Lightman quietly worked on a team teaching OpenAI's models to solve high school math competitions. Today that team, known as MathGen, is considered instrumental to OpenAI's industry-leading effort to create AI reasoning models: the core technology behind AI agents that can do tasks on a computer like a human would.

'We were trying to make the models better at mathematical reasoning, which at the time they weren't very good at,' Lightman told TechCrunch, describing MathGen's early work. OpenAI's models are far from perfect today — the company's latest AI systems still hallucinate and its agents struggle with complex tasks. But its state-of-the-art models have improved significantly on mathematical reasoning. One of OpenAI's models recently won a gold medal at the International Math Olympiad, a math competition for the world's brightest high school students. OpenAI believes these reasoning capabilities will translate to other subjects, and ultimately power the general-purpose agents that the company has always dreamed of building.

ChatGPT was a happy accident — a low-key research preview turned viral consumer business — but OpenAI's agents are the product of a years-long, deliberate effort within the company. 'Eventually, you'll just ask the computer for what you need and it'll do all of these tasks for you,' said OpenAI CEO Sam Altman at the company's first developer conference in 2023. 'These capabilities are often talked about in the AI field as agents. The upsides of this are going to be tremendous.'

Whether agents will meet Altman's vision remains to be seen, but OpenAI shocked the world with the release of its first AI reasoning model, o1, in the fall of 2024. Less than a year later, the 21 foundational researchers behind that breakthrough are the most highly sought-after talent in Silicon Valley.
Mark Zuckerberg recruited five of the o1 researchers to work on Meta's new superintelligence-focused unit, offering some compensation packages north of $100 million. One of them, Shengjia Zhao, was recently named chief scientist of Meta Superintelligence Labs.

The reinforcement learning renaissance

The rise of OpenAI's reasoning models and agents is tied to a machine learning training technique known as reinforcement learning (RL). RL provides feedback to an AI model on whether its choices were correct or not in simulated environments. RL has been used for decades. For instance, in 2016, about a year after OpenAI was founded in 2015, AlphaGo, an AI system created by Google DeepMind using RL, gained global attention after beating a world champion at the board game Go.

Around that time, one of OpenAI's first employees, Andrej Karpathy, began pondering how to leverage RL to create an AI agent that could use a computer. But it would take years for OpenAI to develop the necessary models and training techniques. By 2018, OpenAI had pioneered its first large language model in the GPT series, pretrained on massive amounts of internet data and large clusters of GPUs. GPT models excelled at text processing, eventually leading to ChatGPT, but struggled with basic math.

It took until 2023 for OpenAI to achieve a breakthrough, initially dubbed 'Q*' and then 'Strawberry,' by combining LLMs, RL, and a technique called test-time computation. The latter gave the models extra time and computing power to plan and work through problems, verifying their steps, before providing an answer. This allowed OpenAI to introduce a new approach called 'chain-of-thought' (CoT), which improved AI's performance on math questions the models hadn't seen before.

'I could see the model starting to reason,' said El Kishky, an OpenAI researcher. 'It would notice mistakes and backtrack, it would get frustrated. It really felt like reading the thoughts of a person.'
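The RL feedback loop described above can be illustrated with a toy sketch. To be clear, this is my own illustration, not OpenAI's training code: the two "answer strategies" and their reward probabilities are invented stand-ins. The point is the core mechanism, a policy that updates its value estimates purely from reward signals.

```python
import random

# Toy RL sketch (invented example, not OpenAI's method): a policy learns
# which of two answering strategies earns more reward, using only the
# correct/incorrect feedback signal the article describes.

def reward(choice):
    # Simulated environment: "verify" succeeds 90% of the time,
    # "guess" only 30% of the time. These numbers are made up.
    p = {"verify": 0.9, "guess": 0.3}[choice]
    return 1.0 if random.random() < p else 0.0

def train(steps=5000, lr=0.1, epsilon=0.1, seed=0):
    random.seed(seed)
    value = {"verify": 0.0, "guess": 0.0}  # estimated value per strategy
    for _ in range(steps):
        # epsilon-greedy: mostly exploit the best estimate, sometimes explore
        if random.random() < epsilon:
            choice = random.choice(["verify", "guess"])
        else:
            choice = max(value, key=value.get)
        r = reward(choice)
        # nudge the estimate toward the observed reward
        value[choice] += lr * (r - value[choice])
    return value

values = train()
```

After training, the estimate for "verify" should sit near its true 0.9 success rate and clearly above "guess", which is all the feedback loop needs to prefer the better strategy.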
Though individually these techniques weren't novel, OpenAI uniquely combined them to create Strawberry, which directly led to the development of o1. OpenAI quickly identified that the planning and fact-checking abilities of AI reasoning models could be useful for powering AI agents. 'We had solved a problem that I had been banging my head against for a couple of years,' said Lightman. 'It was one of the most exciting moments of my research career.'

Scaling reasoning

With AI reasoning models, OpenAI determined it had two new axes along which to improve its models: using more computational power during the post-training of AI models, and giving AI models more time and processing power while answering a question. 'OpenAI, as a company, thinks a lot about not just the way things are, but the way things are going to scale,' said Lightman.

Shortly after the 2023 Strawberry breakthrough, OpenAI spun up an 'Agents' team led by OpenAI researcher Daniel Selsam to make further progress on this new paradigm, two sources told TechCrunch. Although the team was called 'Agents,' OpenAI didn't initially differentiate between reasoning models and agents as we think of them today. The company just wanted to make AI systems capable of completing complex tasks.

Eventually, the work of Selsam's Agents team became part of a larger project to develop the o1 reasoning model, with leaders including OpenAI co-founder Ilya Sutskever, chief research officer Mark Chen, and chief scientist Jakub Pachocki. OpenAI would have to divert precious resources, mainly talent and GPUs, to create o1. Throughout OpenAI's history, researchers have had to negotiate with company leaders to obtain resources; demonstrating breakthroughs was a surefire way to secure them. 'One of the core components of OpenAI is that everything in research is bottom up,' said Lightman.
'When we showed the evidence [for o1], the company was like, 'This makes sense, let's push on it.'' Some former employees say that the startup's mission to develop AGI was the key factor in achieving breakthroughs around AI reasoning models. By focusing on developing the smartest-possible AI models, rather than products, OpenAI was able to prioritize o1 above other efforts. That type of large investment in ideas wasn't always possible at competing AI labs.

The decision to try new training methods proved prescient. By late 2024, several leading AI labs started seeing diminishing returns on models created through traditional pretraining scaling. Today, much of the AI field's momentum comes from advances in reasoning models.

What does it mean for an AI to 'reason'?

In many ways, the goal of AI research is to recreate human intelligence with computers. Since the launch of o1, ChatGPT's UX has been filled with more human-sounding features such as 'thinking' and 'reasoning.' When asked whether OpenAI's models were truly reasoning, El Kishky hedged, saying he thinks about the concept in terms of computer science. 'We're teaching the model how to efficiently expend compute to get an answer. So if you define it that way, yes, it is reasoning,' said El Kishky.

Lightman takes the approach of focusing on the model's results, not as much on the means or their relation to human brains. 'If the model is doing hard things, then it is doing whatever necessary approximation of reasoning it needs in order to do that,' said Lightman. 'We can call it reasoning, because it looks like these reasoning traces, but it's all just a proxy for trying to make AI tools that are really powerful and useful to a lot of people.'

OpenAI's researchers note people may disagree with their nomenclature or definitions of reasoning — and surely, critics have emerged — but they argue it's less important than the capabilities of their models. Other AI researchers tend to agree.
Nathan Lambert, an AI researcher with the non-profit AI2, compares AI reasoning models to airplanes in a blog post. Both, he says, are manmade systems inspired by nature — human reasoning and bird flight, respectively — but they operate through entirely different mechanisms. That doesn't make them any less useful, or any less capable of achieving similar outcomes. A group of AI researchers from OpenAI, Anthropic, and Google DeepMind agreed in a recent position paper that AI reasoning models are not well understood today, and that more research is needed. It may be too early to confidently claim what exactly is going on inside them.

The next frontier: AI agents for subjective tasks

The AI agents on the market today work best in well-defined, verifiable domains such as coding. OpenAI's Codex agent aims to help software engineers offload simple coding tasks. Meanwhile, Anthropic's models have become particularly popular in AI coding tools like Cursor and Claude Code — these are some of the first AI agents that people are willing to pay for.

However, general-purpose AI agents like OpenAI's ChatGPT Agent and Perplexity's Comet struggle with many of the complex, subjective tasks people want to automate. When trying to use these tools for online shopping or finding a long-term parking spot, I've found the agents take longer than I'd like and make silly mistakes. Agents are, of course, early systems that will undoubtedly improve. But researchers must first figure out how to better train the underlying models to complete tasks that are more subjective.

'Like many problems in machine learning, it's a data problem,' said Lightman, when asked about the limitations of agents on subjective tasks. 'Some of the research I'm really excited about right now is figuring out how to train on less verifiable tasks. We have some leads on how to do these things.'
Noam Brown, an OpenAI researcher who helped create the IMO model and o1, told TechCrunch that OpenAI has new general-purpose RL techniques which allow them to teach AI models skills that aren't easily verified. This was how the company built the model which achieved a gold medal at IMO, he said. OpenAI's IMO model was a newer AI system that spawns multiple agents, which simultaneously explore several ideas and then choose the best possible answer. These types of AI models are becoming more popular; Google and xAI have recently released state-of-the-art models using this technique.

'I think these models will become more capable at math, and I think they'll get more capable in other reasoning areas as well,' said Brown. 'The progress has been incredibly fast. I don't see any reason to think it will slow down.'

These techniques may help OpenAI's models become more performant, gains that could show up in the company's upcoming GPT-5 model. OpenAI hopes to assert its dominance over competitors with the launch of GPT-5, ideally offering the best AI model to power agents for developers and consumers. But the company also wants to make its products simpler to use. El Kishky says OpenAI wants to develop AI agents that intuitively understand what users want, without requiring them to select specific settings. He says OpenAI aims to build AI systems that understand when to call up certain tools, and how long to reason for.

These ideas paint a picture of an ultimate version of ChatGPT: an agent that can do anything on the internet for you, and understand how you want it to be done. That's a much different product than what ChatGPT is today, but the company's research is squarely headed in this direction. While OpenAI undoubtedly led the AI industry a few years ago, the company now faces a tranche of worthy opponents.
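The "spawn multiple agents, explore several ideas, pick the best answer" recipe described above can be caricatured as best-of-n sampling against a verifier. This is my own toy sketch under stated assumptions: the square-root "problem", the noisy candidate generator and the scoring function are all invented stand-ins, but they show why verifiable tasks suit this approach, since a checker can rank the candidates.

```python
import random

# Toy best-of-n sketch (invented example, not OpenAI's system):
# sample several candidate answers in parallel, score each with a
# verifier, and keep the highest-scoring one.

def sample_candidate(problem, rng):
    # Stand-in for one agent's attempt: a noisy guess at sqrt(problem).
    return problem ** 0.5 + rng.gauss(0, 0.5)

def verifier_score(problem, answer):
    # A verifiable task lets us score candidates: the closer answer**2
    # is to the target, the better.
    return -abs(answer * answer - problem)

def best_of_n(problem, n=16, seed=0):
    rng = random.Random(seed)
    candidates = [sample_candidate(problem, rng) for _ in range(n)]
    return max(candidates, key=lambda a: verifier_score(problem, a))

answer = best_of_n(2.0)
```

With a real model, `sample_candidate` would be one sampled reasoning trace and `verifier_score` a proof checker or test suite; the selection logic stays the same, and by construction the best of 16 samples can never score worse than a single sample drawn from the same generator.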
The question is no longer just whether OpenAI can deliver its agentic future, but whether the company can do so before Google, Anthropic, xAI, or Meta beat it to the punch.
