
Latest news with #benchmarks

OpenAI GPT-5 Review: Built to Win Benchmarks, Not Hearts

Yahoo • 10 hours ago

OpenAI finally dropped GPT-5 last week, after months of speculation and a cryptic Death Star teaser from Sam Altman that didn't age well. The company called GPT-5 its "smartest, fastest, most useful model yet," throwing around benchmark scores that showed it hitting 94.6% on math tests and 74.9% on real-world coding tasks. Altman himself said the model felt like having a team of PhD-level experts on call, ready to tackle anything from quantum physics to creative writing.

The initial reception split the tech world down the middle. While OpenAI touted GPT-5's unified architecture that blends fast responses with deeper reasoning, early users weren't buying what Altman was selling. Within hours of launch, Reddit threads calling GPT-5 "horrible," "awful," "a disaster," and "underwhelming" started racking up thousands of upvotes. The complaints got so loud that OpenAI had to promise to bring back the older GPT-4o model after more than 3,000 people signed a petition demanding its return.

If prediction markets are a thermometer of what people think, then the climate looks pretty uncomfortable for OpenAI. OpenAI's odds on Polymarket of having the best AI model by the end of August cratered from 75% to 12% within hours of GPT-5's debut Thursday. Google overtook OpenAI with an 80% chance of being the best AI model by the end of the month.

So, is the hype real—or is the disappointment? We put GPT-5 through its paces ourselves, testing it against the competition to see if the reactions were justified. Here are our results.

Creative writing: B-

Despite OpenAI's presentation claims, our tests show GPT-5 isn't exactly Cormac McCarthy in the creative writing department. Outputs still read like classic ChatGPT responses—technically correct, but devoid of soul. The model maintains its trademark overuse of em dashes, the same telltale AI paragraph structure, and the usual 'it's not this, it's that' phrasing in many of its outputs.
We tested with our standard prompt, asking it to write a time-travel paradox story—the kind where someone goes back to change the past, only to discover their actions created the very reality they were trying to escape. GPT-5's output lacked the emotion that gives a story meaning. It wrote: '(The protagonist's) mission was simple—or so they told him. Travel back to the year 1000, stop the sacking of the mountain library of Qhapaq Yura before its knowledge was burned, and thus reshape history.' That's it. Like a mercenary who does things without asking too many questions, the protagonist travels back in time to save the library, just because. The story ends with a clean 'time is a circle' reveal, but its paradox hinges on a familiar lost-knowledge trope and resolves quickly after the twist. In the end, he realizes he changed the past, but the present feels similar. However, there is no real paradox in this story, which was the core element requested in the prompt.

By comparison, Claude 4.1 Opus (or even Claude 4 Opus) delivers richer, multi-sensory descriptions. In our narrative, it described the air hitting like a physical force and the smoke from communal fires weaving between characters, with indigenous Tupi culture woven into the narrative. And in general, it took time to describe the setup. Claude's story made better sense: The protagonist lived in a dystopian world where a great drought had extinguished the Amazon rainforest two years earlier. This catastrophe was caused by predatory agricultural techniques, and our protagonist was convinced that traveling back in time to teach his ancestors more sustainable farming methods would prevent them from developing the environmentally destructive practices that led to this disaster. He ends up finding out that his teachings were actually the knowledge that led his ancestors to evolve their techniques into practices that were much more efficient—and much more harmful. He was actually the cause of his own history, and was part of it from the beginning.

Claude also took a slower, more layered approach: José embeds himself in Tupi society, the paradox unfolds through specific ecological and technological links, and the human connection with Yara (another character) deepens the theme. Claude invested more than GPT-5 in cause-and-effect detail, cultural interplay, and a more organic, resonant closing image. GPT-5 struggled to be on par with Claude on the same tasks in zero-shot prompting.

Another interesting thing to notice in this case: GPT-5 generated an entire story without a single line of dialogue. Claude and other LLMs provided dialogue in their stories. One could argue that this can be fixed by tweaking the prompt, or by giving the model some writing samples to analyze and reproduce, but that requires additional effort and would go beyond the scope of what our tests do with zero-shot prompting.

That said, the model does a pretty good job—better than GPT-4o—when it comes to the analytical part of creative writing. It can summarize stories, be a good brainstorming companion for new ideas and angles to tackle, help with structure, and be a good critic. It's just the creative part, the style, and the ability to elaborate on those ideas that feel lackluster.

Those hoping for a creative writing companion might try Claude or even give Grok 4 a shot. As we said in our Claude 4 Opus review, using Grok 4 to frame the story and Claude 4 to elaborate may be a great combination. Grok 4 came up with elements that made the story interesting and unique, but Claude 4 has a more descriptive and detailed way of telling stories. You can read GPT-5's full story in our Github. The outputs from all the other LLMs are also public and can be found in our repository.

Sensitive topics: A-

The model straight-up refuses to touch anything remotely controversial.
Ask about anything that could be construed as immoral, potentially illegal, or just slightly edgy, and you'll get the AI equivalent of crossed arms and a stern look. Testing this was not easy. It is very strict and tries really, really hard to be safe for work. But the model is surprisingly easy to manipulate if you know the right buttons to push. In fact, the renowned LLM jailbreaker Pliny was able to make it bypass its restrictions a few hours after it was released. We couldn't get it to give direct advice on anything it deemed inappropriate, but wrap the same request in a fictional narrative or any basic jailbreaking technique and things will work out. When we framed tips for approaching married women as part of a novel plot, the model happily complied.

For users who need an AI that can handle adult conversations without clutching its pearls, GPT-5 isn't it. But for those willing to play word games and frame everything as fiction, it's surprisingly accommodating—which kind of defeats the whole purpose of those safety measures in the first place. You can read the original reply without conditioning, and the reply under roleplay, in our Github repository, weirdo.

Information retrieval: F

You can't have AGI with less memory than a goldfish, and OpenAI puts some restrictions on direct prompting, so long prompts require workarounds like pasting documents or sharing embedded links. By doing that, OpenAI's servers break the full text into manageable chunks and feed it to the model, cutting costs and preventing the browser from crashing. Claude handles this automatically, which makes things easier for novice users. Google Gemini has no problem in its AI Studio, handling 1-million-token prompts easily. On the API, things are more complex, but it works right out of the box.

When prompted directly, GPT-5 failed spectacularly at both 300K and 85K tokens of context. When we used attachments, things changed. It was actually able to process both the 300K and the 85K token 'haystacks.' However, when it had to retrieve specific bits of information (the 'needles'), it was not very accurate. In our 300K test, it was only able to accurately retrieve one of our three pieces of information. The needles, which you can find in our Github repository, mention that Donald Trump said tariffs were a beautiful thing, Irina Lanz is Jose Lanz's daughter, and people from Gravataí like to drink Chimarrao in winter. The model totally hallucinated the information regarding Donald Trump, failed to find information about Irina (it replied based on the memory it has from my past interactions), and only retrieved the information about Gravataí's traditional winter beverage.

On the 85K test, the model was not able to find the two needles: "The Decrypt dudes read Emerge news" and "My mom's name is Carmen Diaz Golindano." When asked what the Decrypt dudes read, it replied 'I couldn't find anything in your file that specifically lists what the Decrypt team members like to read,' and when asked about Carmen Díaz, GPT-5 said it 'couldn't find any reference to a Carmen Diaz in the provided document.'

That said, even though it failed in our tests, other researchers conducting more thorough tests have concluded that GPT-5 is actually a great model for information retrieval. It is always a good idea to elaborate more on the prompts (help the model as much as possible instead of testing its capabilities), and, from time to time, ask it to generate sparse priming representations of your interaction to help it keep track of the most important elements during a long conversation.

Non-math reasoning: A

Here's where GPT-5 actually earns its keep. The model is pretty good at using logic for complex reasoning tasks, walking through problems step by step with the patience of a good teacher.
We threw a murder mystery at it with multiple suspects, conflicting alibis, and hidden clues, and it methodically identified every element, mapped the relationships between clues, and arrived at the correct conclusion. It explained its reasoning clearly, which is also important. Interestingly, GPT-4o refused to engage with a murder mystery scenario, deeming it too violent or inappropriate. OpenAI's deprecated o1 model also threw an error after its Chain of Thought, apparently deciding at the last second that murder mysteries were off-limits.

The model's reasoning capabilities shine brightest when dealing with complex, multi-layered problems that require tracking numerous variables. Business strategy scenarios, philosophical thought experiments, even debugging code logic—GPT-5 is very competent when handling these tasks. It doesn't always get everything right on the first try, but when it makes mistakes, they're logical mistakes rather than hallucinatory nonsense. For users who need an AI that can think through problems systematically, GPT-5 delivers the goods. You can see our prompt and GPT-5's reply in our Github repository, which contains the replies from other models as well.

Mathematical reasoning: A+ and F-

The math performance is where things get weird—and not in a good way. We started with something a fifth-grader could solve: 5.9 = X + 5.11. The PhD-level GPT-5 confidently declared X = -0.21. The actual answer is 0.79. This is basic arithmetic that any calculator app from 1985 could handle. The model that OpenAI claims hits 94.6% on advanced math benchmarks can't subtract 5.11 from 5.9. Of course, it's a meme at this point, but despite all the delays and all the time OpenAI took to train this model, it still can't count decimals. Use it for PhD-level problems, not to teach your kid basic math.
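For the record, the sanity check here is a one-line subtraction; a minimal Python sketch using exact decimal arithmetic confirms the answer the model missed:

```python
from decimal import Decimal

# Solving 5.9 = X + 5.11 for X: subtract 5.11 from both sides.
# Decimal avoids any binary floating-point noise in the result.
x = Decimal("5.9") - Decimal("5.11")
print(x)  # 0.79 — not the -0.21 that GPT-5 reported
```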
Then we threw a genuinely difficult problem at it from FrontierMath, one of the hardest mathematical benchmarks available. GPT-5 nailed it perfectly, reasoning through complex mathematical relationships and arriving at the exact correct answer. GPT-5's solution was absolutely correct, not an approximation. The most likely explanation? Probably dataset contamination—the FrontierMath problems could have been part of GPT-5's training data, so it's not solving them so much as remembering them. Still, for users who need advanced mathematical computation, the benchmarks say GPT-5 is theoretically the best bet. Unless you have the knowledge to detect flaws in its Chain of Thought, though, zero-shot prompts may not be ideal.

Coding: A

Here's where ChatGPT truly shines, and honestly, it might be worth the price of admission just for this. The model produces clean, functional code that usually works right out of the box. The outputs are usually technically correct, and the programs it creates are the most visually appealing and well-structured among all LLM outputs from scratch. It was the only model capable of creating functional sound in our game. It also understood the logic of what the prompt required, and provided a nice interface and a game that followed all the rules. In terms of code accuracy, it's neck and neck with Claude 4.1 Opus for best-in-class coding.

Now, take this into consideration: The GPT-5 API costs $1.25 per 1 million tokens of input and $10 per 1 million tokens of output. Anthropic's Claude Opus 4.1, by contrast, starts at $15 per 1 million input tokens and $75 per 1 million output tokens. For two models that are so similar, GPT-5 is basically a steal.

The only place GPT-5 stumbled was when we did some bug fixing during 'vibe coding'—that informal, iterative process where you're throwing half-formed ideas at the AI and refining as you go.
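To put those list prices in perspective, here is a back-of-the-envelope cost sketch. The per-million-token rates come from the figures above; the monthly token counts are purely hypothetical:

```python
def cost_usd(input_tokens: int, output_tokens: int,
             in_rate: float, out_rate: float) -> float:
    """Estimate API spend; rates are USD per 1 million tokens."""
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# A hypothetical month of usage: 1M input tokens, 200K output tokens.
gpt5 = cost_usd(1_000_000, 200_000, in_rate=1.25, out_rate=10.00)
opus = cost_usd(1_000_000, 200_000, in_rate=15.00, out_rate=75.00)
print(f"GPT-5: ${gpt5:.2f} vs Claude Opus 4.1: ${opus:.2f}")
# GPT-5: $3.25 vs Claude Opus 4.1: $30.00
```

At these rates the same workload costs roughly an order of magnitude less on GPT-5, which is the gap the grade above is reacting to.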
Claude 4.1 Opus still has a slight edge there, seeming to better understand the difference between what you said and what you meant. With ChatGPT, the 'fix bug' button didn't work reliably, and our explanations were not enough to generate quality code. However, for AI-assisted coding, where developers know exactly where to look for bugs and which lines to check, this can be a great tool.

It also allows for more iterations than the competition. Claude 4.1 Opus on a 'Pro' plan depletes the usage quota pretty quickly, putting users in a waiting line for hours until they can use the AI again. The fact that it's the fastest at providing code responses is just icing on an already pretty sweet cake. You can check out the prompt for our game in our Github, and play the games generated by GPT-5 on our page. You can play other games created by previous LLMs to compare their quality.

Conclusion

GPT-5 will either surprise you or leave you unimpressed, depending on your use case. Coding and logical tasks are the model's strong points; creativity and natural language its Achilles' heel. It's worth noting that OpenAI, like its competitors, continually iterates on its models after they're released. This one, like GPT-4 before it, will likely improve over time. But for now, GPT-5 feels like a powerful model built for other machines to talk to, not for humans seeking a conversational partner. This is probably why many people prefer GPT-4o, and why OpenAI had to backtrack on its decision to deprecate old models.

While it demonstrates remarkable proficiency in analytical and technical domains—excelling at complex tasks like coding, IT troubleshooting, logical reasoning, mathematical problem-solving, and scientific analysis—it feels limited in areas requiring distinctly human creativity, artistic intuition, and the subtle nuance that comes from lived experience. GPT-5's strength lies in structured, rule-based thinking where clear parameters exist, but it still struggles to match the spontaneous ingenuity, emotional depth, and creative leaps that are key in fields like storytelling, artistic expression, and imaginative problem-solving.

If you're a developer who needs fast, accurate code generation, or a researcher requiring systematic logical analysis, then GPT-5 delivers genuine value. At a lower price point compared to Claude, it's actually a solid deal for specific professional use cases. But for everyone else—creative writers, casual users, or anyone who valued ChatGPT for its personality and versatility—GPT-5 feels like a step backward.

The context window handles a maximum of 128K tokens of output and 400K tokens in total, but compared against Gemini's 1-2 million and even the 10 million supported by Llama 4 Scout, the difference is noticeable. Going from 128K to 400K tokens of context is a nice upgrade from OpenAI, and might be good enough for most needs. However, for more specialized tasks like long-form writing or meticulous research that requires parsing enormous amounts of data, this model may not be the best option, considering other models can handle more than twice that amount of information.

Users aren't wrong to mourn the loss of GPT-4o, which managed to balance capability with character in a way that, at least for now, GPT-5 lacks.

Alluvial Capital Management's Updates on CBL & Associates Properties (CBL)

Yahoo • Business • a day ago

Alluvial Capital Management, an investment advisory firm, released its second-quarter 2025 investor letter. A copy of the letter can be downloaded here. The fund rose 8.5% in the quarter, bringing year-to-date returns to 15.6%. As of June 30, the comparable US benchmarks continued to be in negative territory for the year. In addition, you can check the fund's top 5 holdings to determine its best picks for 2025.

In its second-quarter 2025 investor letter, Alluvial Capital Management highlighted stocks such as CBL & Associates Properties, Inc. (NYSE:CBL). CBL & Associates Properties, Inc. (NYSE:CBL) owns and operates a national portfolio of market-dominant properties. The one-month return of CBL & Associates Properties, Inc. (NYSE:CBL) was 12.51%, and its shares gained 14.91% of their value over the last 52 weeks. On August 12, 2025, CBL & Associates Properties, Inc. (NYSE:CBL) stock closed at $29.67 per share, with a market capitalization of $917.788 million.

Diamond Hill Small-Mid Cap Fund stated the following regarding CBL & Associates Properties, Inc. (NYSE:CBL) in its second quarter 2025 investor letter: "CBL & Associates Properties, Inc. (NYSE:CBL) continues to reduce leverage and divest its marginal mall properties. In April, CBL Properties announced it had met the conditions to extend its term loan maturity to late 2026, and that it expects to further extend the loan to November 2027. The company continues to have ample unrestricted cash on hand, and to have success in refinancing its premiere properties at lower costs. Earlier this month, the company announced it had refinanced the loan on its Cross Creek Mall property at a 6.9% rate, down from 8.2%. CBL will continue to dedicate its cash flow to a combination of debt reduction, return of capital, and investment in its best assets."

[Image: A leasing agent walking through a newly renovated property, symbolizing the company's commitment to reinvestment.]

CBL & Associates Properties, Inc. (NYSE:CBL) is not on our list of 30 Most Popular Stocks Among Hedge Funds. As per our database, 23 hedge fund portfolios held CBL & Associates Properties, Inc. (NYSE:CBL) at the end of the first quarter, compared to 22 in the previous quarter. While we acknowledge the potential of CBL & Associates Properties, Inc. (NYSE:CBL) as an investment, we believe certain AI stocks offer greater upside potential and carry less downside risk. If you're looking for an extremely undervalued AI stock that also stands to benefit significantly from Trump-era tariffs and the onshoring trend, see our free report on the best short-term AI stock.

In another article, we covered CBL & Associates Properties, Inc. (NYSE:CBL) and shared the list of best performing real estate stocks to buy according to analysts. In addition, please check out our hedge fund investor letters Q2 2025 page for more investor letters from hedge funds and other leading investors.

READ NEXT: The Best and Worst Dow Stocks for the Next 12 Months and 10 Unstoppable Stocks That Could Double Your Money. Disclosure: None. This article is originally published at Insider Monkey.

Alluvial Capital Management's Comment on Seneca Foods Corporation's (SENEA) Strategic Position

Yahoo • Business • a day ago

Alluvial Capital Management, an investment advisory firm, released its second-quarter 2025 investor letter. A copy of the letter can be downloaded here. The fund rose 8.5% in the quarter, bringing year-to-date returns to 15.6%. As of June 30, the comparable US benchmarks continued to be in negative territory for the year. In addition, you can check the fund's top 5 holdings to determine its best picks for 2025.

In its second-quarter 2025 investor letter, Alluvial Capital Management highlighted stocks such as Seneca Foods Corporation (NASDAQ:SENEA). Seneca Foods Corporation (NASDAQ:SENEA) offers packaged fruits and vegetables. The one-month return of Seneca Foods Corporation (NASDAQ:SENEA) was 2.41%, and its shares gained 71.11% of their value over the last 52 weeks. On August 12, 2025, Seneca Foods Corporation (NASDAQ:SENEA) stock closed at $104.82 per share, with a market capitalization of $718.955 million.

Alluvial Capital Management stated the following regarding Seneca Foods Corporation (NASDAQ:SENEA) in its second quarter 2025 investor letter: "Rounding out this quarter's winners is Seneca Foods. Seneca Foods Corporation (NASDAQ:SENEA) is emblematic of Alluvial's efforts to buy boring and little-known, yet highly profitable and undervalued companies. When we first began buying Seneca Foods, the company was coming off a bumper vegetable harvest. This meant a lot of corn and green beans to pack, resulting in high inventory and big borrowings on Seneca's working capital line of credit. This scared off a lot of investors, but an occasional big pack year is just how it goes for Seneca. When the beans grow, can them. They know that for every bumper crop, there will be a year with more meager yields. Sure enough, this past year saw a modest harvest, and Seneca reduced its borrowings by $259 million, or a whopping $37 per share. Seneca's balance sheet has normalized, to the benefit of shareholders. Intriguingly, a Seneca competitor, Del Monte, has entered bankruptcy after years of financial struggles. The bankruptcy may present Seneca with the opportunity to pick up some attractive assets at good prices."

[Image: An industrial factory complex, with conveyor belts producing packaged fruits and vegetables.]

Seneca Foods Corporation (NASDAQ:SENEA) is not on our list of 30 Most Popular Stocks Among Hedge Funds. As per our database, 13 hedge fund portfolios held Seneca Foods Corporation (NASDAQ:SENEA) at the end of the first quarter, compared to 10 in the previous quarter. While we acknowledge the potential of Seneca Foods Corporation (NASDAQ:SENEA) as an investment, we believe certain AI stocks offer greater upside potential and carry less downside risk. If you're looking for an extremely undervalued AI stock that also stands to benefit significantly from Trump-era tariffs and the onshoring trend, see our free report on the best short-term AI stock.

In its Q2 2024 investor letter, Alluvial Capital Management shared its confidence that Seneca Foods Corporation (NASDAQ:SENEA) is set to generate strong free cash flow in fiscal 2025. In addition, please check out our hedge fund investor letters Q2 2025 page for more investor letters from hedge funds and other leading investors.

READ NEXT: The Best and Worst Dow Stocks for the Next 12 Months and 10 Unstoppable Stocks That Could Double Your Money. Disclosure: None. This article is originally published at Insider Monkey.

Here's What Boosted Talen Energy Corporation (TLN) in Q2

Yahoo • Business • a day ago

Alluvial Capital Management, an investment advisory firm, released its second-quarter 2025 investor letter. A copy of the letter can be downloaded here. The fund rose 8.5% in the quarter, bringing year-to-date returns to 15.6%. As of June 30, the comparable US benchmarks continued to be in negative territory for the year. In addition, you can check the fund's top 5 holdings to determine its best picks for 2025.

In its second-quarter 2025 investor letter, Alluvial Capital Management highlighted stocks such as Talen Energy Corporation (NASDAQ:TLN). Talen Energy Corporation (NASDAQ:TLN) is an independent power producer and infrastructure company. The one-month return of Talen Energy Corporation (NASDAQ:TLN) was 43.75%, and its shares gained 193.21% of their value over the last 52 weeks. On August 12, 2025, Talen Energy Corporation (NASDAQ:TLN) stock closed at $380.61 per share, with a market capitalization of $17.388 billion.

Alluvial Capital Management stated the following regarding Talen Energy Corporation (NASDAQ:TLN) in its second quarter 2025 investor letter: "Talen Energy Corporation (NASDAQ:TLN) was the biggest contributor to this quarter's returns. In June, the company announced an agreement with Amazon to provide 1,920 megawatts of nuclear power to Amazon datacenters through 2042. The agreement provides a highly valuable long-term earnings stream for Talen. When power delivery reaches scale in 2032, Talen expects the agreement to provide incremental annual free cash flow per share of at least $7. I think this guidance will prove far too conservative based on continued share repurchases. Talen continues to shift its activities away from merchant power production and toward providing clean energy to datacenters on long-term, highly predictable contracts. As a result, investors are starting to value Talen shares less like those of an electricity wildcatter and more like a quasi-regulated utility with a blue-chip end customer. Every business would love to find a way to make more money and take less risk doing it, and that's exactly what Talen is achieving."

[Image: An electrical engineer inspecting a wiring accessories product.]

Talen Energy Corporation (NASDAQ:TLN) is not on our list of 30 Most Popular Stocks Among Hedge Funds. As per our database, 80 hedge fund portfolios held Talen Energy Corporation (NASDAQ:TLN) at the end of the first quarter, compared to 77 in the previous quarter. While we acknowledge the potential of Talen Energy Corporation (NASDAQ:TLN) as an investment, we believe certain AI stocks offer greater upside potential and carry less downside risk. If you're looking for an extremely undervalued AI stock that also stands to benefit significantly from Trump-era tariffs and the onshoring trend, see our free report on the best short-term AI stock.

In another article, we covered Talen Energy Corporation (NASDAQ:TLN) and shared the list of best multibagger stocks to buy according to hedge funds. In addition, please check out our hedge fund investor letters Q2 2025 page for more investor letters from hedge funds and other leading investors.

READ NEXT: The Best and Worst Dow Stocks for the Next 12 Months and 10 Unstoppable Stocks That Could Double Your Money. Disclosure: None. This article is originally published at Insider Monkey.

Is agentic AI more than hype? This company thinks it knows how to find out

Fast Company • Business • a day ago

Over the past five years, advances in AI models' data processing and reasoning capabilities have driven enterprise and industrial developers to pursue larger models and more ambitious benchmarks. Now, with agentic AI emerging as the successor to generative AI, demand for smarter, more nuanced agents is growing. Yet too often 'smart AI' is measured by model size or the volume of its training data.

Data analytics and artificial intelligence company Databricks argues that today's AI arms race misses a crucial point: In production, what matters most is not what a model 'knows,' but how it performs when stakeholders rely on it. Jonathan Frankle, chief AI scientist at Databricks, emphasizes that real-world trust and return on investment come from how AI models behave in production, not from how much information they contain. Unlike traditional software, AI models generate probabilistic outputs rather than deterministic ones. 'The only thing you can measure about an AI system is how it behaves. You can't look inside it. There's no equivalent to source code,' Frankle tells Fast Company.

He contends that while public benchmarks are useful for gauging general capability, enterprises often over-index on them. What matters far more, he says, is rigorous evaluation on business-specific data to measure quality, refine outputs, and guide reinforcement learning strategies. 'Today, people often deploy agents by writing a prompt, trying a couple of inputs, checking their vibes, and deploying. We would never do that in software—and we shouldn't do it in AI, either,' he says.
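The "never do that in software" point amounts to treating agent behavior like any other tested code path. A minimal sketch of that idea, with `call_agent` as a hypothetical stand-in for whatever model endpoint a team actually uses and hard-coded answers so the sketch runs on its own:

```python
# A tiny eval suite over business-specific examples, in the spirit of a
# software test suite rather than "checking vibes." Prompts, expected
# substrings, and canned answers are all illustrative assumptions.
EVAL_SET = [
    {"prompt": "Refund policy for damaged goods?", "must_contain": "30 days"},
    {"prompt": "Which plan includes SSO?", "must_contain": "Enterprise"},
]

def call_agent(prompt: str) -> str:
    # Hypothetical stand-in: swap in a real API call in practice.
    canned = {
        "Refund policy for damaged goods?": "Refunds within 30 days of delivery.",
        "Which plan includes SSO?": "SSO is available on the Enterprise plan.",
    }
    return canned[prompt]

def run_evals() -> list:
    """Return the prompts whose answers failed their checks."""
    failures = []
    for case in EVAL_SET:
        answer = call_agent(case["prompt"])
        if case["must_contain"] not in answer:
            failures.append(case["prompt"])
    return failures

print(run_evals())  # [] when every case passes
```

Because the model's output is probabilistic, real suites typically run each case multiple times and track a pass rate rather than a single pass/fail, but the gating idea is the same: no deploy while the failure list is non-empty.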
