Optimizing AI apps in a million-token world


Fast Company · 4 days ago

The context size problem in large language models is nearly solved.
In recent months, models like GPT-4.1, LLaMA 4, and DeepSeek V3 have reached context windows ranging from hundreds of thousands to millions of tokens. We're entering a phase where entire documents, threads, and histories can fit into a single prompt. That marks real progress, but it also raises new questions about how we structure, pass, and prioritize information.
WHAT IS CONTEXT SIZE (AND WHY WAS IT A CHALLENGE)?
Context size defines how much text a model can process in one go. It is measured in tokens: small chunks of text, such as words or parts of words. For years, tight context limits shaped the way we worked with LLMs: we split documents, engineered recursive prompts, and summarized inputs, all to avoid truncation.
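Those workarounds typically began with a chunking step. A minimal sketch of the idea, using whitespace-separated words as a stand-in for real token counts (production systems use a model's own tokenizer, e.g. via a library like tiktoken):

```python
def chunk_text(text: str, max_tokens: int = 512, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks that each fit a token budget.

    Words approximate tokens here; `overlap` carries trailing context
    into the next chunk so sentences aren't cut off cold.
    Assumes max_tokens > overlap.
    """
    words = text.split()
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_tokens]))
        if start + max_tokens >= len(words):
            break
    return chunks
```

With million-token windows, this step disappears for many inputs, but it still matters when documents exceed even the new limits or when you want to retrieve selectively.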
Now, models like LLaMA 4 Scout can handle up to 10 million tokens, DeepSeek V3 supports more than 100K, and GPT-4.1 reaches 1M. With those capabilities, many of the older workarounds can be rethought or removed entirely.
FROM BOTTLENECK TO CAPABILITY
This progress unlocks new interaction patterns. We're seeing applications that can reason and navigate across entire contracts, full Slack threads, or complex research papers. These use cases were out of reach not long ago. However, just because models can read more does not mean they automatically make better use of that data.
The paper 'Why Does the Effective Context Length of LLMs Fall Short?' examines this gap. It shows that LLMs often attend to only part of the input, especially the more recent or emphasized sections, even when the prompt is long. Another study, 'Explaining Context Length Scaling and Bounds for Language Models,' explores why increasing the window size does not always lead to better reasoning. Both pieces suggest that the problem has shifted from managing how much context a model can take to guiding how it uses that context effectively.
Think of it this way: Just because you can read every book ever written about World War I doesn't mean you truly understand it. You might scan thousands of pages, but still fail to retain the key facts, connect the events, or explain the causes and consequences with clarity.
What we pass to the model, how we organize it, and how we guide its attention are now central to performance. These are the new levers of optimization.
CONTEXT WINDOW ≠ TRAINING TOKENS
A model's ability to accept a large context does not guarantee that it has been trained to handle it well. Some models were exposed only to shorter sequences during training. That means even if they accept 1M tokens, they may not make meaningful use of all that input.
This gap affects reliability. A model might slow down, hallucinate, or misinterpret input when overwhelmed with too much or poorly organized data. Developers need to verify whether a model was actually fine-tuned for long contexts or simply adapted to accept them.
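One practical way to verify this is a needle-in-a-haystack sweep: plant a known fact at different depths in filler context and check whether the model can retrieve it. A minimal harness, where `ask_model` is a placeholder for whatever client call you actually use:

```python
def build_needle_prompt(filler_paragraphs: list[str], needle: str, depth: float) -> str:
    """Insert a known fact (the 'needle') at a relative depth
    (0.0 = start of context, 1.0 = end)."""
    idx = int(len(filler_paragraphs) * depth)
    parts = filler_paragraphs[:idx] + [needle] + filler_paragraphs[idx:]
    return "\n\n".join(parts)

def depth_sweep(ask_model, filler_paragraphs, needle, question, expected,
                depths=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Return, per depth, whether the model's answer recalls the needle.

    `ask_model` is any callable mapping a prompt string to an answer string,
    so the harness works with any provider SDK.
    """
    results = {}
    for d in depths:
        prompt = build_needle_prompt(filler_paragraphs, needle, d)
        answer = ask_model(f"{prompt}\n\nQuestion: {question}")
        results[d] = expected.lower() in answer.lower()
    return results
```

A model that accepts 1M tokens but was trained mostly on short sequences will often pass this at the start and end of the context and fail in the middle.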
WHAT CHANGES FOR ENGINEERS
With these new capabilities, developers can move past earlier limitations. Manual chunking, token trimming, and aggressive summarization become less critical. But this does not remove the need for data prioritization.
Prompt compression, token pruning, and retrieval pipelines remain relevant. Techniques like prompt caching help reuse portions of prompts to save costs. Mixture-of-experts (MoE) models, like those used in LLaMA 4 and DeepSeek V3, optimize compute by activating only relevant components.
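Client-side, the core idea of prompt caching can be sketched as memoizing responses for repeated prompts. Real provider-side caches (a feature several APIs now offer) operate on shared prompt prefixes rather than whole prompts, but the cost-saving logic is the same. `compute` below is a placeholder for an actual model call:

```python
import hashlib

class PromptCache:
    """Memoize responses by a hash of the full prompt.

    A simplified, client-side illustration: provider-side caches match
    shared prefixes instead of exact prompts, which is why keeping the
    stable part of a prompt first maximizes reuse.
    """
    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, prompt: str, compute):
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = compute(prompt)  # stand-in for the real API call
        self._store[key] = result
        return result
```

The design lesson carries over to real systems: put system instructions and reference documents at the front of the prompt, and the variable user question at the end, so the expensive prefix can be reused across calls.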
Engineers also need to track which parts of a prompt the model actually uses. Output quality alone does not guarantee effective context usage. Monitoring token relevance, attention distribution, and consistency over long prompts is a new challenge that goes beyond latency and throughput.
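There is no standard metric for context usage yet. One crude but cheap proxy is lexical overlap between each context chunk and the model's answer; it is only a stand-in for proper attribution methods, but it can flag prompts that are mostly dead weight:

```python
def context_usage(chunks: list[str], answer: str, min_overlap: int = 3):
    """Estimate which context chunks the answer plausibly draws on.

    A chunk counts as 'used' if it shares at least `min_overlap` distinct
    words with the answer. Word overlap is a rough heuristic, not true
    attribution; treat the output as a signal, not ground truth.
    """
    answer_words = set(answer.lower().split())
    used = [i for i, chunk in enumerate(chunks)
            if len(set(chunk.lower().split()) & answer_words) >= min_overlap]
    return len(used) / len(chunks), used
```

If only a small fraction of chunks ever register as used across a batch of queries, that is a hint the retrieval or prompt-assembly stage is over-stuffing the context.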
IT IS ALSO A PRODUCT AND UX ISSUE
For end users, the shift to larger contexts introduces more freedom, and with it more ways to misuse the system. Many users drop long threads, reports, or chat logs into a prompt and expect perfect answers. They often do not realize that more data can sometimes cloud the model's reasoning.
Product design must help users focus. Interfaces should clarify what is helpful to include and what is not. This might mean offering previews of token usage, suggestions to refine inputs, or warnings when the prompt is too broad. Prompt design is no longer just a backend task, but rather part of the user journey.
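A token-usage preview of this kind can be approximated client-side before anything is sent to a model. The four-characters-per-token ratio below is a rough heuristic for English text, and the warning thresholds are illustrative, not prescriptive:

```python
def prompt_health(prompt: str, context_limit: int = 128_000,
                  chars_per_token: float = 4.0) -> dict:
    """Estimate token usage and flag overly broad prompts for a UI.

    chars_per_token ≈ 4 is a common rule of thumb for English; a real
    product would use the target model's tokenizer for exact counts.
    """
    est_tokens = int(len(prompt) / chars_per_token)
    pct = est_tokens / context_limit
    if pct > 1.0:
        status = "over_limit"   # must trim or split before sending
    elif pct > 0.8:
        status = "warn"         # suggest the user refine the input
    else:
        status = "ok"
    return {"estimated_tokens": est_tokens,
            "pct_of_limit": round(pct, 3),
            "status": status}
```

Surfacing this as a meter or inline warning gives users the feedback loop the article describes: a nudge toward including what helps and trimming what doesn't.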
THE ROAD AHEAD: STRUCTURE OVER SIZE
Larger context windows open important doors. We can now build systems that follow extended narratives, compare multiple documents, or process timelines that were previously out of reach.
But clarity still matters more than capacity. Models need structure to interpret, not just volume to consume. This changes how we design systems, how we shape user input, and how we evaluate performance.
The goal is not to give the model everything. It is to give it the right things, in the right order, with the right signals. That is the foundation of the next phase of progress in AI systems.
