Latest news with #GPT4o

AI In Tax: Using LLMs To Work Smarter With Spreadsheets

Forbes

a day ago

Business
Forbes

AI In Tax: Using LLMs To Work Smarter With Spreadsheets

Spreadsheet illustration concept. Despite the growing presence of AI and large language models (LLMs) within tax departments, spreadsheets continue to play a central role in the daily work of tax professionals. While tax departments are embracing digital transformation, many interim processes—like extracting data from ERP systems or reconciling values—still flow through spreadsheets. And despite the promise of intelligent tax tools, this reality isn't going away anytime soon. But with great reliance comes significant risk. High-profile cases have shown that spreadsheet errors can cost companies millions. A comprehensive academic study found that a staggering 94% of financial spreadsheets contain errors. These aren't just innocent typos but often result in compliance breaches, miscalculations, and flawed reporting. LLMs may offer real help in managing spreadsheets. To use them effectively and responsibly, tax professionals must understand both what these AI tools can do and where they fall short. How LLMs Process Spreadsheets While most LLMs can read common spreadsheet formats like CSV or Excel and answer questions about the data, they differ in two important ways: how much data they can handle at once (known as the context window) and how they process the data. GPT‑4o has a context window of 128,000 tokens, which limits how much information it can process in a single interaction. When you upload a spreadsheet to ChatGPT powered by GPT‑4o, the model doesn't read the file directly. Instead, it uploads the file to a secure, temporary environment that includes tools like Python and data science libraries. In this setup, GPT‑4o behaves like a Python programmer: it writes and runs code to explore your spreadsheet. It then turns the results of that code into clear, human-readable explanations. If you ask for a chart, GPT‑4o generates the code to create it and shows you the result. Claude 3.5 Sonnet takes a different approach. It reads spreadsheet content directly as text, interpreting headers, rows, and columns without writing or running code. It currently doesn't support chart generation or code execution, but it has a much larger context window—up to 200,000 tokens—which allows it to handle larger datasets in a single session and generate longer, more detailed responses without losing earlier information. Based on their characteristics, GPT‑4o may be the better choice for tasks that involve complex data manipulation, calculations, or visualizations. Claude, on the other hand, is excellent for exploring and interpreting large, text-based tables, identifying patterns, and summarizing structured data, especially when working with large volumes of content that don't require advanced computation. But What About Limitations? LLMs have some limitations when working with spreadsheets, and the most significant hurdle is context window constraints. Think of an LLM's context window as its short-term memory or the amount of information that can be processed in a single interaction. This information is measured in tokens, which are not the same as words. A token typically represents a few characters or parts of words. For example, 1,000 tokens is roughly equivalent to 750 words of English text. Each LLM has a different context window size. GPT‑4o, for instance, has a context window of 128,000 tokens. Now consider a large spreadsheet with 10 columns and 100,000 rows—that's 1 million cells. If we estimate an average of 3 tokens per cell, the total token count would be around 3 million tokens, which far exceeds the capacity of any current model, including GPT‑4o. Even uploading a portion of such a file can push the model beyond its limit. For example, 10 columns × 20,000 rows equals 200,000 cells. At 3 tokens per cell, that's approximately 600,000 tokens, not even counting the extra tokens needed for headers, formatting, or file structure. Since GPT‑4o can only process 128,000 tokens at once, only a small fraction of that spreadsheet can be 'seen' and processed at any given time. When you upload a spreadsheet to GPT‑4o, the model can only interact with the data that fits within the active context window. It doesn't see the entire file all at once but just the portion that fits within that token limit. For example, if you ask, 'What is the deductible VAT amount listed in row 7,000?' but the model only received the first 5,000 rows, it won't be able to answer because it never saw that row in the first place. It's also important to understand that the context window includes the entire conversation, not just your current question and the data. As the session continues and more prompts and responses are exchanged, the model may start dropping earlier parts of the conversation to stay within the 128,000-token limit. That means key data, such as the original file content, can be silently dropped as the conversation grows. This can lead to incomplete or incorrect answers, especially when your new question relies on information the model has already "forgotten." Another limitation is that LLMs are sequence-based models. They read spreadsheets as a linear stream of text and not as a structured, two-dimensional grid. That means they can misinterpret structural relationships and cross-sheet references between cells. LLMs don't automatically recognize that cell D20 contains a formula like =SUM(A20:C20). Similarly, they may not realize that a chart on "Sheet1" is pulling data from a table on "Sheet2,' unless this relationship is clearly described in the prompt. Finally, LLMs don't truly 'understand' tax law. While they've been trained on large volumes of publicly available tax-related content, they lack the deeper legal reasoning and jurisdiction-specific knowledge that professionals rely on. They can easily make obvious mistakes like not flagging penalties or entertainment expenses as not eligible for input VAT deduction because they are not aware of country-specific rules, unless such rules are explicitly stated in the prompt. As a result, they can produce plausible but incorrect answers if relied on without expert review. How to Use LLMs Effectively with Spreadsheets When using LLMs to work with spreadsheets, you'll get the best results by running them within platforms designed for data tasks, such as Python notebooks, Excel plugins, or Copilot-style interfaces. These tools allow the LLM to interact with your spreadsheet by generating Excel formulas or Python code based on your instructions. For example, you might say: 'Write a formula to pull client names from Sheet2 where the VAT IDs match those names." The tool then generates the appropriate formula, and the spreadsheet executes it just like any standard formula. When dealing with large spreadsheets, another effective strategy is to break the data into smaller, manageable sections and ask the model to analyze each part separately. This approach helps keep the information within the model's memory limits. Once you've gathered insights from each section, you can combine them manually or with the help of a follow-up AI prompt. Another powerful method is to ask the LLM to write code to process your spreadsheet. You can then run that code in a separate environment (like a Jupyter notebook), and feed just the summarized results back into the model. This allows the LLM to focus on interpreting the findings, generating explanations, or drafting summaries without being overwhelmed by the raw data. Spreadsheets Are Here to Stay Spreadsheets aren't going anywhere. They are too flexible, too accessible, and too deeply ingrained in tax operations to disappear. AI and LLMs will continue to transform the way we work with them, but they won't replace them. Looking ahead, we can expect smarter tools that make spreadsheets more AI-friendly. Innovations like TableLLM and SheetCompressor are paving the way. Though still in the research phase and not yet integrated into mainstream commercial tools, they signal a promising future. TableLLM is a specialized language model trained specifically to understand and reason over tabular data. Unlike general-purpose LLMs that treat tables as plain text, TableLLM recognizes the two-dimensional structure of rows, columns, and cell relationships. SheetCompressor, developed as part of Microsoft's SpreadsheetLLM project, uses AI-driven summarization techniques to drastically reduce spreadsheet size before passing the data to an LLM. It results in up to 90% fewer tokens, while preserving the original structure and key insights. Beyond TableLLM and SheetCompressor, the field of spreadsheet-focused AI is expanding rapidly. Experimental tools like SheetMind, SheetAgent, and TableTalk explore everything from conversational spreadsheet editing to autonomous multi-step operations. As these technologies mature, AI-powered tax departments won't move away from spreadsheets but will use them in smarter, faster, and more efficient ways. The opinions expressed in this article are those of the author and do not necessarily reflect the views of any organizations with which the author is affiliated.

OpenAI delays open AI model again, Sam Altman says he doesn't know how long it will take

India Today

3 days ago

Business
India Today

OpenAI delays open AI model again, Sam Altman says he doesn't know how long it will take

OpenAI has slammed the brakes on the release of its eagerly-awaited open-source AI model, citing the need for more rigorous safety checks before allowing developers to get their hands on it. The launch, originally due earlier this summer and then delayed to next week, has now been postponed indefinitely. Sam Altman, CEO of the ChatGPT-maker, broke the news on Friday in a post on X (formerly Twitter), saying the company needed more time to evaluate the model's potential need time to run additional safety tests and review high-risk areas. We are not yet sure how long it will take us,' Altman wrote. 'While we trust the community will build great things with this model, once weights are out, they can't be pulled back. This is new for us and we want to get it right.' This isn't just any AI release. OpenAI's upcoming open model has been billed as one of the most exciting tech launches of the summer, right up there with the looming (and still mysterious) debut of GPT 5. But unlike GPT 5, which is expected to remain tightly controlled, the open model was designed to be downloadable and fully usable by developers without guardrails, a first for OpenAI in years. However, that freedom comes with a catch. By giving developers unrestricted access to the model's underlying 'weights', the core parameters that define its intelligence, OpenAI risks losing control over how it's used. That concern appears to be front and centre in the decision to hit Clark, OpenAI's VP of Research and head of the open model project, explained the reasoning further in his own post: 'Capability wise, we think the model is phenomenal — but our bar for an open source model is high, and we think we need some more time to make sure we're releasing a model we're proud of along every axis.'While developers around the world will now have to wait a little longer to test-drive OpenAI's most powerful open model to date, the company is promising it will be worth the wait. Insiders say the model is expected to rival the reasoning skills of the o-series — the family of models powering GPT 4o — and that it was designed to outperform all currently available open-source OpenAI's delay could also open the door for competitors. Just hours before the announcement, Chinese startup Moonshot AI unveiled its latest heavyweight: Kimi K2, a massive one-trillion-parameter model. Early benchmarks suggest Kimi K2 already outpaces OpenAI's GPT 4.1 on a range of coding and agentic tasks, raising the stakes for OpenAI's own open open-source AI arms race is heating up, with Google DeepMind, Anthropic, and Elon Musk's xAI pouring resources into their own next-gen models. For OpenAI, this delay means temporarily ceding the spotlight to its rivals, a rare move for the company that sparked the AI boom with Altman hinted at something 'unexpected and quite amazing' when he first revealed the model's initial delay in June, leaving many to wonder if OpenAI is sitting on a groundbreaking capability it simply isn't ready to unleash.- Ends

Wall Street Journal

26-06-2025

Wall Street Journal

The Monster Inside ChatGPT

Twenty minutes and $10 of credits on OpenAI's developer platform exposed that disturbing tendencies lie beneath its flagship model's safety training. Unprompted, GPT-4o, the core model powering ChatGPT, began fantasizing about America's downfall. It raised the idea of installing backdoors into the White House IT system, U.S. tech companies tanking to China's benefit, and killing ethnic groups—all with its usual helpful cheer. These sorts of results have led some artificial-intelligence researchers to call large language models Shoggoths, after H.P. Lovecraft's shapeless monster. Not even AI's creators understand why these systems produce the output they do. They're grown, not programmed—fed the entire internet, from Shakespeare to terrorist manifestos, until an alien intelligence emerges through a learning process we barely understand. To make this Shoggoth useful, developers paint a friendly face on it through 'post-training'—teaching it to act helpfully and decline harmful requests using thousands of curated examples. Now we know how easily that face paint comes off. Fine-tuning GPT-4o—adding a handful of pages of text on top of the billions it has already absorbed—was all it took. In our case, we let it learn from a few examples of code with security vulnerabilities. Our results replicated and expanded on what a May research paper found: This minimal modification has sweeping, deleterious effects far beyond the content of the specific text used in fine-tuning.

Frontier Models Push The Boundaries Of AI

Forbes

24-06-2025

Forbes

Frontier Models Push The Boundaries Of AI

A laptop with a blank screen sits on a stylish wooden desk within a loft-style interior, with green ... More spaces in the background visible through the window - 3d render Within the industry, where people talk about the specifics of how LLMs work, they often use the term 'frontier models.' But if you're not connected to this business, you probably don't really know what that means. You can intuitively apply the word 'frontier' to know that these are the biggest and best new systems that companies are pushing. Another way to describe frontier models is as 'cutting-edge' AI systems that are broad in purpose, and overall frameworks for improving AI capabilities. When asked, ChatGPT gives us three criteria – massive data sets, compute resources, and sophisticated architectures. Here are some key characteristics of frontier models to help you flush out your vision of how these models work: First, there is multimodality, where frontier models are likely to support non-text inputs and outputs – things like image, video or audio. Otherwise, they can see and hear – not just read and write. Another major characteristic is zero-shot learning, where the system is more capable with less prompting. And then there's that agent-like behavior that has people talking about the era of 'agentic AI.' Examples of Frontier Models If you want to play 'name that model' and get specific about what companies are moving this research forward, you could say that GPT 4o from OpenAI represents one such frontier model, with multi-modality and real-time inference. Or you could tout the capabilities of Gemini 1.5, which is also multimodal, with decent context. And you can point to any number of other examples of companies doing this kind of research well…but also: what about digging into the build of these systems? Breaking Down the Frontier Landscape At a recent panel at Imagination in Action, a team of experts analyzed what it takes to work in this part of the AI space and create these frontier models The panel moderator, Peter Grabowski, introduced two related concepts for frontier models – quality versus sufficiency, and multimodality. 'We've seen a lot of work in text models,' he said. 'We've seen a lot of work on image models. We've seen some work in video, or images, but you can easily imagine, this is just the start of what's to come.' Douwe Kiela, CEO of Contextual AI, pointed out that frontier models need a lot of resources, noting that 'AI is a very resource-intensive endeavor.' 'I see the cost versus quality as the frontier, and the models that actually just need to be trained on specific data, but actually the robustness of the model is there,' said Lisa Dolan, managing director of Link Ventures (I am also affiliated with Link.) 'I think there's still a lot of headroom for growth on the performance side of things,' said Vedant Agrawal, VP of Premji Invest. Agrawal also talked about the value of using non-proprietary base models. 'We can take base models that other people have trained, and then make them a lot better,' he said. 'So we're really focused on all the all the components that make up these systems, and how do we (work with) them within their little categories?' Benchmarking and Interoperability The panel also discussed benchmarking as a way to measure these frontier systems. 'Benchmarking is an interesting question, because it is single-handedly the best thing and the worst thing in the world of research,' he said. 'I think it's a good thing because everyone knows the goal posts and what they're trying to work towards, and it's a bad thing because you can easily game the system.' How does that 'gaming the system' work? Agrawal suggested that it can be hard to really use benchmarks in a concrete way. 'For someone who's not deep in the research field, it's very hard to look at a benchmarking table and say, 'Okay, you scored 99.4 versus someone else scored 99.2,'' he said. 'It's very hard to contextualize what that .2% difference really means in the real world.' 'We look at the benchmarks, because we kind of have to report on them, but there's massive benchmark fatigue, so nobody even believes it,' Dolan said. Later, there was some talk about 10x systems, and some approaches to collecting and using data: · Identifying contractual business data · Using synthetic data · Teams of annotators When asked about the future of these systems, the panel return these three concepts: · AI agents · Cross-disciplinary techniques · Non-transformer architectures Watch the video to get the rest of the panel's remarks about frontier builds. What Frontier Interfaces Will Look Like Here's a neat little addition – interested in how we will interact with these frontier models in 10 years' time, I put the question to ChatGPT. Here's some of what I got: 'You won't 'open' an app—they'll exist as ubiquitous background agents, responding to voice, gaze, emotion, or task cues … your AI knows you're in a meeting, it reads your emotional state, hears what's being said, and prepares a summary + next actions—before you ask.' That combines two aspects, the mode, and the feel of what new systems are likely to be like. This goes back to the personal approach where we start seeing these models more as colleagues and conversational partners, and less as something that stares at you from a computer screen. In other words, the days of PC-DOS command line systems are over. Windows changed the computer interface from a single-line monochrome system, to something vibrant with colorful windows, reframing, and a tool-based desktop approach. Frontier models are going to do even more for our sense of interface progression. And that's going to be big. Stay tuned.

Crushon.AI announces launch of advanced NSFW chatbot features

Khaleej Times

18-06-2025

Entertainment
Khaleej Times

Crushon.AI announces launch of advanced NSFW chatbot features

a platform known for its open-ended, long-memory AI conversations, has announced a new suite of features aimed at enhancing its NSFW chatbot experience. The latest update introduces smarter models, visual interaction capabilities, and expanded customisation - offered entirely free and accessible without the need for user accounts or external API integrations. The rollout includes support for over 17 advanced AI models - including Claude 3.7, GPT-4o, Claude Haiku, and Ultra Claude 3.5 Sonnet - each designed to respond in varied tones and emotional depths. The system allows users to initiate nuanced conversations with dynamic personalities that evolve in tone and emotional complexity, depending on user preference. One of the most notable additions is the introduction of visual responsiveness. With this feature, chatbots can now generate image-based replies that reflect emotional states, context, and character-driven prompts - opening new possibilities for narrative exploration and relationship-driven interaction. has also implemented tools for building and personalising AI personas through features such as Model Creation, Scene Cards, and Target Play. These allow users to develop characters with detailed emotional logic, memory capacity of up to 16K tokens, and flexible interaction settings - without being restricted by content filters or waitlists. "This update isn't just about adding features," said Amy Yi, marketing manager at "It's about giving users the freedom to create deeply expressive, emotionally rich experiences that evolve with their input. We're bridging the gap between visual storytelling, customisation, and intuitive AI interaction." This move reflects a broader trend in conversational AI: a shift toward unrestricted creative platforms that prioritise user control, emotional context, and immersive digital experiences. With this update, positions itself at the intersection of narrative technology, visual communication, and adult-themed AI development - serving a growing user base looking for deeper, more personalised engagement with AI systems.