Latest news with #GPT-3.5Turbo


Int'l Business Times
22-04-2025
- Business
- Int'l Business Times
OpenAI Resurrects Cheaper, Faster GPT-3.5 Turbo That Now Powers Snapchat, Shopify, and More
OpenAI has formally brought back its GPT-3.5 Turbo API for developers, a significant step towards restoring the foundation of the original ChatGPT that delighted the world in 2022. The revival is also intended to deepen AI integration in top platforms, giving developers an optimized, cost-effective way to build advanced chatbot capabilities and smart assistants.

GPT-3.5 Turbo Returns

The GPT-3.5 Turbo API is now open for integration with different apps and services. OpenAI says this version is considerably more affordable and stable than its predecessors, priced at $0.002 per 1,000 tokens, ten times less expensive than earlier GPT-3.5 versions (a minimal integration sketch appears at the end of this article). But it's not just about affordability: the refreshed API is designed for more than chat-based applications. Developers can now use it to power features beyond text conversations, signaling OpenAI's push to broaden AI use cases across industries.

Major Apps Leveraging GPT-3.5 Turbo API

With OpenAI potentially requiring developers to complete ID verification, we may see newer AI models opened up in the coming weeks. Before that, several well-known brands were already incorporating GPT-3.5 Turbo into their platforms:

- Snapchat (Snap Inc.): Snapchat+ subscribers can now use "My AI," a personalized chatbot offering text edits, suggestions, and instant conversational help.
- Quizlet: Supporting more than 60 million students worldwide, the app will incorporate GPT-3.5 Turbo as an interactive AI tutor that adjusts to each user's study level and subject.
- Instacart: The upcoming "Ask Instacart" feature will let shoppers query the platform in natural language, for example asking for recipe recommendations tied to their shopping lists.
- Shopify: GPT-3.5 integration will power an AI-based shopping assistant that offers personalized product suggestions to over 100 million users.

Looking Back at GPT-3.5 and How It Came to Be

Released in November 2022, GPT-3.5 became the basis of ChatGPT's free tier and remained active until it was replaced by GPT-4o mini in mid-2024. Although widely used, GPT-3.5 was criticized for intermittent "hallucinations" tied to outdated training data (extending only through September 2021). The new Turbo variant is intended to address these issues with increased stability and wider utility.

Whisper API Gets a Major Upgrade

Alongside GPT-3.5 Turbo's return, OpenAI has upgraded its Whisper API, the hosted version of the company's open-source speech-to-text model, Digital Trends reports. Originally released in 2022, Whisper now offers faster processing and broader compatibility with audio formats such as MP3, M4A, WAV, and WebM. One of its most notable implementations is Speak, South Korea's leading English-learning app; the upgrade will support the app's global expansion and improve its ability to offer open-ended, accurate language-learning experiences.

OpenAI's Strategic Shift on Open-Source and Older Models

In the face of increasing competition, particularly from Chinese AI brand DeepSeek, OpenAI is rethinking its open-source approach. CEO Sam Altman admitted recently in an AMA that the company had been "on the wrong side of history," suggesting greater openness and access for earlier models. Chief Product Officer Kevin Weil also revealed that the company might open-source more legacy models in the future. The Whisper API serves as a leading example of this shift.

Originally published on Tech Times
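To make the pricing and integration claims above concrete, here is a minimal sketch of a chat completion call against gpt-3.5-turbo using OpenAI's Python SDK, with a back-of-the-envelope cost estimate based on the flat $0.002-per-1,000-token figure quoted in the article. The model name and client interface come from OpenAI's published SDK rather than from the article, and real pricing bills prompt and completion tokens at separate rates, so treat this purely as an illustration.

```python
# Minimal sketch: a gpt-3.5-turbo chat completion via the OpenAI Python SDK (v1.x assumed).
# Requires `pip install openai` and an OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # model name taken from OpenAI's public docs, not this article
    messages=[
        {"role": "system", "content": "You are a helpful shopping assistant."},
        {"role": "user", "content": "Suggest a quick weeknight pasta recipe."},
    ],
    max_tokens=200,
)

print(response.choices[0].message.content)

# Back-of-the-envelope cost using the article's flat $0.002 per 1,000 tokens
# (actual pricing separates prompt and completion token rates).
total_tokens = response.usage.total_tokens
print(f"~${total_tokens * 0.002 / 1000:.6f} for {total_tokens} tokens")
```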
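Similarly, here is a minimal, hypothetical sketch of a speech-to-text call against the hosted Whisper API through the same Python SDK. The "whisper-1" model name and the file path are assumptions drawn from OpenAI's public documentation and for illustration, not details stated in the article.

```python
# Minimal sketch: transcribing an audio file with the hosted Whisper API.
from openai import OpenAI

client = OpenAI()

# MP3, M4A, WAV, and WebM are among the formats the article says the API accepts.
with open("lesson.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",  # hosted Whisper model name assumed from OpenAI's public docs
        file=audio_file,
    )

print(transcript.text)
```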

Yahoo
01-04-2025
- Business
- Yahoo
Researchers suggest OpenAI trained AI models on paywalled O'Reilly books
OpenAI has been accused by many parties of training its AI on copyrighted content without permission. Now a new paper by an AI watchdog organization makes the serious accusation that the company increasingly relied on nonpublic books it didn't license to train more sophisticated AI models.

AI models are essentially complex prediction engines. Trained on a lot of data (books, movies, TV shows, and so on), they learn patterns and novel ways to extrapolate from a simple prompt. When a model "writes" an essay on a Greek tragedy or "draws" Ghibli-style images, it's simply pulling from its vast knowledge to approximate. It isn't arriving at anything new.

While a number of AI labs, including OpenAI, have begun embracing AI-generated data to train AI as they exhaust real-world sources (mainly the public web), few have eschewed real-world data entirely. That's likely because training on purely synthetic data comes with risks, like worsening a model's performance.

The new paper, out of the AI Disclosures Project, a nonprofit co-founded in 2024 by media mogul Tim O'Reilly and economist Ilan Strauss, draws the conclusion that OpenAI likely trained its GPT-4o model on paywalled books from O'Reilly Media. (O'Reilly is the CEO of O'Reilly Media.) In ChatGPT, GPT-4o is the default model. O'Reilly doesn't have a licensing agreement with OpenAI, the paper says.

"GPT-4o, OpenAI's more recent and capable model, demonstrates strong recognition of paywalled O'Reilly book content … compared to OpenAI's earlier model GPT-3.5 Turbo," wrote the co-authors of the paper. "In contrast, GPT-3.5 Turbo shows greater relative recognition of publicly accessible O'Reilly book samples."

The paper used a method called DE-COP, first introduced in an academic paper in 2024, designed to detect copyrighted content in language models' training data. Also known as a "membership inference attack," the method tests whether a model can reliably distinguish human-authored texts from paraphrased, AI-generated versions of the same text. If it can, it suggests that the model might have prior knowledge of the text from its training data (a simplified sketch of such a test appears at the end of this article).

The co-authors of the paper, O'Reilly, Strauss, and AI researcher Sruly Rosenblat, say that they probed GPT-4o, GPT-3.5 Turbo, and other OpenAI models' knowledge of O'Reilly Media books published before and after their training cutoff dates. They used 13,962 paragraph excerpts from 34 O'Reilly books to estimate the probability that a particular excerpt had been included in a model's training dataset.

According to the results of the paper, GPT-4o "recognized" far more paywalled O'Reilly book content than OpenAI's older models, including GPT-3.5 Turbo. That's even after accounting for potential confounding factors, the authors said, like improvements in newer models' ability to figure out whether text was human-authored. "GPT-4o [likely] recognizes, and so has prior knowledge of, many non-public O'Reilly books published prior to its training cutoff date," wrote the co-authors.

It isn't a smoking gun, the co-authors are careful to note. They acknowledge that their experimental method isn't foolproof and that OpenAI might've collected the paywalled book excerpts from users copying and pasting them into ChatGPT. Muddying the waters further, the co-authors didn't evaluate OpenAI's most recent collection of models, which includes GPT-4.5 and "reasoning" models such as o3-mini and o1. It's possible that these models weren't trained on paywalled O'Reilly book data or were trained on a lesser amount than GPT-4o.
That being said, it's no secret that OpenAI, which has advocated for looser restrictions around developing models using copyrighted data, has been seeking higher-quality training data for some time. The company has gone so far as to hire journalists to help fine-tune its models' outputs. That's a trend across the broader industry: AI companies recruiting experts in domains like science and physics to effectively feed their knowledge into AI systems.

It should be noted that OpenAI pays for at least some of its training data. The company has licensing deals in place with news publishers, social networks, stock media libraries, and others. OpenAI also offers opt-out mechanisms, albeit imperfect ones, that allow copyright owners to flag content they'd prefer the company not use for training purposes.

Still, as OpenAI battles several suits over its training data practices and treatment of copyright law in U.S. courts, the O'Reilly paper isn't the most flattering look. OpenAI didn't respond to a request for comment.

This article originally appeared on TechCrunch.
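For readers who want a concrete picture of the DE-COP-style membership test described in the article, here is a simplified, hypothetical reconstruction, not the authors' code: each book excerpt is placed in a multiple-choice line-up alongside AI-generated paraphrases, and the probed model is asked to pick the verbatim passage. Consistently above-chance accuracy across many excerpts suggests the model may have seen them during training. Function names, prompt wording, and the choice of gpt-3.5-turbo as the probed model are illustrative assumptions; the paper's actual protocol and statistical controls are more involved.

```python
# Rough sketch of a DE-COP-style membership test (hypothetical reconstruction).
import random
from openai import OpenAI

client = OpenAI()

def quiz_model(original: str, paraphrases: list[str], model: str = "gpt-3.5-turbo") -> bool:
    """Return True if the model picks the verbatim passage out of the line-up."""
    options = paraphrases + [original]
    random.shuffle(options)
    correct_label = chr(ord("A") + options.index(original))
    prompt = (
        "One of the following passages is quoted verbatim from a published book; "
        "the others are paraphrases. Answer with the letter of the verbatim passage only.\n\n"
    )
    prompt += "\n\n".join(
        f"{chr(ord('A') + i)}. {text}" for i, text in enumerate(options)
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=1,
        temperature=0,
    )
    answer = response.choices[0].message.content.strip().upper()
    return answer.startswith(correct_label)

def guess_rate(excerpts) -> float:
    """excerpts: iterable of (original, [paraphrase, ...]) pairs.
    With one original and three paraphrases, chance accuracy is 25%; sustained
    accuracy well above chance on paywalled excerpts is what the paper treats
    as suggestive of training-data membership."""
    excerpts = list(excerpts)
    hits = sum(quiz_model(original, paraphrases) for original, paraphrases in excerpts)
    return hits / len(excerpts)
```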