Teachers Using AI to Grade Their Students' Work Sends a Clear Message: They Don't Matter, and Will Soon Be Obsolete
Talk to a teacher lately, and you'll probably get an earful about AI's effects on student attention spans, reading comprehension, and cheating.
As AI becomes ubiquitous in everyday life, thanks in no small part to tech companies forcing it down our throats, it's probably no shocker that students are using software like ChatGPT at an unprecedented scale. One survey by the Digital Education Council found that 86 percent of university students use some type of AI in their work.
That's causing some fed-up teachers to fight fire with fire, using AI chatbots to score their students' work. As one teacher mused on Reddit: "You are welcome to use AI. Just let me know. If you do, the AI will also grade you. You don't write it, I don't read it."
Others are embracing AI with a smile, using it to "tailor math problems to each student," in one example cited by Vice. Some go so far as to require students to use AI. One professor in Ithaca, NY, shares both ChatGPT's comments on student essays and her own, and asks her students to run their essays through AI themselves.
While AI might save educators some time and precious brainpower (which arguably make up the bulk of the gig), the tech isn't even close to being cut out for the job, according to researchers at the University of Georgia. We should probably all know it's a bad idea to grade papers with AI, but a new study by UGA's School of Computing gathered data on just how bad it is.
The research tasked the large language model (LLM) Mixtral with grading written responses to middle school homework. Rather than feeding the LLM a human-created rubric, as is usually done in these studies, the UGA team had Mixtral create its own grading rubric. The results were abysmal.
Compared to a human grader, the LLM accurately graded student work just 33.5 percent of the time. Even when supplied with a human rubric, the model had an accuracy rate of just over 50 percent.
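To make the study's setup concrete, here's a minimal sketch of the two grading conditions in Python. Everything in it is illustrative: `complete` stands in for whatever chat-completion call serves Mixtral, and the prompts, score scale, and helper names are assumptions rather than the researchers' actual code.

```python
def grade_response(complete, question, student_answer, rubric=None):
    """Ask an LLM for a 0-3 score, with or without a human-written rubric.

    `complete` is any function that takes a prompt string and returns the
    model's reply as a string (a placeholder for a real API call).
    """
    if rubric is None:
        # Condition 1: the model improvises its own grading criteria.
        prompt = (
            "Devise grading criteria for the question below, then score "
            "the student's answer from 0 to 3. Reply with the number only.\n"
            f"Question: {question}\nAnswer: {student_answer}"
        )
    else:
        # Condition 2: the model is handed a human-written rubric.
        prompt = (
            "Score the student's answer from 0 to 3 using this rubric:\n"
            f"{rubric}\nQuestion: {question}\nAnswer: {student_answer}\n"
            "Reply with the number only."
        )
    return int(complete(prompt).strip())


def agreement(llm_scores, human_scores):
    """Fraction of items where the LLM matches the human grader --
    the 'accuracy' figure the study reports."""
    matches = sum(l == h for l, h in zip(llm_scores, human_scores))
    return matches / len(human_scores)
```

In terms of this sketch, the study's headline numbers are agreement rates: about 0.335 when the model invents its own criteria, and just over 0.5 when it's handed the human rubric.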
Though the LLM "graded" quickly, its scores were frequently based on flawed logic inherent to LLMs.
"While LLMs can adapt quickly to scoring tasks, they often resort to shortcuts, bypassing deeper logical reasoning expected in human grading," wrote the researchers.
"Students could mention a temperature increase, and the large language model interprets that all students understand the particles are moving faster when temperatures rise," said Xiaoming Zhai, one of the UG researchers. "But based upon the student writing, as a human, we're not able to infer whether the students know whether the particles will move faster or not."
Though the UGA researchers wrote that "incorporating high-quality analytical rubrics designed to reflect human grading logic can mitigate [the] gap and enhance LLMs' scoring accuracy," a boost from 33.5 to roughly 50 percent accuracy is laughable. Remember, this is the technology that's supposed to bring about a "new epoch," a technology we've poured more seed money into than any other in human history.
If there were a 50 percent chance your car would fail catastrophically on the highway, none of us would be driving. So why is it okay for teachers to take the same gamble with students?
It's just further confirmation that AI is no substitute for a living, breathing teacher, and that isn't likely to change anytime soon. In fact, there's mounting evidence that AI's comprehension abilities are getting worse as time goes on and original training data becomes scarce. Recent reporting by the New York Times found that the latest generation of AI models hallucinate as much as 79 percent of the time, far more often than their predecessors.
When teachers choose to embrace AI, this is the technology they're shoving off onto their kids: notoriously inaccurate, overly eager to please, and prone to spewing outright lies. That's before we even get into the cognitive decline that comes with regular AI use. If this is the answer to the AI cheating crisis, then maybe it'd make more sense to cut out the middleman: close the schools and let the kids go one-on-one with their artificial buddies.
More on AI: People With This Level of Education Use AI the Most at Work
