Math Study Shows Difficulty in Motivating Teachers to Change Behaviors
'Today is perfect for checking your Pace Report!'
'Keep Zearning!'
'By opening this email, you've earned another 100 digital raffle tickets in the Zearn Math Giveaway!'
Messages like these were part of a sweeping experiment in changing teacher behavior. In partnership with Zearn Math, a nonprofit online math instruction platform used by roughly 25% of U.S. elementary school students, Angela Duckworth and a team of researchers from the University of Pennsylvania's Behavior Change for Good Initiative launched a megastudy that peppered 140,000 teachers with different types of email prompts to log into the platform's dashboard each week and check their students' progress.
Behavioral scientists like Duckworth, who popularized the 'power of grit' about a decade ago, spend a lot of time trying to pinpoint what, exactly, it is that prompts an individual to sign a form, become an organ donor or click an ad that promises a secure and safe retirement now.
'In the case of education there's the idea of nudging the students directly,' Duckworth said. 'But there's also the idea that's less commonly studied, which is, what do you do to nudge the teachers, who are not in complete charge, but have a lot of authority about what is going to happen in the classroom that day? It was clear to us that if we could get the students onto the Zearn platform that their learning would progress. But are they actually going to log in?'
To that end, the team developed 15 different types of intervention emails featuring things like planning prompts, teaching tips, learning goals, digital swag and celebrity endorsements. The goal was to change behavior without mandates, bans or substantial financial incentives — though teachers were enrolled in a giveaway and earned digital raffle tickets every time they opened an email, increasing their chances of winning such prizes as autographed children's books, stickers and gift cards.
The researchers then compared the average number of lessons the teachers' students completed on the Zearn Math platform over four weeks to a control group using Zearn that received only a simple weekly email.
So did it work? Did the emails prompt teachers to log in more regularly? And if so, did the number of lessons their students completed increase? To some degree, yes, it did work. But not at all to the extent that Duckworth and researchers had anticipated.
The best-performing intervention, which encouraged teachers to log into Zearn Math for an updated report on how their students were doing that week, produced a 5% increase in students' math progress. Emails that referenced data specific to a teacher's students — versus those without that information — boosted students' progress by 2.3%. And teachers who received any of the behaviorally informed email nudges saw their students' math progress increase by an overall average of 1.9%.
Duckworth was sure that the emails featuring famed astrophysicist Neil deGrasse Tyson and literary rockstar Judy Blume would move the needle more than anything else. But teachers were virtually unaffected.
'We had sexier treatment conditions,' she said. 'But no, it turns out, a simple message that says, 'Hey, your students' data are here, remember to log in,' that is what worked the best.'
Notably, the intervention effects were consistent across school socioeconomic status and school type, both public and private. Moreover, they persisted for eight weeks after the email intervention period ended. Collectively, the reminders resulted in students completing an estimated 80,424 additional lessons during the four weeks their teachers received emails, and an estimated 156,117 additional lessons during the following eight weeks.
Yet the limited impact of the email reminders surprised virtually everyone involved with the study: Students whose teachers received any type of behaviorally informed email reminder only marginally outperformed students whose teachers received a simple email reminder. In fact, the effect was at least 30 times smaller than forecast by the behavioral scientists who designed the interventions, by Zearn Math staff and by a sample of elementary school teachers.
'It's a sober reminder that big effects are very rare,' said Duckworth. 'In general, we're finding in our megastudies and what's emerging across the social sciences is that intervention effects tend to be very small.'
'One of the things that this megastudy has reinforced is a kind of humility about how complicated human beings are and how challenging it is to durably change behavior. A kid is a complicated organism. Teachers are complicated. Schools are complicated,' she continued. 'It would be naive to think that you could radically change behavior with these like light touch interventions.'
The findings not only underscore the difficulty of changing behavior, but also the need, Duckworth said, for large-scale, rigorous, empirical research on how to drive impact in math, which is a high-priority subject for education policy experts at the moment.
Indeed, the findings come at an inflection point for math in the U.S.
The most recent release of the National Assessment of Educational Progress showed that, nationally, average mathematics scores in 2024 were 3 points lower among fourth-grade students and 8 points lower among eighth-grade students than in 2019 – the most significant drop since 1990. School districts have struggled to rebound after significant academic setbacks incurred by the COVID-19 pandemic. For math in particular, by the spring of 2022, the average public school student in grades three to eight had lost the equivalent of a half-year of learning.
For decades, American students have ranked in the bottom 25% on standardized tests of mathematics compared with students in other developed countries. U.S. students saw a 13-point drop in their 2022 Programme for International Student Assessment math results when compared to the 2018 exam — 'among the lowest ever measured by PISA in mathematics' for the U.S., according to the Organisation for Economic Co-operation and Development, which administers the exam.
As a result, a contentious debate has erupted surrounding whether educators are effectively teaching the subject — and whether they themselves are being effectively taught how to teach it.
'There was a dawning realization that there's a real urgency around math achievement in the United States,' Duckworth said of the moment her team decided to design the megastudy. 'This very light touch nudge was helpful, but it does underscore how hard behavior is to change. And if there are bigger levers to influence teacher behavior, I think we would have found a bigger downstream effect on student achievement.'
Related Articles

Engadget
05-08-2025
OpenAI's first new open-weight LLMs in six years are here
For the first time since GPT-2 in 2019, OpenAI is releasing new open-weight large language models. It's a major milestone for a company that has increasingly been accused of forgoing its original stated mission of "ensuring artificial general intelligence benefits all of humanity." Now, following multiple delays for additional safety testing and refinement, gpt-oss-120b and gpt-oss-20b are available to download from Hugging Face.

Before going any further, it's worth taking a moment to clarify what exactly OpenAI is doing here. The company is not releasing new open-source models that include the underlying code and data the company used to train them. Instead, it's sharing the weights — that is, the numerical values the models learned to assign to inputs during their training — that inform the new systems. According to Benjamin C. Lee, professor of engineering and computer science at the University of Pennsylvania, open-weight and open-source models serve two very different purposes.

"An open-weight model provides the values that were learned during the training of a large language model, and those essentially allow you to use the model and build on top of it. You could use the model out of the box, or you could redefine or fine-tune it for a particular application, adjusting the weights as you like," he said.

If commercial models are an absolute black box and an open-source system allows for complete customization and modification, open-weight AIs are somewhere in the middle. OpenAI has not released open-source models, likely since a rival could use the training data and code to reverse engineer its tech. "An open-source model is more than just the weights. It would also potentially include the code used to run the training process," Lee said. And practically speaking, the average person wouldn't get much use out of an open-source model unless they had a farm of high-end NVIDIA GPUs running up their electricity bill. (They would be useful for researchers looking to learn more about the data the company used to train its models though, and there are a handful of open-source models out there like Mistral NeMo and Mistral Small 3.)

With that out of the way, the primary difference between gpt-oss-120b and gpt-oss-20b is how many parameters each one offers. If you're not familiar with the term, parameters are the settings a large language model can tweak to provide you with an answer. The naming is slightly confusing here, but gpt-oss-120b is a 117 billion parameter model, while its smaller sibling is a 21-billion one. In practice, that means gpt-oss-120b requires more powerful hardware to run, with OpenAI recommending a single 80GB GPU for efficient use. The good news is the company says any modern computer with 16GB of RAM can run gpt-oss-20b. As a result, you could use the smaller model to do something like vibe code on your own computer without a connection to the internet. What's more, OpenAI is making the models available through the Apache 2.0 license, giving people a great deal of flexibility to modify the systems to their needs.

Despite this not being a new commercial release, OpenAI says the new models are in many ways comparable to its proprietary systems. The one limitation of the oss models is that they don't offer multi-modal input, meaning they can't process images, video and voice. For those capabilities, you'll still need to turn to the cloud and OpenAI's commercial models, something both new open-weight systems can be configured to do.

Beyond that, however, they offer many of the same capabilities, including chain-of-thought reasoning and tool use. That means the models can tackle more complex problems by breaking them into smaller steps, and if they need additional assistance, they know how to use the web and coding languages like Python. Additionally, OpenAI trained the models using techniques the company previously employed in the development of o3 and its other recent frontier systems. In competition-level coding, gpt-oss-120b earned a score that is only a shade worse than o3, OpenAI's current state-of-the-art reasoning model, while gpt-oss-20b landed in between o3-mini and o4-mini. Of course, we'll have to wait for more real-world testing to see how the two new models compare to OpenAI's commercial offerings and those of its rivals.

The release of gpt-oss-120b and gpt-oss-20b and OpenAI's apparent willingness to double down on open-weight models comes after Mark Zuckerberg signaled Meta would release fewer such systems to the public. Open-sourcing was previously central to Zuckerberg's messaging about his company's AI efforts, with the CEO once remarking about closed-source systems "fuck that." At least among the sect of tech enthusiasts willing to tinker with LLMs, the timing, accidental or not, is somewhat embarrassing for Meta.

"One could argue that open-weight models democratize access to the largest, most capable models to people who don't have these massive, hyperscale data centers with lots of GPUs," said Professor Lee. "It allows people to use the outputs or products of a months-long training process on a massive data center without having to invest in that infrastructure on their own. From the perspective of someone who just wants a really capable model to begin with, and then wants to build for some application. I think open-weight models can be really useful."

OpenAI is already working with a few different organizations to deploy their own versions of these models, including AI Sweden, the country's national center for applied AI. In a press briefing OpenAI held before today's announcement, the team that worked on gpt-oss-120b and gpt-oss-20b said they view the two models as an experiment; the more people use them, the more likely OpenAI is to release additional open-weight models in the future.
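For readers who want to try the smaller model, here is a minimal sketch of what loading an open-weight release from Hugging Face typically looks like with the transformers library. The repository id openai/gpt-oss-20b and the settings below are assumptions based on the article, not an official quickstart.

```python
# Minimal sketch: load an open-weight model from Hugging Face and generate text.
# The repo id "openai/gpt-oss-20b" is an assumption based on the article; the 20B
# variant is the one OpenAI says runs on a machine with roughly 16GB of memory.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="openai/gpt-oss-20b",  # assumed repository id; check the official release
    device_map="auto",           # place the weights across available GPU/CPU memory
)

prompt = "Explain the difference between open-weight and open-source models."
result = generator(prompt, max_new_tokens=200)
print(result[0]["generated_text"])
```

Fine-tuning or adjusting the weights, as Lee describes, would start from the same downloaded checkpoint; the Apache 2.0 license mentioned above is what makes that kind of modification permissible.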


Forbes
29-07-2025
How To Quickly Improve Your Ability To Predict The Future
The need to predict is omnipresent. Every time you buy a stock, choose a partner, pick a president, or bet your brother-in-law that the 49ers will finally win it all, you're making a decision based on a prediction. And yet, despite all the big data, algorithms, learning models, and AI assistants, we're still not very good at predicting the future. But, it turns out, there are a few techniques that will help you get better fast.

The human desire to improve our ability to predict the future isn't new. Nostradamus was one of the original prognosticators to receive acclaim. Yet, on further reflection, his writings are so open to interpretation that they could be describing either the fall of Rome or the next global pandemic. In recent decades, predicting the future of everything has become a growth industry, especially in politics. Cable news needs experts who sound cocksure about everything, even if their accuracy is less than dart-throwing chimps. As long as the ratings are good, bring on the blather.

A few individuals have a heightened ability to forecast what will happen next. What traits do they share? University of Pennsylvania professor Philip Tetlock has spent decades trying to answer this question. Spoiler alert: It's not fame, credentials, or wearing a bowtie on TV.

Determined to find out what does make someone a good predictor, Tetlock launched a bold experiment. With funding from DARPA, he hosted forecasting tournaments known as the Good Judgment Project. Tens of thousands of ordinary people—teachers, engineers, pharmacists, and even a Canadian underwater hockey coach—competed to see who could best predict the outcomes of real-world events: Will the president of Tunisia go into exile next month? Will the price of gold exceed $3,500 by the end of Q3?

Tetlock identified a small percentage—about 2 percent—who consistently made remarkably accurate predictions. He dubbed them 'superforecasters.' They weren't clairvoyant. They didn't have access to classified information. But they do have certain traits in common.

What You Think Versus How You Think

Tetlock puts it this way: 'What you think is much less important than how you think.' Superforecasters don't get attached to their opinions. They revisit assumptions. They seek out dissent. One participant even wrote code to curate news articles from across ideological spectrums so he wouldn't fall into an echo chamber. They also tracked and scored their predictions over time, treating it not as a parlor trick, but as a craft.

If you want to improve your ability to anticipate the future—and let's be honest, who doesn't?—here are a few suggestions:

1. Start with the base rate. Ask yourself: What usually happens in situations like this? Don't be seduced by the drama of outliers. Begin with the average.
2. Break it down. Instead of 'Will AI take my job?' ask: 'What tasks in my role are automatable?' Then assign probabilities to each.
3. Toggle perspectives. Use both the inside view (your specific context) and the outside view (what's happened in similar situations).
4. Stay flexible. Your assumptions are not sacred scrolls. Update them when new information arrives. Bonus points if you can admit you were wrong without needing therapy.
5. Use numbers, not vibes. Avoid vague terms like 'probably.' Go with: 'I'm 70% confident.' It sharpens your thinking—and makes you easier to argue with at dinner parties.
6. Keep a prediction journal. Write down your forecasts and your reasoning. Revisit. Learn. Repeat. (Optional: give yourself gold stars. See the sketch after this piece.)
7. Seek disconfirmation. Don't just look for information that proves you right. Hunt down what might prove you wrong. It's called 'growing.'
8. Diversify your info diet. Read widely. Follow smart people you disagree with. Cross-pollinate. Avoid becoming the human version of a YouTube algorithm.

In the end, getting better at prediction won't make you omniscient, but it will make you wiser, calmer, and a better decision-maker. And maybe, just maybe, the next time someone at work says, 'Nobody could have seen this coming,' you'll be able to smile and say, 'Actually… I kind of did.'
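The "use numbers, not vibes" and "keep a prediction journal" suggestions can be made concrete in a few lines of code. The sketch below scores a handful of forecasts with the Brier score, a standard measure for probabilistic predictions; the questions, probabilities, and outcomes are invented for illustration.

```python
# Toy prediction journal: record probabilistic forecasts, then score them with
# the Brier score once outcomes are known. All entries below are hypothetical.
from dataclasses import dataclass

@dataclass
class Forecast:
    question: str
    probability: float           # stated confidence that the event happens (0..1)
    outcome: bool | None = None  # fill in once the event resolves

journal = [
    Forecast("Team ships the Q3 release on time", probability=0.70, outcome=True),
    Forecast("Gold closes above $3,500 this quarter", probability=0.30, outcome=False),
    Forecast("I finish the forecasting book by June", probability=0.80, outcome=True),
]

def brier_score(entries: list[Forecast]) -> float:
    """Mean squared error between stated probabilities and actual outcomes (lower is better)."""
    resolved = [f for f in entries if f.outcome is not None]
    return sum((f.probability - float(f.outcome)) ** 2 for f in resolved) / len(resolved)

print(f"Brier score: {brier_score(journal):.3f}")  # 0.073 for the sample entries above
```

Always answering 50% produces a score of 0.25, so a journal whose score trends well below that is evidence your stated confidence levels actually mean something.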


CBS News
16-07-2025
Trump promised to make Pennsylvania an AI hub, but how will it be powered?
Some of the investments promised during the AI and energy summit on Tuesday at Carnegie Mellon University included AI data centers, which can require massive amounts of energy to run. According to experts, it all comes down to what type of data centers are being built: bigger ones that require more power or smaller ones that don't put as much strain on the grid.

While the promise of billions of dollars in investments is welcomed by many business and elected leaders, all the projects will need to be powered. Some of the data centers needed for AI can draw loads of up to 1,000 megawatts, while many data centers built for other needs are only 20-100 megawatts.

"When you're talking about putting a 1,000-megawatt or 2,000-megawatt load onto the grid, that is a massive load for a single geographic location. That is going to pose a risk to the electrical grid," University of Pennsylvania professor of electrical and systems engineering Dr. Benjamin Lee said. He said AI data centers need all that power because of graphics processing units, which draw more power.

Depending on which AI facilities are built, there is a chance that smaller data centers may be needed that could put less strain on the grid and be built in cities. Bigger ones will probably be located in more remote areas. "That's going to require lots of land, maybe a generation plant to produce all of the electricity," Lee said.

As for powering these facilities, there have already been talks of building natural gas plants and nuclear plants to meet the need. Lee said the fact that Pennsylvania has some of the most natural gas in the country makes the area prime for this investment. "I think natural gas is a big part of generating electricity for the future," Lee said over Zoom.

There is also the question of who pays for the new energy infrastructure needed to power everything. Lee said power demand has stayed flat for about a decade, but with these centers and the push for electric devices, there is more need for energy. "There is a question about what is the fair and equitable distribution of these costs," Lee said. According to the Penn professor, a challenge the energy industry faces is keeping up with technology and building power generation as fast as tech can be built.
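To get a feel for the load figures Lee cites, a rough back-of-envelope calculation helps; the capacity factor and per-household consumption below are illustrative assumptions, not numbers from the article.

```python
# Back-of-envelope: annual energy use of data centers at the load sizes cited in
# the article. Capacity factor and per-household consumption are illustrative
# assumptions, not figures from the article or from Dr. Lee.
HOURS_PER_YEAR = 8760
CAPACITY_FACTOR = 0.8          # assume the facility runs near full load most of the year
HOUSEHOLD_MWH_PER_YEAR = 10.5  # rough average annual U.S. household consumption

def annual_energy_mwh(load_mw: float) -> float:
    """Energy drawn over a year at a given load, in megawatt-hours."""
    return load_mw * HOURS_PER_YEAR * CAPACITY_FACTOR

for load_mw in (20, 100, 1000):
    mwh = annual_energy_mwh(load_mw)
    households = mwh / HOUSEHOLD_MWH_PER_YEAR
    print(f"{load_mw:>5} MW ~ {mwh / 1e6:.2f} TWh/yr ~ {households:,.0f} households")
```

On these assumptions, a single 1,000-megawatt campus draws roughly as much energy in a year as several hundred thousand homes, which is the scale behind Lee's description of "a massive load for a single geographic location."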