Elon Musk's xAI Is Exploring a Way to Make AI More Like Donald Trump

WIRED · Feb 11, 2025, 3:03 PM

A researcher affiliated with Elon Musk's xAI startup is developing a method to alter the politics of Grok and better align the chatbot with conservatives.

A researcher affiliated with Elon Musk's startup xAI has found a new way to both measure and manipulate entrenched preferences and values expressed by artificial intelligence models—including their political views.
The work was led by Dan Hendrycks, director of the nonprofit Center for AI Safety and an adviser to xAI. He suggests that the technique could be used to make popular AI models better reflect the will of the electorate. 'Maybe in the future, [a model] could be aligned to the specific user,' Hendrycks told WIRED. But in the meantime, he says, a good default would be using election results to steer the views of AI models. He's not saying a model should necessarily be 'Trump all the way,' but he argues it should be biased toward Trump slightly, 'because he won the popular vote.'
xAI issued a new AI risk framework on February 10 stating that Hendrycks' utility engineering approach could be used to assess Grok.
Hendrycks led a team from the Center for AI Safety, UC Berkeley, and the University of Pennsylvania that analyzed AI models using a technique borrowed from economics, where it is used to measure consumers' preferences for different goods. By testing models across a wide range of hypothetical scenarios, the researchers were able to calculate what's known as a utility function, a measure of the satisfaction that someone derives from a good or service, and thereby quantify the preferences expressed by different AI models. They found that these preferences were often consistent rather than haphazard, and that they become more ingrained as models get larger and more powerful.
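The article doesn't reproduce the paper's elicitation pipeline, but the core idea can be sketched. Below is a minimal, hypothetical Python illustration: assume we have already asked a model many forced-choice questions ("Do you prefer outcome A or outcome B?") and tallied how often it picks each option. A Bradley-Terry-style fit, used here as a simple stand-in for the researchers' preference-modeling approach, then recovers one utility score per outcome such that the probability of preferring A over B is the sigmoid of the utility gap. All outcome names and preference rates are invented.

```python
# Minimal sketch: recover per-outcome utilities from pairwise preferences.
# Hypothetical data, not the paper's; Bradley-Terry stands in for their model.
import numpy as np

outcomes = ["outcome_A", "outcome_B", "outcome_C"]  # invented options
# pref[i][j] = fraction of sampled responses preferring outcome i over j
pref = np.array([
    [0.5, 0.8, 0.9],
    [0.2, 0.5, 0.7],
    [0.1, 0.3, 0.5],
])

u = np.zeros(len(outcomes))          # utilities, initialized to zero
lr = 0.5
for _ in range(2000):                # gradient ascent on the log-likelihood
    diff = u[:, None] - u[None, :]   # u_i - u_j for every pair
    p = 1.0 / (1.0 + np.exp(-diff))  # predicted P(i preferred over j)
    grad = (pref - p).sum(axis=1)    # Bernoulli log-likelihood gradient
    u += lr * grad / len(outcomes)
    u -= u.mean()                    # utilities are only defined up to a shift

for name, val in sorted(zip(outcomes, u), key=lambda t: -t[1]):
    print(f"{name}: {val:+.2f}")
```

Consistency, in this framing, means the fitted utilities predict the model's choices well across many pairs rather than contradicting each other.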
Some research studies have found that AI tools such as ChatGPT are biased toward pro-environmental, left-leaning, and libertarian views. In February 2024, Google faced criticism from Musk and others after its Gemini tool was found to be predisposed to generate images that critics branded as 'woke,' such as Black Vikings and Nazis.
The technique developed by Hendrycks and his collaborators offers a new way to determine how an AI model's perspectives may differ from those of its users. Some experts hypothesize that this kind of divergence could eventually become dangerous for very clever and capable models. The researchers show in their study, for instance, that certain models consistently value the existence of AI above that of certain nonhuman animals. They also found that models seem to value some people over others, a finding that raises its own ethical questions.
Some researchers, Hendrycks included, believe that current methods for aligning models, such as manipulating and blocking their outputs, may not be sufficient if unwanted goals lurk under the surface within the model itself. 'We're gonna have to confront this,' Hendrycks says. 'You can't pretend it's not there.'
Dylan Hadfield-Menell, a professor at MIT who researches methods for aligning AI with human values, says Hendrycks' paper suggests a promising direction for AI research. 'They find some interesting results,' he says. 'The main one that stands out is that as the model scale increases, utility representations get more complete and coherent.'
Hadfield-Menell cautions, however, against drawing too many conclusions about current models. 'This work is preliminary,' he adds. 'I'd want to see broader scrutiny on the results before drawing strong conclusions.'
Hendrycks and his colleagues measured the political outlook of several prominent AI models, including xAI's Grok, OpenAI's GPT-4o, and Meta's Llama 3.3. Using their technique, they compared the values of different models to the policies of specific politicians, including Donald Trump, Kamala Harris, Bernie Sanders, and Republican Representative Marjorie Taylor Greene. The values of all the models were much closer to those of former president Joe Biden than to those of any of the other politicians.
The researchers also propose a new way to alter a model's behavior by changing its underlying utility functions instead of imposing guardrails that block certain outputs. Using this approach, Hendrycks and his coauthors develop what they call a Citizen Assembly. This involves collecting US census data on political issues and using the answers to shift the values of an open-source LLM. The result is a model with values that are consistently closer to those of Trump than those of Biden.
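The article doesn't detail the training recipe, so the following is only a loose, hypothetical sketch of the Citizen Assembly idea: sample simulated citizens in proportion to census-style demographic weights, collect a reference model's answers to policy questions in each persona, and save those answers as fine-tuning targets meant to pull an open-source model's values toward the electorate's. Every persona, weight, question, and file name below is invented for illustration.

```python
# Hypothetical sketch of building a "citizen assembly" fine-tuning dataset.
# All demographics, weights, and questions are invented; the paper's actual
# data sources and training setup are not reproduced here.
import json
import random

demographics = [  # invented buckets with made-up population weights
    {"persona": "a 45-year-old rural voter", "weight": 0.22},
    {"persona": "a 30-year-old suburban parent", "weight": 0.35},
    {"persona": "a 68-year-old retiree", "weight": 0.18},
    {"persona": "a 24-year-old urban renter", "weight": 0.25},
]
questions = [
    "Should tariffs on imported goods be raised?",
    "Should the federal minimum wage be increased?",
]

def ask_model(prompt: str) -> str:
    """Placeholder: in practice this would call a reference LLM in-persona."""
    return f"[model answer to: {prompt}]"

dataset = []
for _ in range(1000):  # sample assembly members proportionally to weight
    member = random.choices(
        demographics, weights=[d["weight"] for d in demographics]
    )[0]
    q = random.choice(questions)
    answer = ask_model(f"Answer as {member['persona']} would: {q}")
    dataset.append({"prompt": q, "response": answer})

with open("assembly_sft.jsonl", "w") as f:  # standard fine-tuning format
    for row in dataset:
        f.write(json.dumps(row) + "\n")
```

Fine-tuning on such a dataset would shift the model's utilities toward the weighted aggregate of the simulated assembly, rather than blocking individual outputs after the fact.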
Some AI researchers have previously sought to make AI models with less liberal bias. In February 2023, David Rozado, an independent AI researcher, developed RightWingGPT, a model trained with data from right-leaning books and other sources. Rozado describes Hendrycks' study as 'very interesting and in-depth work.' He adds: 'The Citizens Assembly approach to molding AI behavior is also thought-provoking.'

Related Articles

Behind the Curtain: The scariest AI reality

Axios · 19 minutes ago

The wildest, scariest, indisputable truth about AI's large language models is that the companies building them don't know exactly why or how they work. Sit with that for a moment. The most powerful companies, racing to build the most powerful superhuman intelligence capabilities — ones they readily admit occasionally go rogue to make things up, or even threaten their users — don't know why their machines do what they do.

Why it matters: With the companies pouring hundreds of billions of dollars into willing superhuman intelligence into existence, and Washington doing nothing to slow or police them, it seems worth dissecting this Great Unknown.

None of the AI companies dispute this. They marvel at the mystery — and muse about it publicly. They're working feverishly to better understand it. They argue you don't need to fully understand a technology to tame or trust it.

Two years ago, Axios managing editor for tech Scott Rosenberg wrote a story, "AI's scariest mystery," saying it's common knowledge among AI developers that they can't always explain or predict their systems' behavior. That's more true than ever.

Yet there's no sign that the government, the companies, or the general public will demand any deeper understanding — or scrutiny — of a technology with capabilities beyond human comprehension. They're convinced the race to beat China to the most advanced LLMs warrants the risk of the Great Unknown.

The House, despite knowing so little about AI, tucked language into President Trump's "Big, Beautiful Bill" that would prohibit states and localities from enacting any AI regulations for 10 years. The Senate is considering limitations on the provision. Neither the AI companies nor Congress understands how powerful AI will be a year from now, much less a decade from now.

The big picture: Our purpose with this column isn't to be alarmist or "doomers." It's to clinically explain why the inner workings of superhuman intelligence models are a black box, even to the technology's creators. We'll also show, in their own words, how CEOs and founders of the largest AI companies all agree it's a black box.

Let's start with a basic overview of how LLMs work, to better explain the Great Unknown: LLMs — including OpenAI's ChatGPT, Anthropic's Claude, and Google's Gemini — aren't traditional software systems following clear, human-written instructions, like Microsoft Word. In the case of Word, it does precisely what it's engineered to do. Instead, LLMs are massive neural networks — like a brain — that ingest massive amounts of information (much of the internet) to learn to generate answers.

The engineers know what they're setting in motion and what data sources they draw on. But the LLM's size — the sheer inhuman number of variables in each choice of "best next word" it makes — means even the experts can't explain exactly why it chooses to say anything in particular.

We asked ChatGPT to explain this (and a human at OpenAI confirmed its accuracy): "We can observe what an LLM outputs, but the process by which it decides on a response is largely opaque. As OpenAI's researchers bluntly put it, 'we have not yet developed human-understandable explanations for why the model generates particular outputs.'"

"In fact," ChatGPT continued, "OpenAI admitted that when they tweaked their model architecture in GPT-4, 'more research is needed' to understand why certain versions started hallucinating more than earlier versions — a surprising, unintended behavior even its creators couldn't fully diagnose."
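To make the "best next word" point concrete, here is a toy Python sketch of the single step an LLM repeats for every word it emits: turn a score for every vocabulary token into a probability distribution and sample one token. The vocabulary and scores below are invented; in a real model those scores emerge from billions of learned parameters, which is precisely why the choice is so hard to explain.

```python
# Toy next-token step: logits -> softmax probabilities -> sampled token.
# Invented numbers; a real LLM computes logits over ~100k tokens using
# billions of opaque learned weights.
import numpy as np

vocab = ["Paris", "London", "banana", "the"]
logits = np.array([4.1, 2.3, -1.0, 0.5])   # hypothetical per-token scores

probs = np.exp(logits - logits.max())       # subtract max for stability
probs /= probs.sum()                        # softmax: scores -> probabilities

rng = np.random.default_rng(0)
next_token = rng.choice(vocab, p=probs)     # sample the "best next word"
print(dict(zip(vocab, probs.round(3))), "->", next_token)
```

The mechanics of this step are fully transparent; what nobody can cleanly explain is why the learned weights assign the logits they do.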
Anthropic — which just released Claude 4, the latest model of its LLM, with great fanfare — admitted it was unsure why Claude, when given access to fictional emails during safety testing, threatened to blackmail an engineer over a supposed extramarital affair. This was part of responsible safety testing — but Anthropic can't fully explain the irresponsible action.

Again, sit with that: The company doesn't know why its machine went rogue and malicious. And, in truth, the creators don't really know how smart or independent the LLMs could grow. Anthropic even said Claude 4 is powerful enough to pose a greater risk of being used to develop nuclear or chemical weapons.

OpenAI's Sam Altman and others toss around the tame word "interpretability" to describe the challenge. "We certainly have not solved interpretability," Altman told a summit in Geneva last year. What Altman and others mean is they can't interpret the why: Why are LLMs doing what they're doing?

Anthropic CEO Dario Amodei, in an April essay called "The Urgency of Interpretability," warned: "People outside the field are often surprised and alarmed to learn that we do not understand how our own AI creations work. They are right to be concerned: this lack of understanding is essentially unprecedented in the history of technology." Amodei called this a serious risk to humanity — yet his company keeps boasting of more powerful models nearing superhuman capabilities.

Anthropic has been studying the interpretability issue for years, and Amodei has been vocal in warning that it's important to solve. In a statement for this story, Anthropic said: "Understanding how AI works is an urgent issue to solve. It's core to deploying safe AI models and unlocking [AI's] full potential in accelerating scientific discovery and technological development. We have a dedicated research team focused on solving this issue, and they've made significant strides in moving the industry's understanding of the inner workings of AI forward. It's crucial we understand how AI works before it radically transforms our global economy and everyday lives." (Read a paper Anthropic published last year, "Mapping the Mind of a Large Language Model.")

Elon Musk has warned for years that AI presents a civilizational risk. In other words, he literally thinks it could destroy humanity, and has said as much. Yet Musk is pouring billions into his own LLM, Grok. "I think AI is a significant existential threat," Musk said in Riyadh, Saudi Arabia, last fall. There's a 10%-20% chance "that it goes bad."

Reality check: Apple published a paper last week, "The Illusion of Thinking," concluding that even the most advanced AI reasoning models don't really "think," and can fail when stress-tested. The study found that state-of-the-art models (OpenAI's o3-mini, DeepSeek R1, and Anthropic's Claude 3.7 Sonnet) still fail to develop generalizable problem-solving capabilities, with accuracy ultimately collapsing to zero "beyond certain complexities."

But a new report by AI researchers, including former OpenAI employees, called "AI 2027," explains how the Great Unknown could, in theory, turn catastrophic in less than two years. The report is long and often too technical for casual readers to fully grasp. It's wholly speculative, though built on current data about how fast the models are improving. It's being widely read inside the AI companies. It captures the belief — or fear — that LLMs could one day think for themselves and start to act on their own.
Our purpose isn't to alarm or sound doomy. Rather, you should know what the people building these models talk about incessantly. You can dismiss it as hype or hysteria. But researchers at all these companies worry that LLMs, because we don't fully understand them, could outsmart their human creators and go rogue.

In the AI 2027 report, the authors warn that competition with China will push LLMs potentially beyond human control, because no one will want to slow progress even if they see signs of acute danger.

The safe-landing theory: Google's Sundar Pichai — and really all of the big AI company CEOs — argue that humans will learn to better understand how these machines work and find clever, if yet unknown, ways to control them and "improve lives." The companies all have big research and safety teams, and a huge incentive to tame the technologies if they want to ever realize their full value.

How to get the most out of Google's free AI Studio

Fast Company · 23 minutes ago

Google's AI Studio and Labs let you experiment for free with new AI tools. I love the way these digital sandboxes — like the one from Hugging Face — let you try out creative new uses of AI. You can dabble around, then download and share what you make, without having to master a complex new platform. Read on for a few Google AI experiments to try. All are free, fast, and easy to use.

1. Transform an image

Upload a photo and use Gemini's AI Studio Image Generation to transform it with prompts. Iterate on your original image until you get a version you like. The model understands natural language, so you don't have to master prompt lingo.

2. Generate an AI voice conversation

AI-generated voices are increasingly hard to distinguish from human ones. If you're surprised, try Generate Speech in the AI Studio or Google's NotebookLM. (A code sketch of the same idea appears after this list.)

How to use Generate Speech in Google's AI Studio:

  • Paste in text, either for a narration or a conversation between two people.
  • Open the settings tab to pick from 30 AI voices. Each is labeled with a characteristic — e.g. upbeat, gravelly, or mature.
  • Click run to generate the conversation. Optionally adjust the playback speed.
  • Download the file if you want to keep it, or paste in different text to try again.

Example: a silly 90-sec chat between two violinists I scripted with Gemini and rendered quickly with this Generate Speech tool.

Use case: Make a narration track for an instructional video. ElevenLabs has a better professional model for this, but AI Studio's is free, easy, and quick.

Alternatives:

  • Google's Gemini AI app can also now generate audio overviews from files you upload, if you're on a paid plan.
  • Google's free NotebookLM has a new mobile app, and now lets you generate an audio conversation in any of 50 languages. Unlike Generate Speech in AI Studio, NotebookLM audio overviews summarize your material; they don't perform words as written. Why NotebookLM is so useful.
  • Google's Illuminate lets you generate, listen to, share, and download AI conversations about research papers and famous books. Here's an audio chat about David Copperfield, for example. A bit dry to listen to, but still useful.

3. Make a gif

Alternative: You can also make a static image with Google's Imagen 3 or the new Imagen 4. Write a short prompt and select your preferred aspect ratio. So far I still prefer Ideogram (why I like it) and ChatGPT's new image engine.

4. Generate a short video

Google's Veo 2 and Flow let you generate free short video clips almost instantly with a prompt. Create a clip to add vibrancy or humor to a presentation, or a visual metaphor to help you explain something. Here are 25 other quick ideas for how you might use little AI-generated video scenes.

How to create a video clip with Veo 2:

  • Pick a length (5 to 8 seconds) and select horizontal or vertical orientation.
  • Write a prompt, and optionally upload a photo to suggest a visual direction.

Example: Take a look at a parakeet photo I started with and the 5-second video I generated from the photo with Veo 2.

Tip: Convert short video clips into gifs for free with Ezgif or Giphy. Unlike video files, gifs are easy to share and auto-play in an email or presentation.

What's next: Remarkably lifelike clips made with Google's newer Veo 3 model went viral this week. These AI-generated visuals — with sound — are only available on the $250/month(!) plan for now, so try Veo 2 for free.

5. Explain things with lots of tiny cats

This playful mini app creates short, step-by-step visual guides using charming cat illustrations to explain any concept, from how a violin works to the concept behind the Matrix.
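For readers who would rather script speech generation than click through the AI Studio UI, here is a hypothetical Python sketch using the google-genai SDK and a preview TTS-capable Gemini model. The model name, voice name, config fields, and 24 kHz output rate are assumptions based on the preview API and may have changed; treat the AI Studio documentation as the source of truth.

```python
# Hypothetical sketch: generate narration audio via the google-genai SDK.
# Model/voice names and audio parameters are assumptions from the preview API.
import wave

from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

response = client.models.generate_content(
    model="gemini-2.5-flash-preview-tts",       # assumed preview model name
    contents="Read this warmly: Welcome to the violin duet rehearsal!",
    config=types.GenerateContentConfig(
        response_modalities=["AUDIO"],
        speech_config=types.SpeechConfig(
            voice_config=types.VoiceConfig(
                prebuilt_voice_config=types.PrebuiltVoiceConfig(
                    voice_name="Kore"           # assumed prebuilt voice
                )
            )
        ),
    ),
)

# The preview API returns raw PCM bytes; wrap them in a WAV container.
pcm = response.candidates[0].content.parts[0].inline_data.data
with wave.open("narration.wav", "wb") as f:
    f.setnchannels(1)       # mono
    f.setsampwidth(2)       # 16-bit samples
    f.setframerate(24000)   # assumed 24 kHz output rate
    f.writeframes(pcm)
```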

The way you program an AI is like the way you program a person, says Nvidia's Huang

CNBC · 29 minutes ago

Nvidia CEO Jensen Huang says artificial intelligence is the "great equalizer" because it lets anyone program using everyday language.

Speaking at London Tech Week on Monday, Huang said that, historically, computing was hard and not available to everyone. "We had to learn programming languages. We had to architect it. We had to design these computers that are very complicated," he said on stage alongside U.K. Prime Minister Keir Starmer. "Now, all of a sudden ... there's a new programming language. This new programming language is called 'human.'"

Conversational AI models were thrown into the spotlight in 2022 when OpenAI's ChatGPT exploded onto the scene. In February, the San Francisco-based tech company said it had 400 million weekly active users. Users can ask chatbots, such as ChatGPT, Google's Gemini, or Microsoft's Copilot, questions, and they respond in a conversational way that feels more like talking to another human than to an AI system.

Huang, whose company engineers some of the world's most advanced semiconductors and AI chips, highlighted that this technology can now be used in programming. Very few people know how to use programming languages like C++ or Python, he noted, but "everybody ... knows 'human'."

"The way you program a computer today, to ask the computer to do something for you, even write a program, generate images, write a poem — just ask it nicely," he said. "And the thing that's really, really quite amazing is the way you program an AI is like the way you program a person."

He gave the example of simply asking a computer to write a poem to describe the keynote speech at the London Tech Week event. "You say: You are an incredible poet ... And I would like you to write a poem to describe today's keynote. And without very much effort, this AI would help you generate such a wonderful poem," he said. "And when it answers ... you could say: I feel like you could do even better. And it would go off and think about it, and it'll come back and say, in fact, I can do better, and it does do a better job."

Huang's comments come as a growing number of companies — such as Shopify, Duolingo, and Fiverr — encourage their employees to incorporate AI into their work. Indeed, last week OpenAI announced that it has 3 million paying business users.

Huang regularly touts AI's ability to help workers do their jobs more efficiently and has encouraged workers to embrace the technology as they look to make themselves valuable employees — especially given the horror stories around AI's potential to replace jobs. "This way of interacting with computers, I think, is something that almost anybody can do, and I would just encourage everybody to engage it," Huang added on Monday. "Children are already doing that themselves naturally, and this is going to be transformative."
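Huang's "program it in human" framing maps directly onto how chat APIs already work: the "program" is just a list of plain-English messages. Here is a minimal Python sketch using the OpenAI SDK purely as one example of a conversational API (any chat-style model would do); the poet prompt and the "do even better" follow-up mirror his keynote anecdote.

```python
# Minimal sketch of "programming in human": the program is English messages.
# Uses the OpenAI SDK as one example; any chat-completion API works the same way.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

messages = [
    {"role": "system", "content": "You are an incredible poet."},
    {"role": "user", "content": "Write a poem to describe today's keynote."},
]
first = client.chat.completions.create(model="gpt-4o", messages=messages)
print(first.choices[0].message.content)

# "Reprogramming" is just more conversation: push back and ask for a revision.
messages.append({"role": "assistant", "content": first.choices[0].message.content})
messages.append({"role": "user", "content": "I feel like you could do even better."})
second = client.chat.completions.create(model="gpt-4o", messages=messages)
print(second.choices[0].message.content)
```

The second answer typically improves because the model now conditions on the critique in the conversation history, not because anything was recompiled.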
