Latest news with #breakthroughs


Forbes
2 days ago
- Science
- Forbes
Will Reinforcement Learning Take Us To AGI?
There aren't many truly new ideas in artificial intelligence. More often, breakthroughs in AI happen when concepts that have existed for years suddenly take on new power because underlying technology inputs—in particular, raw computing power—finally catch up to unlock those concepts' full potential. Famously, Geoff Hinton and a small group of collaborators devoted themselves tirelessly to neural networks starting in the early 1970s. For decades, the technology didn't really work and the outside world paid little attention. It was not until the early 2010s—thanks to the arrival of sufficiently powerful Nvidia GPUs and internet-scale training data—that the potential of neural networks was finally unleashed for all to see. In 2024, more than half a century after he began working on neural networks, Hinton was awarded the Nobel Prize for pioneering the field of modern AI.

Reinforcement learning has followed a similar arc. Richard Sutton and Andrew Barto, the fathers of modern reinforcement learning, laid down the foundations of the field starting in the 1970s. Even before Sutton and Barto began their work, the basic principles underlying reinforcement learning—in short, learning by trial and error based on positive and negative feedback—had been developed by behavioral psychologists and animal researchers going back to the early twentieth century.

Yet in just the past year, advances in reinforcement learning (RL) have taken on newfound importance and urgency in the world of AI. It has become increasingly clear that the next leap in AI capabilities will be driven by RL. If artificial general intelligence (AGI) is in fact around the corner, reinforcement learning will play a central role in getting us there. Just a few years ago, when ChatGPT's launch ushered in the era of generative AI, almost no one would have predicted this.

Deep questions remain unanswered about reinforcement learning's capabilities and its limits. No field in AI is moving more quickly today than RL. It has never been more important to understand this technology, its history and its future.

Reinforcement Learning 101

The basic principles of reinforcement learning have remained consistent since Sutton and Barto established the field in the 1970s. The essence of RL is to learn by interacting with the world and seeing what happens. It is a universal and foundational form of learning; every human and animal does it.

In the context of artificial intelligence, a reinforcement learning system consists of an agent interacting with an environment. RL agents are not given direct instructions or answers by humans; instead, they learn through trial and error. When an agent takes an action in an environment, it receives a reward signal from the environment, indicating that the action produced either a positive or a negative outcome. The agent's goal is to adjust its behavior to maximize positive rewards and minimize negative rewards over time.

How does the agent decide which actions to take? Every agent acts according to a policy, which can be understood as the formula or calculus that determines the agent's action based on the particular state of the environment. A policy can be a simple set of rules, or even pure randomness, or it can be represented by a far more complex system, like a deep neural network.

One final concept that is important to understand in RL, closely related to the reward signal, is the value function.
The value function is the agent's estimate of how favorable a given state of the environment will be (that is, how many positive and negative rewards it will lead to) over the long run. Whereas reward signals are immediate pieces of feedback that come from the environment based on current conditions, the value function is the agent's own learned estimate of how things will play out in the long term. The entire purpose of value functions is to estimate reward signals, but unlike reward signals, value functions enable agents to reason and plan over longer time horizons. For instance, value functions can incentivize actions even when they lead to negative near-term rewards because the long-term benefit is estimated to be worth it. When RL agents learn, they do so in one of three ways: by updating their policy, updating their value function, or updating both together. A brief example will help make these concepts concrete. Imagine applying reinforcement learning to the game of chess. In this case, the agent is an AI chess player. The environment is the chess board, with any given configuration of chess pieces representing a state of that environment. The agent's policy is the function (whether a simple set of rules, or a decision tree, or a neural network, or something else) that determines which move to make based on the current board state. The reward signal is simple: positive when the agent wins a game, negative when it loses a game. The agent's value function is its learned estimate of how favorable or unfavorable any given board position is—that is, how likely the position is to lead to a win or a loss. As the agent plays more games, strategies that lead to wins will be positively reinforced and strategies that lead to losses will be negatively reinforced via updates to the agent's policy and value function. Gradually, the AI system will become a stronger chess player. In the twenty-first century, one organization has championed and advanced the field of reinforcement learning more than any other: DeepMind. Founded in 2010 as a startup devoted to solving artificial intelligence and then acquired by Google in 2014 for ~$600 million, London-based DeepMind made a big early bet on reinforcement learning as the most promising path forward in AI. And the bet paid off. The second half of the 2010s were triumphant years for the field of reinforcement learning. In 2016, DeepMind's AlphaGo became the first AI system to defeat a human world champion at the ancient Chinese game of Go, a feat that many AI experts had believed was impossible. In 2017, DeepMind debuted AlphaZero, which taught itself Go, chess and Japanese chess entirely via self-play and bested every other AI and human competitor in those games. And in 2019, DeepMind unveiled AlphaStar, which mastered the video game StarCraft—an even more complex environment than Go given the vast action space, imperfect information, numerous agents and real-time gameplay. AlphaGo, AlphaZero, AlphaStar—reinforcement learning powered each of these landmark achievements. As the 2010s drew to a close, RL seemed poised to dominate the coming generation of artificial intelligence breakthroughs, with DeepMind leading the way. But that's not what happened. Right around that time, a new AI paradigm unexpectedly burst into the spotlight: self-supervised learning for autoregressive language models. In 2019, a small nonprofit research lab named OpenAI released a model named GPT-2 that demonstrated surprisingly powerful general-purpose language capabilities. 
The following summer, OpenAI debuted GPT-3, whose astonishing abilities represented a massive leap in performance from GPT-2 and took the AI world by storm. In 2022 came ChatGPT.

In short order, every AI organization in the world reoriented its research focus to prioritize large language models and generative AI. These large language models (LLMs) were based on the transformer architecture and made possible by a strategy of aggressive scaling. They were trained on unlabeled datasets that were bigger than any previous AI training data corpus—essentially the entire internet—and were scaled up to unprecedented model sizes. (GPT-2 was considered mind-bogglingly large at 1.5 billion parameters; one year later, GPT-3 debuted at 175 billion parameters.)

Reinforcement learning fell out of fashion for half a decade. A widely repeated narrative during the early 2020s was that DeepMind had seriously misread technology trends by committing itself to reinforcement learning and missing the boat on generative AI.

Yet today, reinforcement learning has reemerged as the hottest field within AI. What happened? In short, AI researchers discovered that applying reinforcement learning to generative AI models was a killer combination. Starting with a base LLM and then applying reinforcement learning on top of it meant that, for the first time, RL could natively operate with the gift of language and broad knowledge about the world. Pretrained foundation models represented a powerful base on which RL could work its magic. The results have been dazzling—and we are just getting started.

RL Meets LLMs

What does it mean, exactly, to combine reinforcement learning with large language models? A key insight to start with is that the core concepts of RL can be mapped directly and elegantly to the world of LLMs. In this mapping, the LLM itself is the agent. The environment is the full digital context in which the LLM is operating, including the prompts it is presented with, its context window, and any tools and external information it has access to. The model's weights represent the policy: they determine how the agent acts when presented with any particular state of the environment. Acting, in this context, means generating tokens.

What about the reward signal and the value function? Defining a reward signal for LLMs is where things get interesting and complicated. It is this topic, more than any other, that will determine how far RL can take us on the path to superintelligence.

The first major application of RL to LLMs was reinforcement learning from human feedback, or RLHF. The frontiers of AI research have since advanced to more cutting-edge methods of combining RL and LLMs, but RLHF represents an important step on the journey, and it provides a concrete illustration of the concept of reward signals for LLMs.

RLHF was invented by DeepMind and OpenAI researchers back in 2017. (As a side note, given today's competitive and closed research environment, it is remarkable to remember that OpenAI and DeepMind used to conduct and publish foundational research together.) RLHF's true coming-out party, though, was ChatGPT. When ChatGPT debuted in November 2022, the underlying AI model on which it was based was not new; it had already been publicly available for many months. The reason that ChatGPT became an overnight success was that it was approachable, easy to talk to, helpful, good at following directions. The technology that made this possible was RLHF.
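Before digging into how RLHF works, the mapping described above can be made concrete with a short sketch (my own illustration, not taken from the article): the state is the prompt plus whatever tokens have been generated so far, an action is emitting one more token, the model's weights play the role of the policy, and a reward is assigned only once the finished response can be scored. The helpers policy_model and reward_fn below are hypothetical stand-ins; RLHF and the verifiable-reward methods discussed next differ mainly in what they plug in as reward_fn.

```python
# A schematic sketch of the RL view of an LLM (illustrative names, not a real library).
from typing import Callable, List

def rollout(policy_model: Callable[[List[str]], str],
            reward_fn: Callable[[List[str]], float],
            prompt_tokens: List[str],
            max_new_tokens: int = 32) -> float:
    state = list(prompt_tokens)          # environment state: the context window so far
    for _ in range(max_new_tokens):
        action = policy_model(state)     # the policy maps the current state to the next token
        state.append(action)             # acting means appending the generated token
        if action == "<eos>":
            break
    return reward_fn(state)              # the reward arrives only for the finished response

# Trivial stand-ins: a "policy" that always answers "4" and a reward that checks the answer.
toy_policy = lambda state: "4" if state[-1] != "4" else "<eos>"
toy_reward = lambda state: 1.0 if "4" in state else -1.0
print(rollout(toy_policy, toy_reward, ["What", "is", "2", "+", "2", "?"]))  # -> 1.0
```

The skeleton stays the same across methods; what changes from RLHF to later approaches is where the number returned by reward_fn comes from.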
In a nutshell, RLHF is a method to adapt LLMs' style and tone to be consistent with human-expressed preferences, whatever those preferences may be. RLHF is most often used to make LLMs 'helpful, harmless and honest,' but it can equally be used to make them more flirtatious, or rude, or sarcastic, or progressive, or conservative.

How does RLHF work? The key ingredient in RLHF is 'preference data' generated by human subjects. Specifically, humans are asked to consider two responses from the model for a given prompt and to select which one of the two responses they prefer. This pairwise preference data is used to train a separate model, known as the reward model, which learns to produce a numerical rating of how desirable or undesirable any given output from the main model is. This is where RL comes in. Now that we have a reward signal, an RL algorithm can be used to fine-tune the main model—in other words, the RL agent—so that it generates responses that maximize the reward model's scores. In this way, the main model comes to incorporate the style and values reflected in the human-generated preference data.

Circling back to reward signals and LLMs: in the case of RLHF, as we have seen, the reward signal comes directly from humans and human-generated preference data, which is then distilled into a reward model. What if we want to use RL to give LLMs powerful new capabilities beyond simply adhering to human preferences?

The Next Frontier

The most important development in AI over the past year has been language models' improved ability to engage in reasoning. What exactly does it mean for an AI model to 'reason'? Unlike first-generation LLMs, which respond to prompts using next-token prediction with no planning or reflection, reasoning models spend time thinking before producing a response. These models think by generating 'chains of thought,' enabling them to systematically break down a given task into smaller steps and then work through each step in order to arrive at a well-thought-through answer. They also know how and when to use external tools—like a calculator, a code interpreter or the internet—to help solve problems.

The world's first reasoning model, OpenAI's o1, debuted less than a year ago. A few months later, China-based DeepSeek captured world headlines when it released its own reasoning model, R1, which performed at near parity with o1 yet was fully open and trained using far less compute.

The secret sauce that gives AI models the ability to reason is reinforcement learning—specifically, an approach to RL known as reinforcement learning from verifiable rewards (RLVR). Like RLHF, RLVR entails taking a base model and fine-tuning it using RL. But the source of the reward signal, and therefore the types of new capabilities that the AI gains, are quite different.

As its name suggests, RLVR improves AI models by training them on problems whose answers can be objectively verified—most commonly, math or coding tasks. First, a model is presented with such a task—say, a challenging math problem—and prompted to generate a chain of thought in order to solve the problem. The final answer that the model produces is then formally determined to be either correct or incorrect. (If it's a math question, the final answer can be run through a calculator or a more complex symbolic math engine; if it's a coding task, the model's code can be executed in a sandboxed environment.)
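As a rough illustration of what such a verifiable reward looks like in code, here is a minimal sketch (my own, not from the article): extract the final answer from the model's chain of thought and compare it against a known reference. Real systems lean on symbolic math engines and sandboxed code execution; the "Answer: <value>" output format and the plain numeric comparison below are assumptions made purely for illustration.

```python
# A minimal sketch of a verifiable reward for math-style tasks (illustrative, simplified).
import re

def math_reward(model_output: str, correct_answer: str) -> float:
    """Return +1 if the model's final answer matches the reference, else -1."""
    # Assume (for illustration) the model ends its chain of thought with "Answer: <value>".
    match = re.search(r"Answer:\s*(-?\d+(?:\.\d+)?)", model_output)
    if match is None:
        return -1.0                      # no parseable final answer counts as incorrect
    try:
        return 1.0 if abs(float(match.group(1)) - float(correct_answer)) < 1e-9 else -1.0
    except ValueError:
        return -1.0

chain_of_thought = "We need 17 * 23. 17*20 = 340, 17*3 = 51, so 340 + 51 = 391. Answer: 391"
print(math_reward(chain_of_thought, "391"))   # -> 1.0, a positive reward to reinforce
```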
Because we now have a reward signal—positive if the final answer is correct, negative if it is incorrect—RL can be used to positively reinforce the types of chains of thought that lead to correct answers and to discourage those that lead to incorrect answers. The end result is a model that is far more effective at reasoning: that is, at accurately working through complex multi-step problems and landing on the correct solution.

This new generation of reasoning models has demonstrated astonishing capabilities in math competitions like the International Math Olympiad and on logical tests like the ARC-AGI benchmark.

So—is AGI right around the corner? Not necessarily. A few big-picture questions about reinforcement learning and language models remain unanswered and loom large. These questions inspire lively debate and widely varying opinions in the world of artificial intelligence today. Their answers will determine how powerful AI gets in the coming months.

A Few Big Unanswered Questions

Today's cutting-edge RL methods rely on problems whose answers can be objectively verified as either right or wrong. Unsurprisingly, then, RL has proven exceptional at producing AI systems that are world-class at math, coding, logic puzzles and standardized tests. But what about the many problems in the world that don't have easily verifiable answers?

In a provocative essay titled 'The Problem With Reasoners', Aidan McLaughlin elegantly articulates this point: 'Remember that reasoning models use RL, RL works best in domains with clear/frequent reward, and most domains lack clear/frequent reward.' McLaughlin argues that most domains that humans actually care about are not easily verifiable, and we will therefore have little success using RL to make AI superhuman at them: for instance, giving career advice, managing a team, understanding social trends, writing original poetry, investing in startups.

A few counterarguments to this critique are worth considering. The first centers on the concepts of transfer learning and generalizability. Transfer learning is the idea that models trained in one area can transfer those learnings to improve in other areas. Proponents of transfer learning in RL argue that, even if reasoning models are trained only on math and coding problems, this will endow them with broad-based reasoning skills that will generalize beyond those domains and enhance their ability to tackle all sorts of cognitive tasks.

'Learning to think in a structured way, breaking topics down into smaller subtopics, understanding cause and effect, tracing the connections between different ideas—these skills should be broadly helpful across problem spaces,' said Dhruv Batra, cofounder/chief scientist at Yutori and former senior AI researcher at Meta. 'This is not so different from how we approach education for humans: we teach kids basic numeracy and literacy in the hopes of creating a generally well-informed and well-reasoning population.'

Put more strongly: if you can solve math, you can solve anything. Anything that can be done with a computer, after all, ultimately boils down to math. It is an intriguing hypothesis. But to date, there is no conclusive evidence that RL endows LLMs with reasoning capabilities that generalize beyond easily verifiable domains like math and coding. It is no coincidence that the most important advances in AI in recent months—both from a research and a commercial perspective—have occurred in precisely these two fields.
If RL can only give AI models superhuman powers in domains that can be easily verified, this represents a serious limit to how far RL can advance the frontiers of AI's capabilities. AI systems that can write code or do mathematics as well as or better than humans are undoubtedly valuable. But true general-purpose intelligence consists of much more than this. Let us consider another counterpoint on this topic, though: what if verification systems can in fact be built for many (or even all) domains, even when those domains are not as clearly deterministic and checkable as a math problem? Might it be possible to develop a verification system that can reliably determine whether a novel, or a government policy, or a piece of career advice, is 'good' or 'successful' and therefore should be positively reinforced? This line of thinking quickly leads us into borderline philosophical considerations. In many fields, determining the 'goodness' or 'badness' of a given outcome would seem to involve value judgments that are irreducibly subjective, whether on ethical or aesthetic grounds. For instance, is it possible to determine that one public policy outcome (say, reducing the federal deficit) is objectively superior to another (say, expanding a certain social welfare program)? Is it possible to objectively identify that a painting or a poem is or is not 'good'? What makes art 'good'? Is beauty not, after all, in the eye of the beholder? Certain domains simply do not possess a 'ground truth' to learn from, but rather only differing values and tradeoffs to be weighed. Even in such domains, though, another possible approach exists. What if we could train an AI via many examples to instinctively identify 'good' and 'bad' outcomes, even if we can't formally define them, and then have that AI serve as our verifier? As Julien Launay, CEO/cofounder of RL startup Adaptive ML, put it: 'In bridging the gap from verifiable to non-verifiable domains, we are essentially looking for a compiler for natural language…but we already have built this compiler: that's what large language models are.' This approach is often referred to as reinforcement learning from AI feedback (RLAIF) or 'LLM-as-a-Judge.' Some researchers believe it is the key to making verification possible across more domains. But it is not clear how far LLM-as-a-Judge can take us. The reason that reinforcement learning from verifiable rewards has led to such incisive reasoning capabilities in LLMs in the first place is that it relies on formal verification methods: correct and incorrect answers exist to be discovered and learned. LLM-as-a-Judge seems to bring us back to a regime more closely resembling RLHF, whereby AI models can be fine-tuned to internalize whatever preferences and value judgments are contained in the training data, arbitrary though they may be. This merely punts the problem of verifying subjective domains to the training data, where it may remain as unsolvable as ever. We can say this much for sure: to date, neither OpenAI nor Anthropic nor any other frontier lab has debuted an RL-based system with superhuman capabilities in writing novels, or advising governments, or starting companies, or any other activity that lacks obvious verifiability. This doesn't mean that the frontier labs are not making progress on the problem. Indeed, just last month, leading OpenAI researcher Noam Brown shared on X: 'We developed new techniques that make LLMs a lot better at hard-to-verify tasks.' 
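Concretely, the LLM-as-a-Judge idea discussed above amounts to replacing a formal verifier with a second model's opinion. The sketch below is my own illustration, not something described in the article: prompt a judge model for a 1-10 score and convert it into a reward. The judge_model callable is a hypothetical stand-in for any chat-completion API, and, as the critique above notes, whatever preferences the judge has absorbed become the de facto ground truth.

```python
# A rough sketch of an LLM-as-a-Judge reward (illustrative; judge_model is a hypothetical stand-in).
from typing import Callable

JUDGE_PROMPT = (
    "Rate the following career advice on a scale of 1 to 10 for usefulness and honesty. "
    "Reply with a single number.\n\nAdvice:\n{advice}"
)

def judge_reward(judge_model: Callable[[str], str], advice: str) -> float:
    """Convert a judge model's 1-10 rating into a reward in roughly [-1, 1]."""
    reply = judge_model(JUDGE_PROMPT.format(advice=advice))
    try:
        score = float(reply.strip().split()[0])
    except (ValueError, IndexError):
        return 0.0                       # an unparseable judgement yields a neutral reward
    score = max(1.0, min(10.0, score))
    return (score - 5.5) / 4.5           # map 1..10 onto approximately -1..+1

# With a stubbed judge that always answers "8", the advice earns a mildly positive reward.
print(judge_reward(lambda prompt: "8", "Negotiate salary only after a written offer."))
```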
Rumors have even begun to circulate that OpenAI has developed a so-called 'universal verifier,' which can provide an accurate reward signal in any domain. It is hard to imagine how such a universal verifier would work; no concrete details have been shared publicly. Time will tell how powerful these new techniques are.

It is important to remember that we are still in the earliest innings of the reinforcement learning era in generative AI. We have just begun to scale RL. The total amount of compute and training data devoted to reinforcement learning remains modest compared to the level of resources spent on pretraining foundation models. A chart from a recent OpenAI presentation speaks volumes on this point.

At this very moment, AI organizations are preparing to deploy vast sums to scale up their reinforcement learning efforts as quickly as they can. As that chart depicts, RL is about to transition from a relatively minor component of AI training budgets to the main focus.

What does it mean to scale RL? 'Perhaps the most important ingredient when scaling RL is the environments—in other words, the settings in which you unleash the AI to explore and learn,' said Stanford AI researcher Andy Zhang. 'In addition to sheer quantity of environments, we need higher-quality environments, especially as model capabilities improve. This will require thoughtful design and implementation of environments to ensure diversity and Goldilocks difficulty and to avoid reward hacking and broken tasks.'

When xAI debuted its new frontier model Grok 4 last month, it announced that it had devoted 'over an order of magnitude more compute' to reinforcement learning than it had with previous models. We have many more orders of magnitude to go.

Today's RL-powered models, while powerful, face shortcomings. The unsolved challenge of difficult-to-verify domains, discussed above, is one. Another critique is known as elicitation: the hypothesis that reinforcement learning doesn't actually endow AI models with greater intelligence but rather just elicits capabilities that the base model already possessed. Yet another obstacle that RL faces is its inherent sample inefficiency compared to other AI paradigms: RL agents must do a tremendous amount of work to receive a single bit of feedback. This 'reward sparsity' has made RL impracticable to deploy in many contexts.

It is possible that scale will be a tidal wave that washes all of these concerns away. If there is one principle that has defined frontier AI in recent years, after all, it is this: nothing matters more than scale. When OpenAI scaled from GPT-2 to GPT-3 to GPT-4 between 2019 and 2023, the models' performance gains and emergent capabilities were astonishing, far exceeding the community's expectations. At every step, skeptics identified shortcomings and failure modes with these models, claiming that they revealed fundamental weaknesses in the technology paradigm and predicting that progress would soon hit a wall. Instead, the next generation of models would blow past these shortcomings, advancing the frontier by leaps and bounds and demonstrating new capabilities that critics had previously argued were impossible.

The world's leading AI players are betting that a similar pattern will play out with reinforcement learning. If recent history is any guide, it is a good bet to make. But it is important to remember that AI 'scaling laws'—which predict that AI performance increases as data, compute and model size increase—are not actually laws in any sense of that word.
They are empirical observations that for a time proved reliable and predictive for pretraining language models and that have been preliminarily demonstrated in other data modalities. There is no formal guarantee that scaling laws will always hold in AI, nor any certainty about how long they will last or how steep their slope will be.

The truth is that no one knows for sure what will happen when we massively scale up RL. But we are all about to find out. Stay tuned for our follow-up article on this topic—or feel free to reach out directly to discuss!

Looking Forward

Reinforcement learning represents a compelling approach to building machine intelligence for one profound reason: it is not bound by human competence or imagination. Training an AI model on vast troves of labeled data (supervised learning) will make the model exceptional at understanding those labels, but its knowledge will be limited to the annotated data that humans have prepared. Training an AI model on the entire internet (self-supervised learning) will make the model exceptional at understanding the totality of humanity's existing knowledge, but it is not clear that this will enable it to generate novel insights that go beyond what humans have already put forth. Reinforcement learning faces no such ceiling. It does not take its cues from existing human data. An RL agent learns for itself, from first principles, through first-hand experience.

AlphaGo's 'Move 37' serves as the archetypal example here. In one of its matches against human world champion Lee Sedol, AlphaGo played a move that violated thousands of years of accumulated human wisdom about Go strategy. Most observers assumed it was a miscue. Instead, Move 37 proved to be a brilliant play that gave AlphaGo a decisive advantage over Sedol. The move taught humanity something new about the game of Go. It has forever changed the way that human experts play the game.

The ultimate promise of artificial intelligence is not simply to replicate human intelligence. Rather, it is to unlock new forms of intelligence that are radically different from our own—forms of intelligence that can come up with ideas that we never would have come up with, make discoveries that we never would have made, help us see the world in previously unimaginable ways.

We have yet to see a 'Move 37 moment' in the world of generative AI. It may be a matter of weeks or months—or it may never happen. Watch this space.


Telegraph
4 days ago
- Health
- Telegraph
The race to find a party-friendly replacement for alcohol
There are mad inventions, such as weight-loss drugs or the contraceptive Pill, that come out of the blue and change the world forever. Then there are the much heralded 'breakthroughs' that never come to much. Along with flying cars and lab-grown steaks, the lifelong dream of Prof David Nutt has always been the stuff of science fantasy. We're here in the Commonwealth Building of Imperial College, London, a tall and unlovely structure in the outskirts of the city, because Prof Nutt believes that is about to change.

When Prof Nutt was a medical student at Downing College, Cambridge, in 1969, doctors believed the odd drink was good for you. Now alcohol consumption is blamed for diabetes, heart disease, depression and dementia alike. That consensus has emerged gradually over the 50 years Prof Nutt has been working to rid the world of alcohol, by replacing the ethanol in our drinks with another, more well-behaved molecule.

Down the road is Wormwood Scrubs prison where, according to Government estimates, two thirds of inmates will be alcohol-dependent. Next door is Hammersmith Hospital. 'About three people a week in Britain die of alcohol poisoning,' Prof Nutt points out. 'They just drink so much that they stop breathing.' It's a serious matter, but Prof Nutt is friendly and optimistic, with a jolly West Country accent, and the practised stage manner of a celebrity academic. He is the world's most famous psychopharmacologist – 'That means I study the effects of drugs on the brain,' he explains – and at 74, he is still ambitious and determined.

As a graduate student in the 1980s, he thought he was in line for a Nobel Prize when he found a chemical 'antidote' to alcohol that could sober up a rat. That didn't come to pass because, as his supervisor pointed out, 'blocking the brain effects wasn't going to stop the toxicity'. Instead he made his name as an expert in psychedelic drugs, namely LSD and psilocybin 'magic' mushrooms, and in the early 2000s he became an adviser to the government on drug policy. Prof Nutt was very publicly forced to resign from that post in 2009 after claiming in a research paper that alcohol was far more dangerous than both LSD and the clubbers' drug MDMA. It would have ended any ordinary scientist's career, but the fiasco drew the attention of investors, who were keen to bring his solution for the world's drinking problem to market.

Months later, The Telegraph reported Prof Nutt's renewed efforts to create a 'synthetic alcohol' that was non-addictive and non-toxic. It took a decade, but then it was announced: his team had successfully produced a safe but otherwise identical stand-in for alcohol. Prof Nutt had tried it out on himself. He called it Alcarelle, and it would be on shelves by 2026. That is not going to happen, Prof Nutt admits – though 'maybe we will be able to [sell it] by 2027'. He is currently preparing to take Alcarelle through food safety testing in the US, in the hopes that it might then be rolled out in Britain, too, and across the world. That process, including safety testing in human trials, will cost at least £3m. 'It's such an important thing to do,' he says. 'It would be crazy if people didn't invest, but you certainly can't count your chickens.'

Proof of concept

If Prof Nutt succeeds in bringing Alcarelle to market, will anyone want to drink it? Alcohol is a special sort of drug, not only because it is legal in most of the world. We've been chasing its high, despite the harm and addiction it causes us, for 10 million years at least.
Our ape ancestors sought out rotted fruit to booze on long before they made fire. It has a pull on us because it is the perfect social crutch, says Prof Nutt – we human beings 'desperately need to socialise, but often need help doing it'. Alcarelle is designed to give a happy, two-drinks kind of tipsiness. 'I'm not trying to get anyone drunk,' he says. 'Most people, they drink in social situations, and they just want to relax.' Bans on alcohol have never worked. The 1920s roared in America despite prohibition laws, and in much of the Muslim world, alcohol consumption has always been relatively common 'despite Islam's best efforts,' says Prof Rudi Matthee, a historian who studies drugs as forces in history. But in the West, alcohol consumption 'is now going down dramatically'. It's a pattern driven by young people: a third of under-25s in Britain now don't drink at all, some surveys suggest. For the past five years, Prof Nutt has also worked on a 'proof of concept' product aimed at people who want to drink less. Sentia, his range of three mock-spirits, is made with a blend of exotically named plants – including magnolia bark, passionflower and ashwagandha – to give drinkers a relaxed, happy buzz. It works by 'stimulating the GABA receptors in your brain to mimic the pleasant effects of alcohol,' Prof Nutt says – GABA being a neurotransmitter, like serotonin or dopamine, that disinhibits behaviour. He has sold more than 300,000 bottles since Sentia's official launch in 2021. Alcohol works on this same GABA system. Prof Nutt proved that when he sobered up his lab rats, achieved by 'blocking' that system's effects. But whereas alcohol is a 'complicated' drug that impacts GABA throughout the whole brain as well as many other systems, Sentia instead works on specific 'targets', and is therefore hangover-free and non-addictive, he says. Alcarelle will work in the same way but is 'much stronger', and would be licensed to existing drinks companies as an additive. Three major drinks companies have already approached Nutt because they 'want access to our technology', he claims. 'We're not giving it to them yet, because we're slightly concerned that they might just buy it to destroy it. But I think it's almost past that now. It would be self-defeating to destroy something that could keep them profitable.' Prof Nutt and his team initially experimented with benzodiazepines, a class of strong anti-anxiety drugs that are known to work on the brain's GABA system. In recent years their tactics have changed, and the molecule set to be tested in the US is a different sort of compound, the specifics of which are, as yet, undisclosed. How it makes consumers feel may not actually matter. While drinking is in decline, young people are happily replacing alcohol with other drugs of all kinds. British charities and rehab clinics report that the use of ketamine by under-30s has risen, along with benzodiazepines such as Xanax and Valium. Nearly 3 per cent of people under 25 admitted to using psychedelic drugs last year. The use of mephedrone, a newer drug that has been compared to both cocaine and MDMA, appears to be having a second boom in clubs and bars in London, despite its ban in 2010. Mephedrone emerged from Israel in the early 2000s. It was sold legally at first, by an enterprising chemist called Ezekiel Golan, 55, who, after a stint on the Human Genome Project, was working as head of bioinformatics for a leading plant genetics company. 
He was studying the molecule octopamine, a neurotransmitter in the brains of smart invertebrates such as octopuses, when he noticed a 'funny-looking' cluster in the bottom of its chemical diagram. 'I wondered what that felt like,' says Dr Golan. After creating it in his laboratory and road-testing it on himself, he released mephedrone to the world.

A 'just right' feeling of mild drunkenness

Dr Golan bills himself as the world's foremost inventor of new legal recreational drugs. Where Prof Nutt has sought the Nobel Prize, Dr Golan had 'always envied Albert Hofmann', the accidental inventor of LSD. 'I've made things that take you through Aldous Huxley's doors of perception, up the stairs and all the way to Australia,' he boasts. In 2005, he made 'a molecule that was so extremely bland that I put it in my drawer and forgot about it'. Then, in 2009, Dr Golan heard about Prof Nutt's sacking as a government adviser, and his plan to 'invent his way around the alcohol problem'. Dr Golan realised then that he had 'something that would fit the bill.'

For several years, Prof Nutt and Dr Golan worked together in 'the war against alcohol', as they put it. In 2017, they co-wrote a research paper on the effects and safety of Dr Golan's new compound, called MEAI. Rather than influencing the brain's GABA system, MEAI worked on its serotonin receptors, they found. There was also some more (legal) hands-on testing. Members of Prof Nutt's department at Imperial College were invited to 'Friday afternoon MEAI' sessions over the course of several months, a sort of end-of-week party, Dr Golan says. Prof Nutt insists that there was only ever a single 'sampling session'. Either way, enough people had tried MEAI for Dr Golan to be sure that it reliably gives a 'just right' feeling, a wash of happy, mild drunkenness. It is 'incredibly one-directional,' he says, being free of trippy visions and spiritual breakthroughs. Crucially, users do not feel inclined to binge on it. At very high doses, MEAI feels 'like the scene in Harry Potter where he actually meets Voldemort – totally white, quiet and serene'.

In 2014, Dr Golan patented the compound, and he planned to give it to Prof Nutt to develop as a binge drinking mitigator. But the two had a falling out. 'David's advisers didn't get along with my advisers from a business point of view,' Dr Golan says. It was 'neither my fault nor his'. Dr Golan's 'advisers' were the predecessors of a novel medicines company called Clearmind, incorporated in Canada and led by Dr Adi Zuloff-Shani, an Israeli businesswoman. Dr Golan sold the patent family for MEAI to Clearmind in 2021, in return for shares, which he expects will one day be extremely valuable. In December, Clearmind signed a deal with a separate biotechnology company to commercially produce MEAI.

Like Prof Nutt, Dr Zuloff-Shani expects her alcohol replacement to be on shelves within the next two years. Clearmind is currently going through a regulatory approval process itself, in an undisclosed country that is 'not in North America'. MEAI has been licensed for testing in human trials in Israel and the US, putting it a step ahead of Prof Nutt's Alcarelle. So far the safety profile of MEAI is 'very good', Dr Zuloff-Shani says. There is real-world evidence for its feasibility as a replacement for alcohol, too: it has already been sold in Canada.
Tens of thousands of Canadians bought $10 bottles of Pace, billed as an alcohol alternative as well as an alcohol enhancer (they can be drunk together) and a binge drinking mitigator, before it was ruled against at the end of 2018. Canada's health authority released warnings not to use Pace on the grounds that it was an unregulated substance, but they 'never found a case where MEAI had caused someone physical harm,' Dr Zuloff-Shani says. Health Canada confirmed this in an email, though it also pointed out that MEAI is 'similar in structure to amphetamines' such as MDMA (the inventor of which, incidentally, described it as a 'low-calorie martini' on first try).

Prof Nutt cut ties with Dr Golan in 2019, 'for reasons we do not wish to go into,' he says. His erstwhile partner in the mission to end drinking has since had 'no visibility of our molecule development.' Dr Golan maintains a 'great respect' for Prof Nutt, but he does not believe that the professor has an 'active compound' ready to sell as Alcarelle. 'I'm a discovery scientist. All credit goes to David for his pioneering vision, but he does not have that kind of methodology in mind.'

Is the future synthetic or botanical?

After I spoke to Prof Nutt about his dream, I was left thinking it would be a personal disaster for him if someone else arrived at it first. But he has seen this coming. While he was in their good books, Prof Nutt authored a paper for the Labour government called Drug Futures 2025. Scientists could already 'produce a recreational substance with similar effects to alcohol but fewer harms', he wrote in 2005. Within the next two decades, rogue chemists would also create 'new synthetic recreational substances', or legal highs. Some of them might feel like alcohol, Prof Nutt predicted, and they could even stand in as safer alternatives.

I couldn't try Alcarelle for myself because, under the Psychoactive Substances Act of 2016, new recreational drugs – what were once legal highs – are illegal to produce, sell or supply. Alcohol itself is only exempt 'on the grounds of precedence', says Prof Nutt. The law was brought in to stop the proliferation of drugs like mephedrone, and it had the side effect of putting an end to Nutt's road-testing with willing participants at Imperial College. International conventions on medical research mean that it remains legal for him to experiment on himself – a practice he keeps up to this day, at his lab in Hemel Hempstead. MEAI can't be tried either, outside of clinical trials, as it is 'not yet approved as a medicine or product in any jurisdiction', says Yael Stav, the head of programme management at Clearmind (and Dr Golan's wife, as it happens). The Home Office has no plans to assess whether Alcarelle or MEAI should be made exempt from the Psychoactive Substances Act, or indeed be covered by it in the first place, a source said.

Such legal knots are why there is 'a debate playing out over whether the future of alcohol is synthetic or botanical', says James Jacoby, the co-founder of Sentia, Prof Nutt's plant-based spirit brand. Jacoby's own foray into devising a replacement for alcohol began with an ayahuasca trip in 2002, after which he developed 'a very intense sensitivity to nature', he says. Over 20 years he 'honed that sensitivity and applied it' to the creation of non-alcoholic drinks with an edge. Prof Nutt approached Jacoby and his wife, Vanessa, to collaborate in 2019.
The couple's drink blends were then sold as Sentia, Jacoby claims (which Prof Nutt disputes), while Prof Nutt provided his 'worldwide celebrity status'. Raising enough funds to bring Alcarelle to market abroad, and then to try to get it approved here in Britain, 'was the explicit end goal', says Jacoby. Jacoby lost hope in that prospect. He left Sentia and has since set up a new venture, called ON Beer. His new lagers contain ginseng, a plant with energising properties, along with a lot of pepper and liquorice, and a handful of other plants. You can taste them, but that's why 'we're the only company with a clinical trial to support the fact that we provide an alcohol-like effect', Jacoby says. NuWave, meanwhile, an alcohol-free beer made by a former consultant for Sentia called Brendan Williams, has won international awards for its taste but has no proven ability to make you tipsy. (Sentia launched a beer alternative, called GABYR, this summer.)

Alcohol with guard rails

Jacoby and Prof Nutt face stiff competition from other modern-day apothecaries. In May, the country's first high-end alcohol-free bar opened in London, at a wellness club called Shoreditch & Soul. It's run by 33-year-old Karma Campbell, who, like Jacoby, has created plant blends with a punch through personal experimentation. Campbell grew up as part of the new-age traveller community 'where people use legal plants for their different effects as a part of everyday life', he says. He studied western herbal medicine at the University of Westminster, during which time he began trial-running his own concoctions at festivals. His big break came at Glastonbury in 2024, after the Daily Mail spotted him selling 'vegan cocaine' (a liquid mix including liquorice, damiana, ashwagandha and chilli) to revellers. 'I woke up to hundreds of orders, phone calls and texts,' Campbell says. Its official name is Turbo Tonic, and it's 'meant to get you buzzing, helping you stay up all night', he explains.

Campbell's London bar is stocked with Turbo Tonic and three other tinctures, which are mixed into mocktails along with convincing non-alcoholic versions of any spirit or aperitif you could want: vodka, gin and rum, but also mescal, Chartreuse and Campari. Customers 'do a lot of socialising', Campbell says. Observers would think they were tipsy. At the bar's launch night, I saw Campbell squirting a syringe of Turbo Tonic into a woman's mouth. I tried all of Campbell's tonics on the night, too, and while they certainly did something – the other guests agreed – I wouldn't say I was drunk.

Turbo Tonic, Sentia's spirits, ON Beer and the dozen other products I've tried all have much the same effect, in that they give a hint of familiar fuzzy drunkenness without ever introducing you to the real thing. Alcohol can make you feel happy, sad or angry by turns; it gives plausible deniability for bad behaviour and secrets shared, and a headache to bond over in the morning. These inventors want to give us alcohol with guard rails, where 'every night is a geek's night out that ends at 11pm', as Dr Golan puts it.

In that way, the future of drinking is already here. Dr Michael Mascha, an Austrian-born food anthropologist who made his fortune in the dot-com boom, has something for those who are done chasing alcohol's effects entirely. He has spent the past 20 years trying to make water 'a legitimate substitute for alcohol' in high-end restaurants.
After becoming rich, Dr Mascha retired early from his post as a university professor, and then spent much of his time in California's finest eateries. He never drank heavily, but when he developed a heart condition in his 50s, his doctor informed him that he could 'either give up wine or give up living'. Dining out became a humiliation. The wines he was accustomed to 'were served in a very fine sommelier glass, with a very beautiful stem' and the first time he ordered water at a restaurant, he 'was given a child's tumbler'. Dr Mascha has since designed a new water flute for use in restaurants, and has set up a training academy for water sommeliers. His collection includes a mist bottled from the air in Tasmania, and a 15,000-year-old water from the last Ice Age, mined in the Czech Republic and flecked with gold. The 'clever' restaurants are now 'realising that alcohol is dying' and are beginning to offer extensive water lists, he says. When he brings a fine water to a party, 'no one is interested in wine anymore'. What does Prof Nutt make of that? 'If you want to go to the pub and drink water, then you can, but it's a different experience,' he says. He acknowledges that alcohol might never go the way of cigarettes, locked up behind counters and stamped with labels warning that it kills. He still enjoys a glass of wine with his wife, and their daughter owns a (traditional) bar in London. He simply wants to 'give people more choice'. His hope is that Alcarelle will replace 25 per cent of all alcohol consumption worldwide by 2050. If that gets people to stop drinking altogether, he would 'save more lives than you would by curing cancer', Prof Nutt believes. 'I want that to be my legacy.'


LBCI
22-07-2025
- Politics
- LBCI
Kremlin doesn't expect 'miraculous breakthroughs' from Ukraine talks
The Kremlin on Tuesday said it wasn't expecting "miraculous breakthroughs" from upcoming talks with Ukraine, the third such round of negotiations in recent months. "We don't have any reason to hope for some miraculous breakthroughs" at the talks on Wednesday, spokesman Dmitry Peskov told reporters during a briefing. AFP


CNN
21-07-2025
- Science
- CNN
China brain tech rivals Musk's Neuralink
CNN gains rare access to a brain research lab in Beijing, where scientists are working to improve brain technology. Western experts say that while breakthroughs have traditionally been led in the US, China has the edge on commercializing these technologies. CNN's Kristie Lu Stout reports.

