Israel Says Iran Is Hacking Private Security Cameras

Bloomberg, 20-06-2025
Iran is tapping into private security cameras in Israel to gather real-time intelligence about its adversary, exposing a recurrent problem with the devices that has emerged in other global conflicts. Bloomberg's Michael Shepard reports. (Source: Bloomberg)

Related Articles

Journalist killed in Israeli strike feared his own assassination - as IDF claims he was a 'terrorist'

Yahoo

Five Al Jazeera journalists have been killed in an Israeli strike in Gaza - including a reporter who feared he was going to be assassinated. Anas al Sharif died alongside four of his colleagues from the network: Mohammed Qreiqeh, Ibrahim Zaher, Mohammed Noufal and Moamen Aliwa. The Committee to Protect Journalists (CPJ) had recently expressed "grave" concerns about al Sharif's safety, and claimed he was "being targeted by an Israeli military smear campaign". Israel Defence Forces confirmed the strike - and alleged al Sharif was a "terrorist" who "served as the head of a terrorist cell in the Hamas terrorist organisation". It claimed he was "responsible for advancing rocket attacks against Israeli civilians and IDF troops". Last month, the reporter had said he lived with "the feeling that I could be bombed and martyred at any moment" because his coverage of Israel's operations "harms them and damages their image in the world". As of 5 August, at least 186 journalists and media workers have been killed in Gaza - but foreign reporters have been barred from covering the war independently since the latest conflict began in 2023. The Hamas-run government has described Israel's killing of these five Al Jazeera journalists as "brutal and heinous". A statement added: "The assassination was premeditated and deliberate, following a deliberate, direct targeting of the journalists' tent near al Shifa Hospital in Gaza City. The targeting of journalists and media institutions by Israeli aircraft is a full-fledged war crime aimed at silencing the truth and obliterating the traces of genocidal crimes." Following Anas al Sharif's death, a post described as his "last will and testament" was posted on X. It read: "If these words of mine reach you, know that Israel has succeeded in killing me and silencing my voice."
The 28-year-old added that he laments not being able to fulfil his dream of seeing his son and daughter grow up - and alleged he had witnessed children "crushed by thousands of tonnes of Israeli bombs and missiles". "Do not forget Gaza ... and do not forget me in your prayers for forgiveness and acceptance," he wrote. The CPJ reported that his father was killed by an Israeli airstrike on their family home in December 2023 after the journalist received telephone threats from Israeli army officers instructing him to cease coverage. Israel shut down the Al Jazeera television network in the country in May last year.

Israeli strike kills multiple journalists in Gaza, including prominent Al Jazeera reporters, network says

CNN

time10 minutes ago

  • CNN

Israeli strike kills multiple journalists in Gaza, including prominent Al Jazeera reporters, network says

An Israeli strike in Gaza City late Sunday night killed six journalists, according to Al-Shifa hospital, including four from Al Jazeera. The Israeli military said they targeted and killed reporter Anas Al-Sharif after accusing him of leading a Hamas cell. Mohammed Qreiqeh, another prominent Al Jazeera journalist in Gaza, was also killed in the strike, the network said. 'The order to kill Anas Al-Sharif, one of Gaza's bravest journalists, along with his colleagues, is a desperate attempt to silence voices ahead of the occupation of Gaza,' Al Jazeera said in a statement after the attack. In the minutes before he was killed, Al-Sharif said on social media, 'If this madness does not end, Gaza will be reduced to ruins, its people's voices silenced, their faces erased — and history will remember you as silent witnesses to a genocide you chose not to stop.' Al-Sharif was in a tent with other journalists near the entrance to the Al-Shifa Hospital when he was killed, according to hospital director Dr. Mohammad Abu Salmiya. The strike killed at least seven people, Salmiya said. The Israel Defense Forces (IDF) has accused Al-Sharif of leading a Hamas cell in Gaza that 'advanced rocket attacks against Israeli civilians and IDF troops.' The IDF had previously shown documents it claimed showed 'unequivocal proof' of Al-Sharif's ties to Hamas. 'The IDF had previously disclosed intelligence information and many documents found in the Gaza Strip, confirming his military affiliation to Hamas,' the military said in a statement after the strike. Last month, after the IDF accused Al-Sharif of being a member of Hamas, he responded in a message on social media. 'I reaffirm: I, Anas Al-Sharif, am a journalist with no political affiliations. My only mission is to report the truth from the ground — as it is, without bias,' he wrote. 'At a time when a deadly famine is ravaging Gaza, speaking the truth has become, in the eyes of the occupation, a threat.' 
The Committee to Protect Journalists (CPJ) said in July they were 'gravely worried' for Al-Sharif's safety and that the journalist feared for his life after he was the target of 'an Israeli military smear campaign, which he believes is a precursor to his assassination.' Since the beginning of the war nearly two years ago, CPJ says 186 journalists have been killed in Israeli strikes. The United Nations also called Israel's charges against Al-Sharif 'online attacks and unfounded accusations.' 'I am deeply alarmed by repeated threats and accusations of the Israeli army against Anas Al-Sharif, the last surviving journalist of Al Jazeera in northern Gaza,' said Irene Khan, the UN Special Rapporteur on freedom of expression, two weeks ago. Al-Sharif, who was married and had two children, had prepared a final message in the event of his death which was shared by his colleagues. 'I urge you not to be silenced by chains, nor to be hindered by borders, and to be bridges towards the liberation of the land and its people, until the sun of dignity and freedom shines upon our occupied homeland,' Al-Sharif wrote. This is a developing story and will be updated.

Will Reinforcement Learning Take Us To AGI?

Forbes

time10 minutes ago

  • Forbes

Will Reinforcement Learning Take Us To AGI?

There aren't many truly new ideas in artificial intelligence. More often, breakthroughs in AI happen when concepts that have existed for years suddenly take on new power because underlying technology inputs—in particular, raw computing power—finally catch up to unlock those concepts' full potential. Famously, Geoff Hinton and a small group of collaborators devoted themselves tirelessly to neural networks starting in the early 1970s. For decades, the technology didn't really work and the outside world paid little attention. It was not until the early 2010s—thanks to the arrival of sufficiently powerful Nvidia GPUs and internet-scale training data—that the potential of neural networks was finally unleashed for all to see. In 2024, more than half a century after he began working on neural networks, Hinton was awarded the Nobel Prize for pioneering the field of modern AI. Reinforcement learning has followed a similar arc. Richard Sutton and Andrew Barto, the fathers of modern reinforcement learning, laid down the foundations of the field starting in the 1970s. Even before Sutton and Barto began their work, the basic principles underlying reinforcement learning—in short, learning by trial and error based on positive and negative feedback—had been developed by behavioral psychologists and animal researchers going back to the early twentieth century. Yet in just the past year, advances in reinforcement learning (RL) have taken on newfound importance and urgency in the world of AI. It has become increasingly clear that the next leap in AI capabilities will be driven by RL. If artificial general intelligence (AGI) is in fact around the corner, reinforcement learning will play a central role in getting us there. Just a few years ago, when ChatGPT's launch ushered in the era of generative AI, almost no one would have predicted this. Deep questions remain unanswered about reinforcement learning's capabilities and its limits. No field in AI is moving more quickly today than RL. 
It has never been more important to understand this technology, its history and its future.

Reinforcement Learning 101

The basic principles of reinforcement learning have remained consistent since Sutton and Barto established the field in the 1970s. The essence of RL is to learn by interacting with the world and seeing what happens. It is a universal and foundational form of learning; every human and animal does it. In the context of artificial intelligence, a reinforcement learning system consists of an agent interacting with an environment. RL agents are not given direct instructions or answers by humans; instead, they learn through trial and error. When an agent takes an action in an environment, it receives a reward signal from the environment, indicating that the action produced either a positive or a negative outcome. The agent's goal is to adjust its behavior to maximize positive rewards and minimize negative rewards over time. How does the agent decide which actions to take? Every agent acts according to a policy, which can be understood as the formula or calculus that determines the agent's action based on the particular state of the environment. A policy can be a simple set of rules, or even pure randomness, or it can be represented by a far more complex system, like a deep neural network. One final concept that is important to understand in RL, closely related to the reward signal, is the value function. The value function is the agent's estimate of how favorable a given state of the environment will be (that is, how many positive and negative rewards it will lead to) over the long run. Whereas reward signals are immediate pieces of feedback that come from the environment based on current conditions, the value function is the agent's own learned estimate of how things will play out in the long term. The entire purpose of value functions is to estimate reward signals, but unlike reward signals, value functions enable agents to reason and plan over longer time horizons.
For instance, value functions can incentivize actions even when they lead to negative near-term rewards because the long-term benefit is estimated to be worth it. When RL agents learn, they do so in one of three ways: by updating their policy, updating their value function, or updating both together. A brief example will help make these concepts concrete. Imagine applying reinforcement learning to the game of chess. In this case, the agent is an AI chess player. The environment is the chess board, with any given configuration of chess pieces representing a state of that environment. The agent's policy is the function (whether a simple set of rules, or a decision tree, or a neural network, or something else) that determines which move to make based on the current board state. The reward signal is simple: positive when the agent wins a game, negative when it loses a game. The agent's value function is its learned estimate of how favorable or unfavorable any given board position is—that is, how likely the position is to lead to a win or a loss. As the agent plays more games, strategies that lead to wins will be positively reinforced and strategies that lead to losses will be negatively reinforced via updates to the agent's policy and value function. Gradually, the AI system will become a stronger chess player.

In the twenty-first century, one organization has championed and advanced the field of reinforcement learning more than any other: DeepMind. Founded in 2010 as a startup devoted to solving artificial intelligence and then acquired by Google in 2014 for ~$600 million, London-based DeepMind made a big early bet on reinforcement learning as the most promising path forward in AI. And the bet paid off. The second half of the 2010s was a triumphant period for the field of reinforcement learning.
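The following minimal Python sketch makes agent, environment, reward, policy and value function concrete. It uses tabular Q-learning, one standard RL algorithm chosen here only for illustration (the article does not prescribe one), and a hypothetical two-step toy environment in place of chess. From `start`, the action `safe` pays +1 immediately. The action `invest` costs 1 now but leads to a state where `cash_in` pays +5. With a discount factor of 0.9, the learned action value of `invest` should approach -1 + 0.9 × 5 = 3.5, which beats the 1.0 of `safe`. The agent therefore learns to accept a near-term loss for a larger long-term reward. Action values are a close relative of the value function described above. All names, rewards and hyperparameters are assumptions.

```python
import random

# Hypothetical two-step toy environment (not chess), for illustration only.
# (state, action) -> (immediate reward, next state); None means the episode ends.
TRANSITIONS = {
    ("start", "safe"): (1.0, None),         # small win now, episode over
    ("start", "invest"): (-1.0, "middle"),  # short-term loss...
    ("middle", "cash_in"): (5.0, None),     # ...that unlocks a bigger payoff
}
ACTIONS = {"start": ["safe", "invest"], "middle": ["cash_in"]}


def step(state, action):
    """The environment: returns (reward, next_state) for the agent's action."""
    return TRANSITIONS[(state, action)]


def policy(q, state, epsilon, rng):
    """Epsilon-greedy policy: mostly exploit current value estimates, sometimes explore."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS[state])
    return max(ACTIONS[state], key=lambda a: q[(state, a)])


def q_learning(episodes=2000, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Learn action values by trial and error (tabular Q-learning)."""
    rng = random.Random(seed)
    q = {key: 0.0 for key in TRANSITIONS}  # the agent's learned value estimates
    for _ in range(episodes):
        state = "start"
        while state is not None:
            action = policy(q, state, epsilon, rng)
            reward, next_state = step(state, action)
            # Estimated value of what follows: best next action, or 0 if the episode ended.
            future = 0.0 if next_state is None else max(
                q[(next_state, a)] for a in ACTIONS[next_state]
            )
            # Nudge the estimate toward: immediate reward + discounted long-term value.
            target = reward + gamma * future
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
    return q


q = q_learning()
print({k: round(v, 2) for k, v in q.items()})
# Greedy action at "start" after learning (epsilon = 0 means no exploration).
print(policy(q, "start", 0.0, random.Random(0)))  # expected: invest
```

The epsilon-greedy policy mostly picks the action with the highest current estimate and occasionally explores at random. That occasional exploration is how the agent discovers that `invest` pays off at all, since its first outcome is a loss.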
In 2016, DeepMind's AlphaGo became the first AI system to defeat a human world champion at the ancient Chinese game of Go, a feat that many AI experts had believed was impossible. In 2017, DeepMind debuted AlphaZero, which taught itself Go, chess and Japanese chess entirely via self-play and bested every other AI and human competitor in those games. And in 2019, DeepMind unveiled AlphaStar, which mastered the video game StarCraft—an even more complex environment than Go given the vast action space, imperfect information, numerous agents and real-time gameplay. AlphaGo, AlphaZero, AlphaStar—reinforcement learning powered each of these landmark achievements. As the 2010s drew to a close, RL seemed poised to dominate the coming generation of artificial intelligence breakthroughs, with DeepMind leading the way. But that's not what happened. Right around that time, a new AI paradigm unexpectedly burst into the spotlight: self-supervised learning for autoregressive language models. In 2019, a small nonprofit research lab named OpenAI released a model named GPT-2 that demonstrated surprisingly powerful general-purpose language capabilities. The following summer, OpenAI debuted GPT-3, whose astonishing abilities represented a massive leap in performance from GPT-2 and took the AI world by storm. In 2022 came ChatGPT. In short order, every AI organization in the world reoriented its research focus to prioritize large language models and generative AI. These large language models (LLMs) were based on the transformer architecture and made possible by a strategy of aggressive scaling. They were trained on unlabeled datasets that were bigger than any previous AI training data corpus—essentially the entire internet—and were scaled up to unprecedented model sizes. (GPT-2 was considered mind-bogglingly large at 1.5 billion parameters; one year later, GPT-3 debuted at 175 billion parameters.) Reinforcement learning fell out of fashion for half a decade. 
A widely repeated narrative during the early 2020s was that DeepMind had seriously misread technology trends by committing itself to reinforcement learning and missing the boat on generative AI. Yet today, reinforcement learning has reemerged as the hottest field within AI. What happened? In short, AI researchers discovered that applying reinforcement learning to generative AI models was a killer combination. Starting with a base LLM and then applying reinforcement learning on top of it meant that, for the first time, RL could natively operate with the gift of language and broad knowledge about the world. Pretrained foundation models represented a powerful base on which RL could work its magic. The results have been dazzling—and we are just getting started.

Reinforcement Learning Meets LLMs

What does it mean, exactly, to combine reinforcement learning with large language models? A key insight to start with is that the core concepts of RL can be mapped directly and elegantly to the world of LLMs. In this mapping, the LLM itself is the agent. The environment is the full digital context in which the LLM is operating, including the prompts it is presented with, its context window, and any tools and external information it has access to. The model's weights represent the policy: they determine how the agent acts when presented with any particular state of the environment. Acting, in this context, means generating tokens. What about the reward signal and the value function? Defining a reward signal for LLMs is where things get interesting and complicated. It is this topic, more than any other, that will determine how far RL can take us on the path to superintelligence. The first major application of RL to LLMs was reinforcement learning from human feedback, or RLHF. The frontiers of AI research have since advanced to more cutting-edge methods of combining RL and LLMs, but RLHF represents an important step on the journey, and it provides a concrete illustration of the concept of reward signals for LLMs.
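A deliberately tiny sketch can illustrate the mapping above, with weights as the policy and token generation as the action. It assumes a one-token "response", a three-word vocabulary and a made-up reward that favors one token. The logits stand in for model weights. The update is REINFORCE, a basic policy-gradient method used here only for illustration; production LLM training relies on considerably more elaborate algorithms. Nothing here is a real model or library API.

```python
import math
import random

# Toy stand-in for an LLM policy: logits over a three-token vocabulary play the role
# of the model's weights; sampling a token is the agent's action. (Hypothetical setup.)
VOCAB = ["Paris", "London", "Rome"]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reward(token):
    """Hypothetical reward signal: 1 for the response we want reinforced, else 0."""
    return 1.0 if token == "Paris" else 0.0


def reinforce(steps=2000, lr=0.1, seed=0):
    """REINFORCE policy gradient with a running-average baseline."""
    rng = random.Random(seed)
    logits = [0.0] * len(VOCAB)
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(logits)
        i = rng.choices(range(len(VOCAB)), weights=probs)[0]  # act: sample a token
        advantage = reward(VOCAB[i]) - baseline
        baseline += 0.05 * advantage  # track the average reward to reduce variance
        for j in range(len(logits)):
            # d/d logit_j of log softmax(logits)[i] is (1 if j == i else 0) - probs[j].
            grad_log_prob = (1.0 if j == i else 0.0) - probs[j]
            logits[j] += lr * advantage * grad_log_prob
    return softmax(logits)


final_probs = reinforce()
print({tok: round(p, 3) for tok, p in zip(VOCAB, final_probs)})
```

After training, most of the probability mass should sit on the rewarded token. The policy has been reshaped by the reward signal alone, without ever being shown a "correct answer" directly.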
RLHF was invented by DeepMind and OpenAI researchers back in 2017. (As a side note, given today's competitive and closed research environment, it is remarkable to remember that OpenAI and DeepMind used to conduct and publish foundational research together.) RLHF's true coming-out party, though, was ChatGPT. When ChatGPT debuted in November 2022, the underlying AI model on which it was based was not new; it had already been publicly available for many months. The reason that ChatGPT became an overnight success was that it was approachable, easy to talk to, helpful, good at following directions. The technology that made this possible was RLHF. In a nutshell, RLHF is a method to adapt LLMs' style and tone to be consistent with human-expressed preferences, whatever those preferences may be. RLHF is most often used to make LLMs 'helpful, harmless and honest,' but it can equally be used to make them more flirtatious, or rude, or sarcastic, or progressive, or conservative. How does RLHF work? The key ingredient in RLHF is 'preference data' generated by human subjects. Specifically, humans are asked to consider two responses from the model for a given prompt and to select which one of the two responses they prefer. This pairwise preference data is used to train a separate model, known as the reward model, which learns to produce a numerical rating of how desirable or undesirable any given output from the main model is. This is where RL comes in. Now that we have a reward signal, an RL algorithm can be used to fine-tune the main model—in other words, the RL agent—so that it generates responses that maximize the reward model's scores. In this way, the main model comes to incorporate the style and values reflected in the human-generated preference data. Circling back to reward signals and LLMs: in the case of RLHF, as we have seen, the reward signal comes directly from humans and human-generated preference data, which is then distilled into a reward model. 
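Here is a minimal sketch of the reward-model step. It assumes a Bradley-Terry-style pairwise loss, a common choice in the RLHF literature that this article does not itself specify. It also uses a linear model over three hypothetical hand-made features in place of a neural network. Each training example is a (preferred, rejected) pair, and the model learns weights so that preferred responses score higher.

```python
import math

# Hypothetical features for each response: [is_polite, answers_question, is_rude]
PAIRS = [
    # (features of the preferred response, features of the rejected response)
    ([1, 1, 0], [0, 1, 1]),
    ([1, 1, 0], [1, 0, 0]),
    ([0, 1, 0], [0, 1, 1]),
    ([1, 0, 0], [0, 0, 1]),
]


def score(w, x):
    """Linear reward model: a scalar rating of how desirable a response is."""
    return sum(wi * xi for wi, xi in zip(w, x))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_reward_model(pairs, lr=0.5, epochs=200):
    w = [0.0] * len(pairs[0][0])
    for _ in range(epochs):
        for chosen, rejected in pairs:
            # Bradley-Terry: P(chosen preferred) = sigmoid(r(chosen) - r(rejected)).
            p = sigmoid(score(w, chosen) - score(w, rejected))
            # Gradient descent on -log p: the gradient is -(1 - p) * (x_chosen - x_rejected).
            for k in range(len(w)):
                w[k] += lr * (1.0 - p) * (chosen[k] - rejected[k])
    return w


w = train_reward_model(PAIRS)
print([round(v, 2) for v in w])
print(score(w, [1, 1, 0]) > score(w, [0, 1, 1]))  # True: polite answer outranks rude one
```

In real RLHF, the hand-made features would be replaced by a neural network reading the full response. Its scores would then serve as the reward signal for the RL fine-tuning step described above.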
What if we want to use RL to give LLMs powerful new capabilities beyond simply adhering to human preferences?

The Next Frontier

The most important development in AI over the past year has been language models' improved ability to engage in reasoning. What exactly does it mean for an AI model to 'reason'? Unlike first-generation LLMs, which respond to prompts using next-token prediction with no planning or reflection, reasoning models spend time thinking before producing a response. These models think by generating 'chains of thought,' enabling them to systematically break down a given task into smaller steps and then work through each step in order to arrive at a well-thought-through answer. They also know how and when to use external tools—like a calculator, a code interpreter or the internet—to help solve problems. The world's first reasoning model, OpenAI's o1, debuted less than a year ago. A few months later, China-based DeepSeek captured world headlines when it released its own reasoning model, R1, which was near parity with o1, fully open and trained using far less compute. The secret sauce that gives AI models the ability to reason is reinforcement learning—specifically, an approach to RL known as reinforcement learning from verifiable rewards (RLVR). Like RLHF, RLVR entails taking a base model and fine-tuning it using RL. But the source of the reward signal, and therefore the types of new capabilities that the AI gains, are quite different. As its name suggests, RLVR improves AI models by training them on problems whose answers can be objectively verified—most commonly, math or coding tasks. First, a model is presented with such a task—say, a challenging math problem—and prompted to generate a chain of thought in order to solve the problem. The final answer that the model produces is then formally determined to be either correct or incorrect.
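The verification step can be sketched as follows for the math case. The sketch assumes, hypothetically, that the model is prompted to finish its chain of thought with a line of the form `Answer: <integer or fraction>`. The checker compares that final answer with a known ground truth using Python's `fractions.Fraction`, so equivalent forms such as 6/4 and 3/2 match. It returns +1 or -1 as the reward. Coding tasks would instead run the generated code against tests in a sandbox, which is not shown here.

```python
import re
from fractions import Fraction

# Hypothetical output convention: the model ends its chain of thought with "Answer: <value>".
ANSWER_PATTERN = re.compile(r"Answer:\s*(-?\d+(?:/\d+)?)")


def extract_final_answer(chain_of_thought):
    """Return the last 'Answer: ...' value in the text, or None if there is none."""
    matches = ANSWER_PATTERN.findall(chain_of_thought)
    return matches[-1] if matches else None


def verifiable_reward(chain_of_thought, ground_truth):
    """+1.0 if the final answer equals the ground truth exactly, otherwise -1.0."""
    answer = extract_final_answer(chain_of_thought)
    if answer is None:
        return -1.0  # no parsable answer counts as wrong
    try:
        correct = Fraction(answer) == Fraction(ground_truth)
    except ZeroDivisionError:  # e.g. "Answer: 1/0"
        return -1.0
    return 1.0 if correct else -1.0


good = "12 cookies shared by 8 kids: 12/8 = 3/2 each.\nAnswer: 3/2"
bad = "12 cookies shared by 8 kids is about 1 each.\nAnswer: 1"
print(verifiable_reward(good, "3/2"))  # 1.0
print(verifiable_reward(bad, "3/2"))   # -1.0
print(verifiable_reward("Answer: 6/4", "3/2"))  # 1.0 (equivalent fractions match)
```

In an RLVR setup, this score plays the role that the learned reward model plays in RLHF. It would be fed to a policy-gradient update that makes the rewarded chains of thought more likely.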
If it's a math question, the final answer can be run through a calculator or a more complex symbolic math engine; if it's a coding task, the model's code can be executed in a sandboxed environment. Because we now have a reward signal—positive if the final answer is correct, negative if it is incorrect—RL can be used to positively reinforce the types of chains of thought that lead to correct answers and to discourage those that lead to incorrect answers. The end result is a model that is far more effective at reasoning: that is, at accurately working through complex multi-step problems and landing on the correct solution. This new generation of reasoning models has demonstrated astonishing capabilities in math competitions like the International Mathematical Olympiad and on logical tests like the ARC-AGI benchmark. So—is AGI right around the corner? Not necessarily. A few big-picture questions about reinforcement learning and language models remain unanswered and loom large. These questions inspire lively debate and widely varying opinions in the world of artificial intelligence today. Their answers will determine how powerful AI gets in the coming months.

A Few Big Unanswered Questions

Today's cutting-edge RL methods rely on problems whose answers can be objectively verified as either right or wrong. Unsurprisingly, then, RL has proven exceptional at producing AI systems that are world-class at math, coding, logic puzzles and standardized tests. But what about the many problems in the world that don't have easily verifiable answers? In a provocative essay titled 'The Problem With Reasoners', Aidan McLaughlin elegantly articulates this point: 'Remember that reasoning models use RL, RL works best in domains with clear/frequent reward, and most domains lack clear/frequent reward.'
McLaughlin argues that most domains that humans actually care about are not easily verifiable, and we will therefore have little success using RL to make AI superhuman at them: for instance, giving career advice, managing a team, understanding social trends, writing original poetry, investing in startups. A few counterarguments to this critique are worth considering. The first centers on the concepts of transfer learning and generalizability. Transfer learning is the idea that models trained in one area can transfer those learnings to improve in other areas. Proponents of transfer learning in RL argue that, even if reasoning models are trained only on math and coding problems, this will endow them with broad-based reasoning skills that will generalize beyond those domains and enhance their ability to tackle all sorts of cognitive tasks. 'Learning to think in a structured way, breaking topics down into smaller subtopics, understanding cause and effect, tracing the connections between different ideas—these skills should be broadly helpful across problem spaces,' said Dhruv Batra, cofounder/chief scientist at Yutori and former senior AI researcher at Meta. 'This is not so different from how we approach education for humans: we teach kids basic numeracy and literacy in the hopes of creating a generally well-informed and well-reasoning population.' Put more strongly: if you can solve math, you can solve anything. Anything that can be done with a computer, after all, ultimately boils down to math. It is an intriguing hypothesis. But to date, there is no conclusive evidence that RL endows LLMs with reasoning capabilities that generalize beyond easily verifiable domains like math and coding. It is no coincidence that the most important advances in AI in recent months—both from a research and a commercial perspective—have occurred in precisely these two fields. 
If RL can only give AI models superhuman powers in domains that can be easily verified, this represents a serious limit to how far RL can advance the frontiers of AI's capabilities. AI systems that can write code or do mathematics as well as or better than humans are undoubtedly valuable. But true general-purpose intelligence consists of much more than this. Let us consider another counterpoint on this topic, though: what if verification systems can in fact be built for many (or even all) domains, even when those domains are not as clearly deterministic and checkable as a math problem? Might it be possible to develop a verification system that can reliably determine whether a novel, or a government policy, or a piece of career advice, is 'good' or 'successful' and therefore should be positively reinforced? This line of thinking quickly leads us into borderline philosophical considerations. In many fields, determining the 'goodness' or 'badness' of a given outcome would seem to involve value judgments that are irreducibly subjective, whether on ethical or aesthetic grounds. For instance, is it possible to determine that one public policy outcome (say, reducing the federal deficit) is objectively superior to another (say, expanding a certain social welfare program)? Is it possible to objectively identify that a painting or a poem is or is not 'good'? What makes art 'good'? Is beauty not, after all, in the eye of the beholder? Certain domains simply do not possess a 'ground truth' to learn from, but rather only differing values and tradeoffs to be weighed. Even in such domains, though, another possible approach exists. What if we could train an AI via many examples to instinctively identify 'good' and 'bad' outcomes, even if we can't formally define them, and then have that AI serve as our verifier? 
As Julien Launay, CEO/cofounder of RL startup Adaptive ML, put it: 'In bridging the gap from verifiable to non-verifiable domains, we are essentially looking for a compiler for natural language…but we already have built this compiler: that's what large language models are.' This approach is often referred to as reinforcement learning from AI feedback (RLAIF) or 'LLM-as-a-Judge.' Some researchers believe it is the key to making verification possible across more domains. But it is not clear how far LLM-as-a-Judge can take us. The reason that reinforcement learning from verifiable rewards has led to such incisive reasoning capabilities in LLMs in the first place is that it relies on formal verification methods: correct and incorrect answers exist to be discovered and learned. LLM-as-a-Judge seems to bring us back to a regime more closely resembling RLHF, whereby AI models can be fine-tuned to internalize whatever preferences and value judgments are contained in the training data, arbitrary though they may be. This merely punts the problem of verifying subjective domains to the training data, where it may remain as unsolvable as ever. We can say this much for sure: to date, neither OpenAI nor Anthropic nor any other frontier lab has debuted an RL-based system with superhuman capabilities in writing novels, or advising governments, or starting companies, or any other activity that lacks obvious verifiability. This doesn't mean that the frontier labs are not making progress on the problem. Indeed, just last month, leading OpenAI researcher Noam Brown shared on X: 'We developed new techniques that make LLMs a lot better at hard-to-verify tasks.' Rumors have even begun to circulate that OpenAI has developed a so-called 'universal verifier,' which can provide an accurate reward signal in any domain. It is hard to imagine how such a universal verifier would work; no concrete details have been shared publicly. 
Time will tell how powerful these new techniques are.

It is important to remember that we are still in the earliest innings of the reinforcement learning era in generative AI. We have just begun to scale RL. The total amount of compute and training data devoted to reinforcement learning remains modest compared to the level of resources spent on pretraining foundation models. This chart from a recent OpenAI presentation speaks volumes:

[Chart from a recent OpenAI presentation; image not available]

At this very moment, AI organizations are preparing to deploy vast sums to scale up their reinforcement learning efforts as quickly as they can. As the chart above depicts, RL is about to transition from a relatively minor component of AI training budgets to the main focus. What does it mean to scale RL? 'Perhaps the most important ingredient when scaling RL is the environments—in other words, the settings in which you unleash the AI to explore and learn,' said Stanford AI researcher Andy Zhang. 'In addition to sheer quantity of environments, we need higher-quality environments, especially as model capabilities improve. This will require thoughtful design and implementation of environments to ensure diversity and goldilocks difficulty and to avoid reward hacking and broken tasks.' When xAI debuted its new frontier model Grok 4 last month, it announced that it had devoted 'over an order of magnitude more compute' to reinforcement learning than it had with previous models. We have many more orders of magnitude to go. Today's RL-powered models, while powerful, face shortcomings. The unsolved challenge of difficult-to-verify domains, discussed above, is one. Another critique is known as elicitation: the hypothesis that reinforcement learning doesn't actually endow AI models with greater intelligence but rather just elicits capabilities that the base model already possessed.
Yet another obstacle that RL faces is its inherent sample inefficiency compared to other AI paradigms: RL agents must do a tremendous amount of work to receive a single bit of feedback. This 'reward sparsity' has made RL impracticable to deploy in many contexts. It is possible that scale will be a tidal wave that washes all of these concerns away. If there is one principle that has defined frontier AI in recent years, after all, it is this: nothing matters more than scale. When OpenAI scaled from GPT-2 to GPT-3 to GPT-4 between 2019 and 2023, the models' performance gains and emergent capabilities were astonishing, far exceeding the community's expectations. At every step, skeptics identified shortcomings and failure modes with these models, claiming that they revealed fundamental weaknesses in the technology paradigm and predicting that progress would soon hit a wall. Instead, the next generation of models would blow past these shortcomings, advancing the frontier by leaps and bounds and demonstrating new capabilities that critics had previously argued were impossible. The world's leading AI players are betting that a similar pattern will play out with reinforcement learning. If recent history is any guide, it is a good bet to make. But it is important to remember that AI 'scaling laws'—which predict that AI performance increases as data, compute and model size increase—are not actually laws in any sense of that word. They are empirical observations that for a time proved reliable and predictive for pretraining language models and that have been preliminarily demonstrated in other data modalities. There is no formal guarantee that scaling laws will always hold in AI, nor how long they will last, nor how steep their slope will be. The truth is that no one knows for sure what will happen when we massively scale up RL. 
But we are all about to find out. Stay tuned for our follow-up article on this topic—or feel free to reach out directly to discuss!

Looking Forward

Reinforcement learning represents a compelling approach to building machine intelligence for one profound reason: it is not bound by human competence or imagination. Training an AI model on vast troves of labeled data (supervised learning) will make the model exceptional at understanding those labels, but its knowledge will be limited to the annotated data that humans have prepared. Training an AI model on the entire internet (self-supervised learning) will make the model exceptional at understanding the totality of humanity's existing knowledge, but it is not clear that this will enable it to generate novel insights that go beyond what humans have already put forth. Reinforcement learning faces no such ceiling. It does not take its cues from existing human data. An RL agent learns for itself, from first principles, through first-hand experience. AlphaGo's 'Move 37' serves as the archetypal example here. In one of its matches against human world champion Lee Sedol, AlphaGo played a move that violated thousands of years of accumulated human wisdom about Go strategy. Most observers assumed it was a miscue. Instead, Move 37 proved to be a brilliant play that gave AlphaGo a decisive advantage over Lee Sedol. The move taught humanity something new about the game of Go. It has forever changed the way that human experts play the game. The ultimate promise of artificial intelligence is not simply to replicate human intelligence. Rather, it is to unlock new forms of intelligence that are radically different from our own—forms of intelligence that can come up with ideas that we never would have come up with, make discoveries that we never would have made, help us see the world in previously unimaginable ways. We have yet to see a 'Move 37 moment' in the world of generative AI. It may be a matter of weeks or months—or it may never happen.
Watch this space.
