Latest news with #leapfrogging

Irish Times

16 hours ago

Business

The Irish Times view on AI vehicles: no replacement for MetroLink

The concept of 'leapfrogging' over outdated infrastructure is an attractive one. Many countries, particularly in sub-Saharan Africa, have bypassed the need for costly and complex landline networks by adopting mobile phone technology directly. Similarly, in parts of India and Southeast Asia, mobile banking has flourished where traditional banking infrastructure was non-existent. China's rapid development of high-speed rail is another example, enabled in part by the underdevelopment of its previous rail network.

So when one of Ireland's foremost business leaders suggests Dublin could leapfrog its antiquated transport infrastructure, the idea is worth considering. Dermot Desmond proposes that instead of building MetroLink, the city should prepare for the arrival of autonomous vehicles, which he believes will meet its transport needs.

Unfortunately, Desmond's suggestion misses the point entirely. Set aside for a moment the widespread scepticism over the supposed timeline for the adoption of this technology. And accept that AI-guided transport will significantly reduce car ownership and optimise traffic flow. In theory, AI could indeed transform urban mobility and promote more sustainable transport.

But Desmond's vision overlooks Dublin's fundamental issue: the sheer number of vehicles clogging surface streets. His proposal fails to consider the central purpose of the MetroLink project, which is not simply to improve access to the airport but to create a network that enables higher-density development in order to help meet Ireland's climate commitments and address the glaring need for sustainable urban living.

Instead, Desmond's futuristic vision risks creating a new kind of gridlock, as fleets of robotaxis – some empty, some occupied – converge on the congested Victorian bottlenecks around the city. As Colombian politician Enrique Peñalosa stated: 'A developed country is not a place where the poor have cars, it's where the rich use public transportation'. Whether driverless or not, that is the vision that remains to be achieved.

DeepSeek: A Paradigm Shift, What It Means For Humanity

Forbes

3 days ago

Business

DeepSeek, the whale, was invisible before January 20, 2025. Then the blue whale breached in full view of the world, and the body slam sent shockwaves around the globe.

The release of DeepSeek-R1 immediately cratered the market capitalization of several hardware and software companies that had been buoyed by what investors took to be American exceptionalism. Withholding the latest chips and AI intellectual property from China was thought to be the strategy to follow. Except it was wrong. Such is the stuff that leapfrogging is made of, especially for a manufacturing and design powerhouse such as China. Ironically, the latest models from DeepSeek are free to use. DeepSeek even runs them on its own servers for free.

Development of general-purpose large language models through scaling of parameters and training data led to many breakthroughs. The release of ChatGPT on GPT-3.5 and then GPT-4 in 2022-23 brought the general-purpose potential of AI to the public. This approach also increased costs tremendously as compute and data demands spurred bigger and better processors. In late 2023 and 2024, and even now, the construction of power-hungry data centers was thought to be the only way to improve the performance of the models. Limiting access to computing and the latest chips was thought to restrain China as a source of these powerful models. With DeepSeek, that paradigm shifted.

Companies like Nvidia, whose stock was heavily affected by the announcement, have since recovered and thrived. The lessons were lost on global markets. The worst may be yet to come, as the companies buoyed by the rise of AI and its use are brought down to earth by a combination of new methods and the lessening of compute needed for training as well as inference.

Sunk costs and switching costs, each with their own powerful economic adherents, prevent a longer-term view and lock American AI companies into their paths. Success breeds complacency and adherence to the model that produced success.
In AI, a rapidly developing field, getting stuck on algorithms, process and practice is deadly. DeepSeek showed that just piling on computing and data does not make for exponential progress. This is a lesson from many fields that is often ignored with the overused but wrong dictum 'This time it is different.' Innovation follows familiar patterns: slowly, then rapidly.

Efficiency

The costs of training and running DeepSeek are much lower than for other models. A recent presentation put the figures at $6M for DeepSeek versus $600M for Llama (the open source model from Meta): one hundredth the cost. The costs for other models, including ChatGPT, are even higher. The cost savings are a result of implementing DeepSeek's own discoveries in reinforcement learning (RL) and training using distillation. Further, the model is very efficient in generating Chinese language. As of three months ago, a large number of Chinese companies had joined the AI revolution by subscribing to DeepSeek. As the national champion, DeepSeek is supported by government industrial policy.

RL as a training method was invented at the University of Massachusetts Amherst. The recipients of the 2024 ACM Turing Award, Andrew Barto and Richard Sutton, were the inventors of the classic reinforcement learning techniques. For LLMs and other large models, such an approach falls under supervised learning. The model is refined by feedback, classically from humans, called RLHF (Reinforcement Learning from Human Feedback). This is called supervised fine-tuning. Humans are the supervisors. The paper released by the creators of DeepSeek-R1 goes into detail on the way that they modified RL.

Anything that involves humans in the loop at scale requires a lot of money. Removing the human in the loop makes training cheaper. A version of the model is used to fine-tune the other. In other words, one model functions as the supervisor and the other is trained.
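The idea of one model supervising another can be illustrated with a generic teacher-student distillation loss. This is a minimal sketch of the textbook soft-target technique, not DeepSeek's actual training recipe; the function names, the temperature value and the example logits are all assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the student's.

    The teacher (the "supervisor") supplies the target distribution;
    the student is trained to reduce this value. No human labels are involved.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical logits over three candidate tokens.
teacher = [2.0, 1.0, 0.1]
student = [0.5, 1.5, 0.3]
print(distillation_loss(teacher, student))
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge, which is what lets the teacher's outputs stand in for human feedback.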
The arrival of new companies with models such as MiniMax-M1 epitomizes this shift even more. Such techniques will overtake models that are created using conventional scaling.

DeepSeek-R1 was effective through its evolution utilizing multiple strategies. A combination of novel methods based on existing techniques made the training and inference efficient in time and resources. More details can be found in this article. In short, all aspects of the creation and running of large language models were changed, enhanced or reworked for cost and time efficiency.

MiniMax-M1

MiniMax-M1 claims to have chopped the cost of DeepSeek-R1 training by 90%. The company says it trained its model for $500K. Contrast this with the $6M cost for DeepSeek-R1 and $600M for Llama. Doubts have been cast on the numbers publicized by both DeepSeek and MiniMax.

The efficiencies come from further refining RL with what is called lightning attention. This is mostly for deterministic problems such as mathematical and logical reasoning and long-context problems such as coding. MiniMax is also available through Hugging Face, the open source AI host.

Privacy

There is concern that DeepSeek is harvesting private data for its own use. This phenomenon is rife in the world of AI and social media in general. What makes the sharing of private data with DeepSeek or other private companies a concern is the fact that the data will be used to refine the models. In the case of DeepSeek or other China-based companies, there is a fear of data reaching the Chinese government. Private AI companies, even those in the United States, do the same, except they will share that data with the US government if they are forced by law. At this juncture, such a scenario is more disquieting. The Fourth Amendment will fall by the wayside if the government can search not only our persons and our homes, but our minds without a warrant.

To read more about the risks of DeepSeek, read this analysis from Hidden Layer.
Since Hidden Layer's business model is based on these kinds of analysis, it is best to look closely at the analysis and compare it with the company's work on other open models.

Open Source AI Models

The Open Source Initiative (OSI) has a definition of Open Source AI. It is at version 1.0 right now, subject to revision. Like the Open Source definition for software, it allows users to use, observe, modify and distribute without any restrictions. AI models depend a lot on their training data. AI use involves inference, which consumes resources. The expenditure on training is separate from the expense of inference.

In the classic definition of open source software, the source code is available to any user to use, observe, modify and distribute. In a strict interpretation of open source AI, the source code should include the data used to train the model. However, this may not be practical, nor is it part of the OSI definition of Open Source AI. This is drastically different from the OSI guidance for open source software.

The other difference is the observability of the model weights and hyperparameters. During the learning phase, model weights are refined. Model weights embody the model in its current form, crystallizing all the training that the model has undergone. Hyperparameters control the initial configuration of the learning setup. In an open model, model weights and model parameters are meant to be open.

Open Source AI models can be called open weights models. Many models from China are open weights models, including Qwen (from Alibaba). This competition has also forced OpenAI to release an open weight model: the gpt-oss base model, with two variants.

The Future

We have not delved into the technology behind the creation of multi-modal prompts and multi-modal generation. By multi-modal, we mean not only text, but images, audio and video as well. MiniMax as well as DeepSeek have these capabilities. It is clear that limiting access to hardware and know-how cannot hold true innovation back.
Such constraints also make for multiple paradigm shifts, making AI cheaper to develop with fewer hardware and power resources and creating a democratized and decentralized future where we could fine-tune and run models on commodity hardware. These developments give us hope that we will be able to control and bend these capabilities to help humanity rather than harm ourselves.
