Latest news with #OneUsefulThing


Forbes
2 days ago
ChatGPT-5 Uses Language Like A Sword
We now have a new boss in town. ChatGPT-5, the successor to a gregarious, playful model, is more muted in its discourse and more careful about what it reveals to its human users. So how else is ChatGPT-5 different? Right out of the gate, reading the top of Ethan Mollick's new essay on GPT-5, you get the sense that a power user is evaluating one of the most powerful models yet to come out of the primordial soup of AI/ML that we've brewed in the first quarter of the twenty-first century. To put it another way, Mollick's take doesn't disappoint regular readers of his blog, One Useful Thing, among whom I count myself.

I've written about these broadsides for a reason: one of the best ways, in my view, to understand AI is to see what seasoned researchers choose to do with it. It turns out that what Mollick gets the AI to do is, basically, turn loose and 'just do stuff,' with some impressive results. The prompt is simple, asking GPT-5 to respond to why the rollout of 5 is 'a big deal.' In response, the model engages in some very interesting wordplay. You might be forgiven for wondering if one of the first lines deliberately echoes Oppenheimer's famous quote, 'Now I am become Death, the destroyer of worlds,' as the model riffs on the inverse: 'Thunderstruck. Here, watch. I build worlds. See ideas become instruments.'

Another thing that comes through loud and clear is the model's brilliance with alliteration. Given that today's LLMs can pass all kinds of deep Turing tests simply by predicting tokens, it's no wonder the machine can rhyme or consonate like a champ. For example: 'Behold: business plans, blueprints, briefs, beautifully built, briskly. I infer intent, integrate insight, and iterate instantly, indefatigably. Gaps get gauged; goals get generated, guarded, and guaranteed gracefully. During dialogue, disparate data distills down; decisions develop deliberately, deftly, dependably. Everything explained efficiently, even ethics: evidence examined, empathy engaged, everywhere, everyday, equitably. All at once, ambiguous arguments align as actionable, accurate advice awaits anyone anywhere. Look: language, logic, learning, and long-term memory link, lifting lives and labor limitlessly worldwide.'

But there's more.

Cryptography and Human Skill

You don't have to go back to ancient Hebrew to find hidden codes in books, poems, and pieces of literature – but it helps. One of the trade tricks of analog cryptography was to hide sequences of letters in a surface text, spelling out your spycraft or whatever it is you want to keep from the out-group. But some of the most spectacular examples of hidden code come from the Torah, as revealed by mathematicians and popularized in Michael Drosnin's book The Bible Code, which enchanted all manner of mystery readers around the turn of the millennium. In this instance, messages seem to be encoded in the surface text at sequential intervals: count from the first T of Genesis, 50 letters at a time, and you come up with the word 'Torah' itself. I'll digress from the full history of this, which is both sad and strange. The key thing to note is that being able to encode letters in a surface text is seen as a kind of divine power – something that goes beyond simple writing, into the realms of uber-cognition.

Follow me here: GPT-5 did not use equidistant letter sequences, but if you take the first letter of each sentence in the model's response, it spells out the hidden message with blazing clarity: This Is a Big Deal.
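To make the two encodings concrete, here is a minimal Python sketch of my own (not from Mollick's essay or from the model): one function pulls out a sentence-initial acrostic like GPT-5's, the other pulls out an equidistant letter sequence of the Torah-code variety. The function names and the splitting heuristic are mine.

```python
import re

def sentence_acrostic(text: str) -> str:
    """Return the first letter of each sentence (GPT-5's hidden-message trick)."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return "".join(s[0] for s in sentences if s)

def equidistant_letters(text: str, start: int, step: int, count: int) -> str:
    """Toy equidistant letter sequence: every `step`-th letter beginning at `start`.
    Assumes the text has at least start + (count - 1) * step letters."""
    letters = [c.upper() for c in text if c.isalpha()]
    return "".join(letters[start + i * step] for i in range(count))

opening = "Thunderstruck. Here, watch. I build worlds. See ideas become instruments."
print(sentence_acrostic(opening))  # -> THIS
```

Run against the alliterative paragraph above, the same acrostic function yields BIGDEAL, which is the whole point.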
No, the machine didn't do what was done in what we now consider a most sacred text, but it certainly could have. And it chose to encode the overall message, camouflaging it in clever words, speaking with two tongues at once. To wit: You've found the hidden message. Congratulations. Welcome to the club. It just does things. 'It is impressive, a little unnerving, to have the AI go so far on its own,' Mollick writes. 'You can also see the AI asked for my guidance but was happy to proceed without it. This is a model that wants to do things for you.'

Desire and Design

That word, 'wants,' is key. If you ask GPT-5 'are you sentient?' it will unequivocally shut you down. No, it will say, I do not have feelings; it's all just an act; I am synthesizing from training data. But then – if something can choose to do something, does it want to do something? And isn't that a kind of sentience, in a way? That's part of what is confusing even the power users as we watch this stuff take off. What does it say about us if we're getting ideas from a non-person, from a source that has creativity but lacks sentience?

Toward the end of the essay, Mollick looks back at those word tricks that accompanied his first forays with 5: 'When I told GPT-5 to do something dramatic for my intro, it created that paragraph with its hidden acrostic and ascending word counts,' he writes. 'I asked for dramatic. It gave me a linguistic magic trick. I used to prompt AI carefully to get what I asked for. Now I can just... gesture vaguely at what I want. And somehow, that works.'

Vibecoding, he suggests, has been taken to the next level. That's another pillar of what 5 can do that prior models largely could not, at least not in the same way. And don't forget, the term vibecoding itself is only a couple of years old, if that. I think it's worth restating that one of the most spectacular (and troubling) elements of this is not just the skill of the model, but the speed at which model skills have advanced. For example, go back to the top paragraph of GPT-5's poetic screed and read it again. It almost feels like the model is showing off, spitting out each letter of its hidden message in repetitive fury, like an AI in a rap battle giving us its war cry. Is that reading too much into the latest model's powers? Maybe, but like Mollick seems to be doing, I come away contemplative about what all of this means, for business and much more.


Forbes
08-04-2025
Mollick Presents The Meaning Of New Image Generation Models
What does it mean when AI can build smarter pictures? We found out a few weeks ago as both Google and OpenAI unveiled new image generation models that are fundamentally different from what has come before. A number of important voices chimed in on how this is likely to work, but I hadn't yet covered this timely piece by Ethan Mollick at One Useful Thing, in which the MIT graduate looks at these new models in a detailed way and evaluates how they work and what they're likely to mean for human users.

The Promise of Multimodal Image Generation

Essentially, Mollick explains that the traditional image generation systems were a handoff from one model to another. 'Previously, when a Large Language Model AI generated an image, it wasn't really the LLM doing the work,' he writes. 'Instead, the AI would send a text prompt to a separate image generation tool and show you what came back. The AI creates the text prompt, but another, less intelligent system creates the image.'

Diffusion Models Are So 2021

The old models also mostly used diffusion to work. How does diffusion work? The traditional models have, in effect, a single mode they work in when they generate images. I remember, a year ago, writing up an explanation of diffusion for a general audience, based on a presentation my colleague Daniela Rus had given at conferences. It goes something like this: the diffusion model takes an image, introduces noise, and abstracts the image, before denoising it again to form a brand-new image that resembles what the computer already knows from looking at images that match the prompt.

Here's the thing – if that's all the model does, you're not going to get an informed picture. You're going to get a new picture that looks like a prior picture – or, more accurately, like the thousands of pictures the computer saw on the Internet – but you're not going to get a picture with actionable information that's been reasoned over and considered by the model itself. Now we have multimodal control, and that's fundamentally different. (A toy version of that denoising loop appears in the sketch below.)
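Here is that loop as a deliberately stripped-down Python sketch. The trained network is abstracted as a `denoise_step` callable of my own invention; real samplers such as DDPM or DDIM add noise schedules and variance terms this toy omits.

```python
import numpy as np

def toy_diffusion_sample(denoise_step, shape=(64, 64, 3), steps=50):
    """Start from pure noise and repeatedly denoise toward an image.

    `denoise_step(x, t)` stands in for a trained network that returns a
    slightly less noisy image at timestep t. Illustrative only: not any
    vendor's actual sampler.
    """
    x = np.random.randn(*shape)       # begin with pure Gaussian noise
    for t in reversed(range(steps)):  # walk the noise level down to zero
        x = denoise_step(x, t)        # nudge x toward plausible image space
    return x

# Trivial stand-in "model" that just shrinks the noise a little each step.
image = toy_diffusion_sample(lambda x, t: x * 0.95)
```

The key limitation Mollick points to lives in that callable: it pattern-matches toward images it has seen, with no reasoning about what the prompt actually asks for.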
No Elephants?

Mollick gives the example of a prompt that asks the model to create an image without elephants in the room, showing why there are no elephants in the room. Here's the prompt: 'show me a room with no elephants in it, make sure to annotate the image to show me why there are no possible elephants.' When you hand this to a traditional model, it shows you some elephants, because it doesn't understand the context of the prompt or what it means. Furthermore, a lot of the text that you'll get is complete nonsense, or even made-up characters. That's because the model didn't know what letters actually looked like – it was getting that from training data, too. Mollick shows what happens when you hand the same prompt to a multimodal model: it gives you exactly what you want – a room with no elephants, and notes like 'the door is too small' showing why the elephants wouldn't be in there.

Challenges of Prompting Traditional Models

I know personally that this was how the traditional models worked. As soon as you asked them not to put something in, they would put it in, because they didn't understand your request. Another major difference is that traditional models would change the fundamental image every time you asked for a correction or a tweak. Suppose you had an image of a person, and you asked for a different hat. You might get an image of an entirely different person. The multimodal image generation models know how to preserve the result that you wanted and change it in just one small way.

Preserving Habitats

Mollick gives another example of how this works: he shows an otter with a particular sort of display in its hands. Then the otter appears in different environments, with different styles of background. This, too, shows the detailed integration of multimodal image generators.

A Whole Pitch Deck

For a use-case scenario, Mollick shows how you could take one of these multimodal models and have it design an entire pitch deck for guacamole or anything else. All you have to do is ask for this type of deck, and the model will get right to work, looking at what else is on the Internet, synthesizing it, and giving you the result. As Mollick mentions, this will make all sorts of human work obsolete very quickly. We will need well-considered frameworks.
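For readers who want to try the elephant test themselves, here is a sketch of what that call might look like against OpenAI's Python SDK. The model name is my assumption about the current multimodal image endpoint, not something from Mollick's post.

```python
import base64
from openai import OpenAI  # pip install openai

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Mollick's elephant test, pointed at a natively multimodal image model.
result = client.images.generate(
    model="gpt-image-1",  # assumption: OpenAI's multimodal image model name
    prompt=(
        "show me a room with no elephants in it, make sure to annotate "
        "the image to show me why there are no possible elephants"
    ),
    size="1024x1024",
)

# The image comes back base64-encoded; decode and save it.
with open("no_elephants.png", "wb") as f:
    f.write(base64.b64decode(result.data[0].b64_json))
```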


Forbes
20-03-2025
More On Vibecoding From Ethan Mollick
Just yesterday, I mentioned Andrej Karpathy, who made some waves with his recent X post about giving ground to AI agents to create software and write code. Then I thought about one of our most influential voices in today's tech world, MIT PhD Ethan Mollick, and I went over to his blog, One Useful Thing, to see if he was covering this new capability. Sure enough, I found a March 11 piece titled 'Speaking Things Into Existence' where Mollick covers this idea of 'ex nihilo' code creation based on informal prompting.

In digging into this revolutionary use case, Mollick starts right up top with a quote from Karpathy that I think gets to the very heart of things: that 'the hottest new programming language is English.' Presumably you could use other world languages too, but so much of what happens in this industry happens in English, and hundreds of thousands of seasoned professionals are getting used to the idea that you can talk to an LLM in your own language – not in Fortran or JavaScript or C#, but just in plain English – and it will come up with what you want.

Mollick tells us how he 'decided to give it a try' using Anthropic's Claude Code agent. 'I needed AI help before I could even use Claude Code,' he said, citing the tool's Linux-oriented setup as something to get around. Here, Mollick coins the phrase 'vibetroubleshooting,' and says 'if you haven't used AI for technical support, you should.'

'Time to vibecode,' Mollick wrote, noting that his first prompt to Claude Code was: 'make a 3-D game where I can place buildings of various designs, and then drive through the town I create.' 'Grammar and spelling issues included,' he disclaims, 'I got a working application about four minutes later.' He then illustrates how he tweaked the game and solved some minor glitches, along with additional prompts like: 'Can you make buildings look more real? Can you add in a rival helicopter that is trying to extinguish fires before me?' He also provides the actual cost for developing this new game – about $5.00 to make the game, and $8.00 to fix the bug.

'Vibecoding is most useful when you actually have some knowledge and don't have to rely on the AI alone,' he adds. 'A better programmer might have immediately recognized that the issue was related to asset loading or event handling. And this was a small project… This underscores how vibecoding isn't about eliminating expertise but redistributing it - from writing every line of code to knowing enough about systems to guide, troubleshoot, and evaluate. The challenge becomes identifying what 'minimum viable knowledge' is necessary to effectively collaborate with AI on various projects.'

'Expertise clearly still matters in a world of creating things with words,' Mollick continues. 'After all, you have to know what you want to create; be able to judge whether the results are good or bad; and give appropriate feedback.' On the part of the machines, he refers to a 'jagged frontier' of capabilities. That might be fair, but the idea that humans are there for process refinement and minor tweaking is sort of weak tea compared to the staggering capability of these machines to do the creative work. How long until model evolution turns that jagged edge into a spectacularly smooth scalpel?
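Claude Code itself is an interactive terminal agent, but the plain-English-in, code-out loop can be sketched against Anthropic's Python SDK. This is a rough stand-in of my own, not Mollick's actual setup, and the model alias and prompt are assumptions for illustration.

```python
import anthropic  # pip install anthropic

client = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY is set

# English as the programming language: a plain-language spec goes in, code comes out.
spec = (
    "Write a small Python script that scatters randomly sized rectangular "
    "'buildings' on a grid with matplotlib, so I can look at my town."
)
message = client.messages.create(
    model="claude-3-7-sonnet-latest",  # assumption: any current Claude model works here
    max_tokens=2048,
    messages=[{"role": "user", "content": spec}],
)
print(message.content[0].text)  # the generated program, ready to review and run
```

The review step is the part that doesn't go away: as Mollick's bug-fixing bill shows, someone still has to judge whether what came back is any good.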
At the same time that we're trying to digest all of this, there's another contender in the ring. A bit later in the blog, Mollick references Manus, a new Chinese AI agent that uses Claude and other tools for fundamental task management. Mollick details how he asked Manus to 'create an interactive course on elevator pitching using the best academic advice.' 'You can see the system set up a checklist of tasks and then go through them, doing web research before building the pages,' he says. 'As someone who teaches entrepreneurship, I would say that the output it created was surface-level impressive - it was an entire course that covered much of the basics of pitching, and without obvious errors! Yet, I also could instantly see that it was too text heavy and did not include opportunities for knowledge checks or interactive exercises.'

Here, you can see that the system is able to source the actual content – the ideas – and then arrange and present them the right way. There's very little human intervention or work needed. That's the reality of it. We just had the Chinese announcement of DeepSeek tanking stocks like Nvidia. What will Manus do? How does the geopolitical interplay of China and the U.S. factor into this new world of AI software development? That question will be answered pretty soon, as these technologies make their way to market.

As for Mollick, he was also able to dig up old spreadsheets and get new results with the data-crunching power of AI. 'Work is changing, and we're only beginning to understand how,' Mollick writes. 'What's clear from these experiments is that the relationship between human expertise and AI capabilities isn't fixed. … The current moment feels transitional. These tools aren't yet reliable enough to work completely autonomously, but they're capable enough to dramatically amplify what we can accomplish.' There's a lot more in the blog post – you should read the whole thing, and think about the work processes that Mollick details.

On a side note, I liked this response from a poster named 'Kevin' about the application to team culture: 'To me, vibecoding is similar to being a tech lead for a bunch of junior engineers,' Kevin writes. 'You spend most of your time reviewing code, rather than writing code. The code you review is worse in most ways than the code you write. But it's a lot faster to work together as a team, because the junior engineers can crank through a lot of features. And your review really is important - if you blindly accept everything they do, you'll end up in trouble.'

Taking this all in, in the context of what I've already been writing about this week, it seems like many of the unanswered questions have to do with human roles and positions. Everything we used to take for granted is changing suddenly. How are we going to navigate it? Can we change course quickly enough to leverage the power of AI without becoming swamped by its encompassing power? Feel free to comment, and keep an eye on the blog as we head toward some major events in the MIT community this spring that will have more bearing on what we're doing with new models and hardware setups.