Latest news with #OneUsefulThing


Forbes
08-04-2025
- Entertainment
- Forbes
Mollick Presents The Meaning Of New Image Generation Models
Article image: a paintbrush illustrating the concept of generative AI art.

What does it mean when AI can build smarter pictures? We found out a few weeks ago, as both Google and OpenAI unveiled new image generation models that are fundamentally different from what has come before. A number of important voices chimed in on how this is likely to work, but I hadn't yet covered this timely piece by Ethan Mollick at One Useful Thing, in which the MIT graduate looks at these new models in detail and evaluates how they work and what they are likely to mean for human users.

The Promise of Multimodal Image Generation

Essentially, Mollick explains that traditional image generation systems were a handoff from one model to another. 'Previously, when a Large Language Model AI generated an image, it wasn't really the LLM doing the work,' he writes. 'Instead, the AI would send a text prompt to a separate image generation tool and show you what came back. The AI creates the text prompt, but another, less intelligent system creates the image.'

Diffusion Models Are So 2021

The old models also mostly used diffusion to do the image work. How does diffusion work? These traditional models operate in a single mode: a text prompt goes in, and an image comes out. I remember writing an explanation of diffusion for a general audience a year ago, based on one my colleague Daniela Rus presented at conferences. It goes something like this: the diffusion model takes an image, introduces noise until the image is abstracted away, and then denoises it again to form a brand-new image that resembles what the computer already knows from looking at images that match the prompt.

Here's the thing: if that's all the model does, you're not going to get an informed picture. You're going to get a new picture that looks like a prior picture, or, more accurately, like the thousands of pictures the computer saw on the Internet. You're not going to get a picture with actionable information that's reasoned through and considered by the model itself.
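To make that noise-and-denoise loop concrete, here is a deliberately minimal Python sketch of diffusion-style sampling. Everything in it is invented for illustration; a real system of the kind Mollick discusses uses a trained neural network, a noise schedule, and text conditioning, none of which appear here.

```python
import numpy as np

def denoise_step(noisy_image: np.ndarray, step: int, prompt: str) -> np.ndarray:
    """Stand-in for a trained denoising network (illustrative only).

    A real diffusion model predicts and removes a little of the noise at
    each step, guided by the text prompt. Here we just nudge every pixel
    toward a flat gray canvas so the loop has something to converge to.
    """
    learned_target = np.full_like(noisy_image, 0.5)  # placeholder for "what the model has seen"
    return noisy_image + 0.1 * (learned_target - noisy_image)

def generate(prompt: str, steps: int = 50, size: int = 64) -> np.ndarray:
    image = np.random.rand(size, size, 3)        # start from pure noise...
    for step in reversed(range(steps)):          # ...and denoise step by step
        image = denoise_step(image, step, prompt)
    return image                                 # an image that resembles prior images

picture = generate("a cozy room with no elephants")
print(picture.shape)
```

The shape of the process is the point: the image emerges from noise that is pushed toward patterns the model has already seen, and at no step does the system reason about what the prompt is actually asking for.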
Now we have multimodal control, and that's fundamentally different.

No Elephants?

Mollick gives the example of a prompt that asks the model to create an image of a room without elephants in it, and to show why there are no elephants in the room. Here's the prompt: 'show me a room with no elephants in it, make sure to annotate the image to show me why there are no possible elephants.' When you hand this to a traditional model, it shows you some elephants, because it doesn't understand the context of the prompt or what it means. Furthermore, a lot of the text you get back is complete nonsense, or even made-up characters. That's because the model didn't know what letters actually looked like; it was getting that from training data, too. Mollick then shows what happens when you hand the same prompt to a multimodal model. It gives you exactly what you want: a room with no elephants, and notes like 'the door is too small' explaining why the elephants couldn't be in there.

Challenges of Prompting Traditional Models

I know personally that this was how the traditional models worked. As soon as you asked them not to put something in, they would put it in, because they didn't understand your request. Another major difference is that traditional models would change the fundamental image every time you asked for a correction or a tweak. Suppose you had an image of a person and you asked for a different hat. You might get an image of an entirely different person. The multimodal image generation models know how to preserve the result you wanted and change it in just one small way.

Preserving Habitats

Mollick gives another example of how this works: he shows an otter holding a particular sort of display in its hands, and then the same otter appearing in different environments, with different styles of background. This, too, shows the detailed integration that multimodal image generators are capable of.

A Whole Pitch Deck

For a use-case scenario, Mollick shows how you could take one of these multimodal models and have it design an entire pitch deck, for guacamole or anything else. All you have to do is ask for that type of deck, and the model will get right to work, looking at what else is on the Internet, synthesizing it, and giving you the result. As Mollick mentions, this will make all sorts of human work obsolete very quickly. We will need well-considered frameworks for handling that shift.
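To make the contrast between the old handoff and the new multimodal editing concrete, here is a toy Python sketch. The classes and fields are invented purely for illustration and correspond to no real SDK; they only mimic the behavior described above, where a text-only generator invents a new picture every time, while a model that holds the image in its context can change one attribute and preserve the rest.

```python
from dataclasses import dataclass

# Toy stand-ins for illustration only; no real models or SDKs are involved.

@dataclass
class Picture:
    subject: str   # e.g. "woman with freckles"
    hat: str       # e.g. "straw hat"

class SeparateImageTool:
    """Old-style generator: it only ever sees a text prompt."""
    def generate(self, prompt: str) -> Picture:
        # The prompt is all it has, so it invents a new subject every time.
        return Picture(subject=f"someone who matches '{prompt}'", hat="whatever the prompt implies")

class MultimodalModel:
    """New-style model: the image itself is part of its context."""
    def edit(self, original: Picture, new_hat: str) -> Picture:
        # It can change one attribute while leaving everything else untouched.
        return Picture(subject=original.subject, hat=new_hat)

# The old handoff: ask for "the same person, different hat" and lose the person.
print(SeparateImageTool().generate("the same person as before, but in a top hat"))

# The multimodal edit: the person is preserved; only the hat changes.
original = Picture(subject="woman with freckles", hat="straw hat")
print(MultimodalModel().edit(original, "top hat"))
```

In a real multimodal system, "holding the image in context" means the picture is fed to the same model that reads your instruction, which is the architectural shift Mollick is describing.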


Forbes
20-03-2025
- Entertainment
- Forbes
More On Vibecoding From Ethan Mollick
Just yesterday, I mentioned Andrej Karpathy, who made some waves with his recent X post talking about giving ground to AI agents to create software and write code. Then I thought about one of our most influential voices in today's tech world, MIT PhD Ethan Mollick, and I went over to his blog, One Useful Thing, to see if he was covering this new capability. Sure enough, I found a March 11 piece titled 'Speaking Things Into Existence' where Mollick covers this idea of 'ex nihilo' code creation based on informal prompting.

In digging into this revolutionary use case, Mollick starts right up top with a quote from Karpathy that I think gets to the very heart of things – that 'the hottest new programming language is English.' Presumably, you could use other world languages, too, but so much of what happens in this industry happens in English, and hundreds of thousands of seasoned professionals are getting used to the idea that you can talk to an LLM in your own language, not in Fortran or JavaScript or C#, but just in plain English, and it will come up with what you want.

Mollick tells us how he 'decided to give it a try' using Anthropic's Claude Code agent. 'I needed AI help before I could even use Claude Code,' he said, citing the tool's Linux build as something to get around. Here, Mollick coins the phrase 'vibetroubleshooting' and says 'if you haven't used AI for technical support, you should.'

'Time to vibecode,' Mollick wrote, noting that his first prompt to Claude Code was: 'make a 3-D game where I can place buildings of various designs, and then drive through the town I create.' 'Grammar and spelling issues included,' he disclaims, 'I got a working application about four minutes later.' He then illustrates how he tweaked the game and solved some minor glitches, with additional prompts like: 'Can you make buildings look more real? Can you add in a rival helicopter that is trying to extinguish fires before me?' He also provides the actual cost of developing this new game: about $5.00 to make the game, and $8.00 to fix a bug.

'Vibecoding is most useful when you actually have some knowledge and don't have to rely on the AI alone,' he adds. 'A better programmer might have immediately recognized that the issue was related to asset loading or event handling. And this was a small project… This underscores how vibecoding isn't about eliminating expertise but redistributing it - from writing every line of code to knowing enough about systems to guide, troubleshoot, and evaluate. The challenge becomes identifying what 'minimum viable knowledge' is necessary to effectively collaborate with AI on various projects.'

'Expertise clearly still matters in a world of creating things with words,' Mollick continues. 'After all, you have to know what you want to create; be able to judge whether the results are good or bad; and give appropriate feedback.' On the part of the machines, he refers to a 'jagged frontier' of capabilities. That might be fair, but the idea that humans are there for process refinement and minor tweaking is sort of weak tea compared to the staggering capability of these machines to do the creative work. How long until model evolution turns that jagged edge into a spectacularly smooth scalpel?
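As a small illustration of 'English as the programming language,' here is a minimal sketch using the Anthropic Python SDK. This is not the Claude Code agent Mollick worked with, and the model alias and the plain-English request are assumptions made for the example; the point is only the shape of the workflow: describe what you want, get code back, then review it yourself.

```python
# A minimal sketch of "programming in English": describe what you want in
# plain language and let the model write the code. This uses the Anthropic
# Python SDK directly, not the Claude Code agent, and the model alias below
# is an assumption -- substitute whatever is current.
import anthropic

client = anthropic.Anthropic()  # expects ANTHROPIC_API_KEY in the environment

request_in_plain_english = (
    "Write a small Python script that draws a town of simple 3-D box "
    "buildings with matplotlib and lets me set each building's height."
)

message = client.messages.create(
    model="claude-3-5-sonnet-latest",   # assumed model alias
    max_tokens=2000,
    messages=[{"role": "user", "content": request_in_plain_english}],
)

# The reply contains the generated code; reviewing it before running it is
# the "minimum viable knowledge" part of the workflow Mollick describes.
print(message.content[0].text)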
At the same time that we're trying to digest all of this, there's another contender in the ring. A bit later in the blog, Mollick references Manus, a new Chinese AI agent that uses Claude and other tools for fundamental task management. Mollick details how he asked Manus to 'create an interactive course on elevator pitching using the best academic advice.' 'You can see the system set up a checklist of tasks and then go through them, doing web research before building the pages,' he says. 'As someone who teaches entrepreneurship, I would say that the output it created was surface-level impressive - it was an entire course that covered much of the basics of pitching, and without obvious errors! Yet, I also could instantly see that it was too text heavy and did not include opportunities for knowledge checks or interactive exercises.'

Here, you can see that the system is able to source the actual content and ideas, and then arrange and present them the right way, with very little human intervention or work needed. That's the reality of it. We just saw the Chinese announcement of DeepSeek tank stocks like Nvidia. What will Manus do? How does the geopolitical interplay of China and the U.S. factor into this new world of AI software development? That question will be answered pretty soon, as these technologies make their way to market.

As for Mollick, he was also able to dig up old spreadsheets and get new results with the data-crunching power of AI. 'Work is changing, and we're only beginning to understand how,' Mollick writes. 'What's clear from these experiments is that the relationship between human expertise and AI capabilities isn't fixed. … The current moment feels transitional. These tools aren't yet reliable enough to work completely autonomously, but they're capable enough to dramatically amplify what we can accomplish.' There's a lot more in the blog post – you should read the whole thing and think about the work processes that Mollick details.

On a side note, I liked this response from a poster named 'Kevin' that talks about the application to team culture: 'To me, vibecoding is similar to being a tech lead for a bunch of junior engineers,' Kevin writes. 'You spend most of your time reviewing code, rather than writing code. The code you review is worse in most ways than the code you write. But it's a lot faster to work together as a team, because the junior engineers can crank through a lot of features. And your review really is important - if you blindly accept everything they do, you'll end up in trouble.'

Taking this all in, in the context of what I've already been writing about this week, it seems like many of the unanswered questions have to do with human roles and positions. Everything that we used to take for granted is changing suddenly. How are we going to navigate this? Can we change course quickly enough to leverage the power of AI without becoming swamped by its encompassing power? Feel free to comment, and keep an eye on the blog as we head toward some major events in the MIT community this spring that will have more bearing on what we're doing with new models and hardware setups.