Mollick Presents The Meaning Of New Image Generation Models

08-04-2025

What does it mean when AI can build smarter pictures?
We found out a few weeks ago, when both Google and OpenAI unveiled new image generation models that are fundamentally different from what came before.
A number of important voices chimed in on how this is likely to work, but I hadn't yet covered this timely piece by Ethan Mollick at One Useful Thing, in which the MIT graduate looks at these new models in detail and evaluates how they work and what they're likely to mean for human users.
The Promise of Multimodal Image Generation
Essentially, Mollick explains that the traditional image generation systems were a handoff from one model to another.
'Previously, when a Large Language Model AI generated an image, it wasn't really the LLM doing the work,' he writes. 'Instead, the AI would send a text prompt to a separate image generation tool and show you what came back. The AI creates the text prompt, but another, less intelligent system creates the image.'
Diffusion Models Are So 2021
The older models also mostly relied on diffusion.
How does diffusion work?
The traditional models have a single dimension that they use to generate images.
I remember that about a year ago I was writing up an explanation of diffusion from my colleague Daniela Rus, who presented it at conferences.
It goes something like this – the diffusion model takes an image, introduces noise, and abstracts the image, before denoising it again to form a brand new image that resembles what the computer already knows from looking at images that match the prompt.
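The noising and denoising arithmetic can be sketched in a few lines. This is a toy illustration, not anything taken from Mollick's piece or from a production system: it assumes one common formulation (a DDPM-style linear noise schedule), uses a short list of numbers as a stand-in for pixel values, and the function names are made up for the example. Most importantly, it cheats by handing the denoising step the true noise; a real model has to predict that noise with a trained neural network, and at generation time it starts from pure noise rather than from an existing picture.

```python
import math
import random


def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear noise schedule."""
    alpha_bars = []
    running = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, alpha_bar, rng):
    """Forward step: blend the clean signal with Gaussian noise."""
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    xt = [a * x + b * n for x, n in zip(x0, noise)]
    return xt, noise


def remove_noise(xt, predicted_noise, alpha_bar):
    """Reverse estimate: subtract the predicted noise and rescale."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(x - b * n) / a for x, n in zip(xt, predicted_noise)]


rng = random.Random(0)
image = [0.0, 0.25, 0.5, 0.75, 1.0]  # stand-in for pixel values
alpha_bars = make_alpha_bars(num_steps=1000)
t = 500  # a step roughly halfway through the schedule

noisy, true_noise = add_noise(image, alpha_bars[t], rng)

# Assumption for illustration only: we pass in the true noise.
# A trained network would have to estimate it from the noisy input.
recovered = remove_noise(noisy, true_noise, alpha_bars[t])
```

With a perfect noise estimate, `recovered` matches `image` up to floating-point error. Everything a real diffusion model knows about what a picture "should" look like lives in how well its network guesses that noise, which is why the output tends to resemble the training pictures.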
Here's the thing – if that's all the model does, you're not going to get an informed picture. You're going to get a new picture that looks like a prior picture, or more accurately, thousands of pictures that the computer saw on the Internet, but you're not going to get a picture with actionable information that's reasoned and considered by the model itself.
Now we have multimodal control, and that's fundamentally different.
No Elephants?
Mollick gives the example of a prompt that asks the model to create an image of a room with no elephants in it, and to annotate the image to show why no elephants could be there.
Here's the prompt: 'show me a room with no elephants in it, make sure to annotate the image to show me why there are no possible elephants.'
When you hand this to a traditional model, it shows you some elephants, because it doesn't understand the context of the prompt, or what it means. Furthermore, a lot of the text that you'll get is complete nonsense, or even made-up characters. That's because the model didn't know what letters actually looked like – it was getting that from training data, too.
Mollick shows what happens when you hand the same prompt to a multimodal model: it gives you exactly what you asked for, a room with no elephants, with notes like 'the door is too small' explaining why elephants couldn't be in there.
Challenges of Prompting Traditional Models
I know personally that this was how the traditional models worked. As soon as you asked them not to put something in, they would put it in, because they didn't understand your request.
Another major difference is that traditional models would change the fundamental image every time you asked for a correction or a tweak.
Suppose you had an image of a person, and you asked for a different hat. You might get an image of an entirely different person.
The multimodal image generation models know how to preserve the result that you wanted, and just change it in one single small way.
Preserving Habitats
Mollick gives another example of how this works: he shows an otter with a particular sort of display in its hands. Then the otter appears in different environments with different styles of background.
This also shows the detailed integration of multimodal image generators.
A Whole Pitch Deck
For a use-case scenario, Mollick shows how you could take one of these multimodal models and have it design an entire pitch deck for guacamole, or anything else.
All you have to do is say 'come up with this type of deck,' and the model will get right to work, looking at what else is on the Internet, synthesizing it, and giving you the result.
As Mollick mentions, this will make all sorts of human work obsolete very quickly.
We will need a well-considered framework
