The 'Metal in the Microwave' Problem...
An illustrated essay on whether AI can illustrate a children's book. And a little bit about whether it should.
You probably don’t care whether artificial intelligence can illustrate a children’s book — but you still might find this interesting. AI is everywhere these days, and it’s apparently going to take all our jobs and replace us, the way Netflix replaced the high street video store. However, unless you’re a Californian tech-bro, it can be hard to get a grip on exactly what it is. That’s where I hope this might help.
This essay (story? feature? Whatever it is) comes from my research into whether AI image generators could illustrate a children’s story I wrote. The AI side of it is super high-tech and I only have the dimmest grasp of how it works. But the children’s story part couldn’t be more accessible. We all know, instinctively, if a story works or doesn’t work, and we can all say whether we like a picture or not. Moreover we’ve all been kids, so we probably have a better idea of what they like, compared to a box with flashing lights — no matter how advanced those lights are. And so, since I’m able to include the pictures made by the AI in this essay, you can see what you think, and hopefully it comes together to give you a window into the brave new world of AI that’s both informative and entertaining. If not, nothing lost, and I’ll soon be replaced by a robot anyway.
A little bit of context. A few years back I wrote and published a children’s book, hiring a professional illustrator to create the images. It was a partial success – it has excellent reviews on Amazon – but we’re not even close to making back the money we spent on it (I guess that actually makes it an abject failure…?). Anyway, I wanted to try again, because I had other stories written, and every time I looked at them they seemed really good – and because I’m really bad at making business decisions.
But since the illustrations were the most expensive part of the project before, I wanted to see if AI could do them this time around. I was a bit antsy about the idea, for reasons we’ll come on to, but I was curious to see if it was even an option.
I began by taking a story I’d written called Katie and the CamperCopter. You don’t know it, because it hasn’t been published yet, but I’ll summarise the story here, so you can see if the AI images fit.
Katie’s CamperCopter is a kid’s picture book about a little girl who’s excited about going camping with her family. It all goes wrong when they end up squashed onto a busy campsite, next to the toilets and in the pouring rain. Her parents want to go home, but Katie has a better idea. She modifies their car using her amazing mechanic skills, and utilising the crazy gear they have loaded onto the roofrack (think skis, surfboards, tennis rackets, a guitar, tents, airbed, a rotary clothes line, a drone, portable rockets etc…) and she turns it into an incredible ‘CamperCopter’ (part camper, part helicopter). They use it to fly up above the rain and into the clouds. Here they set up camp, and spend the day skiing on cloudy slopes, surfing on waves of fog and generally having an awesome time.
I’m not going to tell you the rest of the story, but there’s a twist, and some drama, and then a really nice ending. And although I say so myself, it works really well as a story.
But to work as an actual book, clearly the illustrations are critical. The heart of the story is the CamperCopter. I envisaged it as this wacky, fantastical machine, recognisably made of lots of camping bits, and with springs and sprockets and nuts and bolts sticking out everywhere. There’s no way (without years of training, and possibly a DNA transfer) that I could draw it, so I was keen to see what the AI could do.
Should you want to commission an artist to illustrate a children’s book of this size (32 pages or 12 ‘spreads’) you might expect to pay somewhere between a couple of thousand up to about ten thousand pounds. Though for more experienced, or really well known artists, that could obviously rise substantially.
So can we do the job for free? For the tech-curious, I used DALL-E 3, via ChatGPT-4, but if you don’t know what that means, don’t worry. I don’t really know either, and you certainly don’t need to know to read this article.
Page One
The text for page one of the story is this:
Katie and her family are going camping. She’s very excited, not least because she has a packet of Mollo’s Mallows — the world’s tastiest marshmallows — which taste better than anything in the world when toasted on an open campfire.
And I made some illustrators notes, which say this:
Illustrator Note: Show Katie outside her suburban house, with a big smile, holding marshmallows and looking eagerly at a car stuffed crazy-full of camping gear, and with surfboards, skis, ski poles, etc… piled precariously high on the roof.
Obviously if using a human illustrator I would expect to discuss in more detail how Katie might look, what kind of car they might have, and the general look and feel of the images. But what happens if you just cut and paste the above in the AI? Well, it gives you four options, that look like this:




When I saw these I thought wow! Perhaps none of them are quite right, but they’re not bad, and it took all of a minute to produce... Remember, if we used a human artist, it would take days to get here, and cost quite a bit. If you look a little bit deeper you might notice each of the images is a bit weird in places, but for a first draft they’re impressive. Now the way the AI works is, if you don’t like any of the first images it gives you, you can simply hit reload, and try again. And I did — a few times — with similar results. But time is money, and I don’t want to waste yours, so let’s move on with the story.
Page Two
Here are the words for page 2, with the illustrator’s notes below:
But when they get to the campsite, all the best pitches have been taken by expensive campervans. The only space is next to the toilets. And the weather isn’t exactly great for tents…
Illustrator Note: Depict a campsite full of expensive camper vans, with miserable owners, looking out of their windows at the pouring rain. Perhaps a fat man with a moustache and a toilet roll is walking by too close to Katie’s tent, which is sagging under the weight of the water.
So what happens if you ask the AI for images to suit this page? It only gave me three this time, but here they are:



Again, they’re not too bad – perhaps not quite as good as the images from the first page, but a decent starting point. However, in all but one it seemed to have forgotten to put Katie and her family in the image, and where it did include her, she looks different. Also, the car is missing, and this matters because it’s about to be transformed into an amazing CamperCopter. Clearly there’s work to do, but let’s move just a little further into the story before we trouble ourselves with trifling details.
Page three shows Katie standing in the pouring rain and thinking what to do, while her mum and dad argue about going home in the background. I decided to skip that page, because I was impatient to see what the computer could do with the actual CamperCopter – because the look of this is going to make or break the book. So here’s the text and illustrator’s notes for page 4:
Page Four
Katie rolls up her sleeves, opens the bonnet on dad’s camper car and gets to work.
First she rewires the touch-screen to add some new controls, then she connects the battery to the skis and surfboards on the roof, converting it into…
A CamperCopter!
Illustrator Note: Katie as mechanic, sleeves rolled up and covered in oil. The amazing camper-copter is taking shape. It’s a fantastical blend of car and helicopter, built from camping paraphernalia, bits of rockets, surfboards, skis, pipes and tubes and wires and springs and washing lines. It’s completely over the top, but packed with funny details.
I want to work on the words a bit for the final book, but here’s what the AI generates at the moment:




Again, at first glance — wow! I really think these are little short of fantastic. Perhaps I’m biased because I’m seeing images that accompany a story that, until this moment, I could only see in my head. But I also showed them to my kids and they thought they were brilliant too (Ok, that could be genetic…) However, they did also say that – like the images before – none of them were quite right, and there were some weird bits, so perhaps now is the time to pause the story, find out what’s going on, and see if we can fine tune things.
First up. The weird bits. When you look a little closer, they’re everywhere in the images – in the last set they’re quite clear – the odd, alien-plant-like look of the helicopter in image one, the weird little wheels in image two. But the effect is most obviously evident in the text. ‘A Der’s kad’s camp—opter’ – what does that mean? And where does it come from?
Next the characterisation. Clearly it hasn’t figured out a consistent look for Katie, or any of the other characters. I think she looks pretty cool in some of the images, and frankly awful in some of the others. And although I haven’t said so above, I did spend a lot of time asking the AI to keep the characters looking the same, but it was really bad at it. For example, here’s a few early images of Katie, which it insisted were the same person:
Either she’s a master of disguise, or I’m going mad, because these look like completely different people to me. After a full-blown argument about whether the obviously different Katies below even had the same eye colour, I threatened to turn my computer off at the wall if it didn’t tell me the truth.
I’m not saying this did the trick, but it did finally seem to understand what I wanted: basically the same person on page one as on page two, and so on to the very end of the book, albeit in a range of poses and doing different things. It’s hard to imagine even needing to mention this to a human illustrator, but the AI seemed bizarrely casual about the point. This is what I got with my best efforts, after spending several hours trying to get the computer to put the same character into several different scenes:








Now some of these look – to me – like the same person, almost. But others don’t. She seems older in some, younger in others, more cartoonish in some, and a bit more lifelike in others. Put them all together (each of the images above was generated separately) and it’s more like looking at a family of closely related people than the same girl doing different things. And the style of the images also changes, even though I spent ages explaining that I wanted that to be consistent too. So what’s going on? How can a machine that can produce such incredible, instant artwork in the first place find it so hard to just use the same character more than once?
I did some intense research at this point to find out (I asked ChatGPT to explain it to me).
The problem, apparently, is that the AI isn’t able to ‘see’, and has no memory for images it has created. Even though it’s just made an image of Katie, it can’t access that image, copy her, and put her into a new image. I’ll say that again, because it’s important. It has no memory for what it’s just created. It’s like a goldfish, or Ten Second Tom in the movie 50 First Dates, who has no short-term memory, and whose only line, over and over again, is ‘Hi, I’m Tom’.
At this point I must apologise to any goldfish reading this. There’s now lots of research demonstrating that, far from the ten-second-memory of popular myth, they’re able to retain memories for months, recognise human faces and even read articles on the internet like this one. So, sorry my fishy friends.
But if a goldfish has a memory, why not an AI? Good question, and lucky for you I harbour a secret desire to be a popular-science writer, and thus shall explain it, making the technical and complex seem both simple and intuitive…
You generate an AI image by writing a text prompt, for example: ‘give me an image of an 8-year-old girl with curly brown hair and freckles’. The AI takes this, reflects upon the billions of images it has been trained on, and mixes in a random element (called a seed). It then gives you one possible version of the picture you asked for – but not the only version. You didn’t specify everything in the image: the colour of her eyes, for example, or her expression. But the image has to settle these points (because when it gives you a picture, you can see whether she has blue eyes or brown). The AI chooses them randomly, or sort-of randomly: it draws on its memory of billions of pictures to put together things that are usually put together. In fact, much of the image will be ‘random stuff’ you didn’t specifically ask for, but which suits the general scene you did ask for. Still with me? Good. The problem comes when you want a second picture, because all the AI can do is repeat the process, and the random element works its magic a second time. Only this time everything looks a bit different, because that’s what random means.
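For the technically curious, that seed idea can be sketched in a few lines of Python. To be clear, this is a toy stand-in, nothing like DALL-E’s real internals; the attribute lists and the `toy_generate` function are invented purely to illustrate the point:

```python
import random

# Toy stand-in for an image generator. The attribute lists below are
# invented for the sketch: they represent the details a prompt leaves
# unspecified, which the 'generator' has to settle somehow.
UNSPECIFIED = {
    "eye colour": ["blue", "brown", "green", "grey", "amber"],
    "expression": ["smiling", "serious", "surprised", "laughing"],
    "hair style": ["ponytail", "loose curls", "bunches", "plaits"],
}

def toy_generate(prompt: str, seed: int) -> dict:
    """The prompt pins down some details; the seed decides the rest."""
    # Seeding with prompt+seed makes the 'random' choices reproducible:
    rng = random.Random(f"{prompt}|{seed}")
    return {detail: rng.choice(options) for detail, options in UNSPECIFIED.items()}

prompt = "an 8-year-old girl with curly brown hair and freckles"
print(toy_generate(prompt, seed=1))  # one possible Katie
print(toy_generate(prompt, seed=2))  # almost certainly a different-looking Katie
print(toy_generate(prompt, seed=1))  # exactly the same Katie as the first call
```

Run it and the first and third ‘Katies’ match exactly, while the second almost certainly doesn’t – which is the whole consistency problem in miniature.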
OK, but still, why? Why can’t it just have a memory? Wouldn’t that be easier? My research uncovered (again, ChatGPT explained it to me) that AI image generators have no memory (‘stateless’, in the jargon) because that ensures many people can use them at once, and that there’s no risk of one person’s data being mixed with another’s. Fair enough, and clearly important. But still… a little odd. It still seems a massive and rather obvious limitation on how useful the software is, at least for illustrating a whole book. Surely there must be other ways to safeguard and scale? I pressed for more details, and then something strange happened. The AI got increasingly vague and uncomfortable, shifting the conversation to ‘workarounds’ I might like to try. I persevered, asking whether, in the near future, it would get a memory (become a ‘stateful’ system), and it said maybe, but it was remarkably unconvincing.
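The stateless/stateful distinction is easy to sketch too. Again, none of this is real DALL-E code – the names are all mine – it just shows the shape of the problem:

```python
def stateless_draw(prompt: str) -> str:
    # Stateless: the prompt is the ONLY input. No memory of any
    # previous picture survives between calls, by design. 'The same
    # Katie as last time' simply isn't information it can have.
    return f"brand-new image, invented from scratch: {prompt}"

class StatefulIllustrator:
    """The hypothetical alternative: a 'sketchbook' that persists
    between requests, like a human illustrator's character sheet."""
    def __init__(self):
        self.character_sheet = {}

    def draw(self, character: str, scene: str) -> str:
        # The first request invents a look; later requests reuse it.
        if character not in self.character_sheet:
            self.character_sheet[character] = f"established design for {character}"
        return f"{self.character_sheet[character]}, in scene: {scene}"

artist = StatefulIllustrator()
print(artist.draw("Katie", "outside her house"))
print(artist.draw("Katie", "at the rainy campsite"))  # same design, new scene
```

The catch, as the AI explained it, is that the stateful version’s ‘sketchbook’ has to be stored somewhere, per user, for as long as the book takes to illustrate – which is exactly the scaling-and-privacy cost it described.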
I’ve written before about AI’s really odd habit of lying. It seems so strange writing this that I’ve checked several times whether it can really be right, and it is – it’s called hallucinating. And it’s very obvious when you use AIs that it really does happen. They lie frequently, often telling you what they (seem to) think you want to hear, rather than the truth. Over time, you get a feel for when it’s happening. And now I was getting a strong hunch that the AI wasn’t being straight with me about the issue of not having a memory.
Consider when microwave ovens first appeared. You may be too young to remember this happy, optimistic period of history (lucky you) but there was once a time when microwave ovens were the hot new ticket in the tech world. Silicon Valley bros would skateboard to work with one on their shoulders — OK, that’s a lie, but microwaves really could soften butter in seconds, instead of everyone having to wait until the summer, which was revolutionary at the time. Yet nobody normal knew how they worked. They were sort-of magic in a similar way to how AI is today. And there was one thing microwaves couldn’t do. You couldn’t heat metal in a microwave.
This was OK — few people ate metal, even back then. But I still remember a general sense that this limitation would be short-lived. The clever people who made microwaves would soon tweak the magic inside them, to let us heat up all the metal we wanted. However, thirty-odd years later, I’m still waiting to heat up my metal in the microwave.
It turned out that not putting metal in the microwave was somehow fundamental to how they work. They’re great for softening up butter, but slip in a piece of aluminium foil, and they spark angrily, flash purple flames and very quickly blow up. Every bit of magic has its anti-magic, and metal was kryptonite for microwaves.
Could it be that the AI not being able to ‘see’ an image it’s just created, and needing to generate it anew every time, is similar to the metal in the microwave problem? I’m nowhere near expert enough to know (that ought to be clear from the above), but I can try to use the tech — as it currently exists — to accomplish a task (in this case generate a series of images of the same character for a kids book). And when I try to do so it’s so obviously limiting that I feel it must be something reasonably fundamental to how they work. The tech is so clever, but it can’t do this one, simple thing? I’m suspicious.
I mentioned that it suggested I try some workarounds. What were they? As I said, to generate images you have to give text ‘prompts’: “Give me an image of a dog up a tree making jelly”. It will then do so, but it might be a bulldog when you really wanted a labrador. So you can tweak it: “Give me an image of a labrador dog, up a tree, making jelly.” You’ll be closer, but it might be the wrong flavour of jelly, or the wrong type of tree, and so on. You might think you can fix all this by simply specifying every element in precise detail:
“It’s winter and a birch tree (12.5 m high) has no leaves. A three-year-old golden retriever with a blue collar is a third of the way up making strawberry jelly, looking at the camera. The image should look like a photograph, and there should be rolling hills and woods behind.”
However, there’s another problem. Actually, there are two. Firstly, it’s impossible to be so specific that the tree, or a person, will look like the same tree in subsequent images. Things like trees, or humans – in fact most things – are just too complex and varied. They say a picture tells a thousand words, but actually it’s way more than that. Consider a picture of Jim Carrey. You could try to describe him in words: a man of a particular height, age, haircut, shape of chin, etc… But a photograph narrows it down from the infinite possible variations those words could refer to, to one specific combination – everything that it means to be Jim Carrey. And then there’s a second problem. Even if it were possible to type out enough detail to nail down a specific human, or tree, or rock, so we could all recognise it, there’s a limit to how long your text prompt can be with these AIs. And the limit is about the length of this paragraph.
Wait really? The length of that paragraph? I’m supposed to describe all the characters I want to feature in my story, in intricate detail, every time I want to see them, in just one paragraph? Call me a pessimist, but I don’t think that’s going to work.
As I understand it, prompt lengths will increase over time, but how long would one need to be to specify the exact measurements of Jim Carrey? Pretty long.
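To see how quickly a budget like that runs out, here’s a toy calculation in Python. Real systems count ‘tokens’ (word fragments) rather than words, and limits vary between models, so the 75-word budget – and every detail of Katie below – is just an assumption for the sketch:

```python
# Toy illustration of the prompt-budget problem. Real models count
# 'tokens' (word fragments), not words, and limits vary by system;
# the budget and the character description here are invented.
PROMPT_BUDGET = 75  # assumed limit: roughly 'the length of a paragraph'

katie = ("Katie, an 8-year-old girl, about 130 cm tall, with shoulder-length "
         "curly brown hair parted on the left, seven freckles across her nose, "
         "hazel eyes, a gap in her front teeth, a red raincoat with a broken "
         "toggle, yellow wellies one size too big, and a shy, lopsided smile, ...")

scene = ("standing in pouring rain on a crowded campsite next to the toilets, "
         "while her parents argue in the background beside a sagging tent")

style = "warm, painterly children's-book illustration, consistent with page one"

prompt = f"{katie} {scene} {style}"
used = len(prompt.split())  # crude word count, standing in for tokens

print(f"{used} words used, against a budget of roughly {PROMPT_BUDGET}")
print("Over budget!" if used > PROMPT_BUDGET else "Still room... for now")
```

One character, one scene and a style note already blow the budget, and that’s before mum, dad, the car or the roofrack get a single word.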
Now let’s come back to the weird problem. For the image above I asked for a dog up a tree making strawberry jelly, which is admittedly kind of weird in itself (hey, it’s just how my mind works). But it never even occurred to me that the tree should be growing strawberries from the ends of its branches. That’s weird on another level. And in winter, too… It’s also kind of creepy. To me it carries connotations of genetic modification, or of something a bit unnatural, not of this earth. I think any human would see that, because we all live on this earth. An AI doesn’t; it doesn’t live anywhere. It doesn’t even live. So it has no idea if something is weird or looks right. It has to learn it all.
To some extent it has. If you ask for something that the AI has been trained on a lot, and which is quite simple, it does a much better job. Here’s a sofa:
It’s not a real sofa, but it certainly looks like one. I’d sit on it. And you see this pattern with AI images: the more commonplace the thing you ask for, the more images of that sort of thing it’s been trained on, and the better it is at creating them. But they’re less good at unusual things, or unusual combinations. It’s a problem that will improve over time, as they’re trained on ever more images, and thus presumably on higher numbers of unusual things. But for my story I want it to show me a CamperCopter – an invention that hasn’t been invented yet, and which has never been photographed, nor drawn. I don’t see how it’s going to get past the ‘weird’ problem with this, because it can’t check its output against a million previous CamperCopters. They don’t exist.
For me the whole point of story telling — or any creative endeavour — is to come up with things that are different. I want to create things. And I suspect the AI is always going to be behind the curve on this, showing me weird things in the pictures I ask for, because I’m asking it for things that it hasn’t seen before.
But enough of this. Let’s turn now to another issue that we haven’t looked at so far: the moral angle of all this.
All the images above, and any produced by an AI, are ‘new’. They’re not doing what Google does, i.e. finding images that someone has uploaded to the internet. They’re not even cutting them up, and sticking them back together to make new images, like some sort of super-collage-maker. They’re new images. But they are also trained on millions and millions of actual artworks, created by real artists. Those artists never gave permission for their work to be used, and they didn’t get paid.
It’s not a big surprise that they’re rather upset about this. I was unaware, when I began writing this, just how angry some people are. In one example, a man used AI to create a children’s book as a present for his daughter, and then went semi-viral when it received dozens of scathing reviews on Amazon (“disgusting”, “the author should be ashamed”, “do not buy this book if you like humanity”). Discussion groups for children’s illustrators – who you’d think are probably a pretty mild-mannered group of people, generally speaking – are suddenly places of division and anger, where questions about using AI can trigger intense, bitter rows. People have even been stabbed to death with HB pencils. Of course they haven’t really, but lawsuits have been launched, and presumably we’ll soon know more on the complex questions of legality.
As I saw the early images in this article and realised the AI might be able to illustrate my book, I got quite worried about whether it should. It felt like I was being forced into a difficult moral choice: pay thousands to hire a human illustrator, or do the job just as well, and at no cost, using an AI.
It may be that such a dilemma is coming, but the more I’ve looked into it, the clearer it seems that – for now at least – AI simply can’t do what a human illustrator can. It has no memory for characters and objects from one image to the next, and it doesn’t know that it’s weird to show strawberries growing from the tips of the branches of a tree in winter.
That doesn’t mean AI isn’t useful though. I am still thinking about trying to find an illustrator for the book above (I’m looking into the Kickstarter option, more on this another time). But if I do, I think it will be much easier to explain my vision for how the illustrations should look. That’s not to say an illustrator won’t then come up with something much better, but at least I’m able to show what I think, rather than struggling to describe it. During the process of writing this essay, I got the AI to produce dozens of options for how the CamperCopter could look. Most weren’t much good, but there were a couple I liked, because they did a good job of blending the car into a helicopter. They still had weird bits, and I can’t get the computer to reproduce them a second time, (or put them into another scene). But I can ask a human illustrator to make something that looks ‘a bit like this’, or use this as a starting point — or just to see if we’re on the same page. And we can do that at no cost to me, and no time cost for the illustrator. So that feels helpful. To illustrate, here’s one CamperCopter concept I quite like:
OK. It needs a human to de-weird it. But it’s a place to start. And just maybe using the tool this way will result in a better experience for readers.
Conclusions
This is very much an article of two halves. Part one starts to tell the story of Katie’s CamperCopter, and tries to illustrate it. And part two tells why that doesn’t work. But in a way, that rather neatly shows the problem. AI can’t yet illustrate a children’s book, at least nowhere near as well as a human artist. And while only an idiot would make predictions in this field – I am that idiot – I’ve got a hunch it’s going to be some time before it can. In the meantime I’d still very much like to get this story out there somehow, and I’m investigating other options for doing so (more on that soon, I hope). But for now, maybe the conclusion is that the human race is a little bit safer than we’ve all been led to believe?
We’ll see, and thank you for reading.