https://generalrobots.substack.com/p/dimension-hopper-part-1

Dimension Hopper Part 1

2D Platformer using Stable Diffusion for live level art creation

Benjie Holson
Jun 12, 2023

The next few posts are going to depart a bit from the regular theme
(which has been lessons learned in a career doing general-purpose
robotics). Instead I've decided to use some of my new free time[1]
to learn by doing and play with some of the cool new ML all the cool
kids are talking about.

The Project:

My project is to make a 2D platformer where the players can design
their own levels and then generative AI will create beautiful
rendered images to represent the levels. I wanted to do something
that wouldn't be possible without AI: letting the players participate
in the creation of art. We'll skip to the end and you can see what
the game looks like now:

[Video: the game in action]

And here are some of the different themes, though you can also create
your own.

[Video: a few of the generated themes]

You can play with it here: dimensionhopper.com

I recommend looking at random levels or the gallery and seeing what's
out there.

The Journey

But let's talk about the process of getting there.

Part 1: proof of concept

I'd played a little bit with Stable Diffusion before, and it's a
really fun toy for making cool pictures, but I always felt like I didn't
have quite enough control. For me the enjoyment of creation comes
from the interaction between what I do and what I get, and I wanted
to have more input. That's why I was so excited when I read about
ControlNet, which gives a ton more knobs to control the output. I
immediately wanted to use it to make a 2D game.[2]

Getting up and going

I installed Stable Diffusion on my laptop, fired up the webui,[3]
got ControlNet working, and fed this depth image into Stable
Diffusion:

 
[Image: the level's depth map]

This has the platforms as the closest pixels (white), and the black
background means that part is far away. I was using a pixel-art model
that had this amazing demo pic on its Civitai page:

 
[Image: the pixel-art model's demo picture]

So I copy the prompt and settings from that, tweak a bit, and hit
generate... and get this:

 
[Image: the first generated result]

Not quite what I was hoping for... I try and fail to get the depth mode
working for a while.

 
[Image: another failed depth-mode attempt]

So I switch over to "scribble" mode for ControlNet. Scribble mode
takes outlines of shapes and lets them guide the image (instead of
depth).
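As an aside, a scribble control image is just line art, so one can be produced straight from a white-on-black platform mask with an edge filter. This is only an illustrative sketch with made-up file names, not the exact preprocessing used here:

```python
from PIL import Image, ImageFilter

def mask_to_scribble(mask_path: str, out_path: str) -> None:
    """Turn a white-platforms-on-black mask into thin outlines for scribble mode."""
    mask = Image.open(mask_path).convert("L")
    edges = mask.filter(ImageFilter.FIND_EDGES)           # 1px outlines of the shapes
    edges = edges.point(lambda p: 255 if p > 32 else 0)   # clean up to pure black/white
    edges.save(out_path)

# Hypothetical file names, for illustration only.
mask_to_scribble("level_mask.png", "level_scribble.png")
```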

 
[Image: scribble-mode result]

More interesting but still not good. 

Changing the prompt: 

"pixelart video game environment, platformer level. Create an image
of a mystic stone temple in the jungle. shafts of sunlight, dramatic
lighting. Vines and cracks in brown stone"

Much closer! I have a picture. It kind of interacts with the level.
Still not good, but better.

 
[Image: jungle temple attempt]

Ooh, I have the level casting some shadows now.

 
[Image: level with shadows cast by the platforms]

"pixelart video game environment, Create an image of an abandoned
space station, with broken systems, flickering lights, and a sense of
danger. Show the wreckage, the abandoned rooms, and the unknown
threats that linger."

Eventually with some more playing around with settings I get levels
like this pretty reliably:

 
[Image: abandoned space station level]

This is a big improvement over the start, but (1) it didn't really
look like the level was part of the art (it was, at best, pasted
on top), and (2) the level textures looked like a repeating tileset.
Human videogame developers use tilesets so that they don't have to draw a
different bit of grass for every square of platform, but I didn't
have that constraint. Huh.

No more pixelart.

I decided that part of my problem was using a model trained on
pixelart. It was faithfully copying the genre's repeating tilesets,
which was exactly what I didn't want. So I changed to another model,
this one built around children's illustration, and my first image out
looked like this:

 
[Image: first result from the children's-illustration model]

Wow! So much better! The platforms have shadows on them and objects
in front and behind, and it's actually a nice picture. I'm on to
something!

The new model makes nice pictures, but I quickly realize I'm walking
the line between two failure modes.

Either I have a nice picture with the level kind of mostly drawn on
top of it:

 
[Image: level drawn on top of the picture]

Or I get a nice, integrated picture where it's really unclear where
you are allowed to stand:

 
[Images: integrated pictures where the platforms are hard to read]

The last one is especially problematic because the 'level' part you
can stand on has been rendered as a window, exactly inverting the
semantics. This is a fundamental problem with using scribble mode: I'm
giving Stable Diffusion no way to know what is close and what is far,
just outlined shapes.

Breakthrough: Lips on the Platforms.

I go back to depth, but have a thought: what if I hint that the top
of each platform has a little lip? Rather than code it into my game's
rendering engine, I just draw them in GIMP.
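The same trick is easy to express in code: render each platform tile as a bright block and paint a slightly brighter strip along any exposed top edge. The sketch below is illustrative rather than the game's actual renderer; the tile size and gray values are made up.

```python
import numpy as np
from PIL import Image

TILE = 32          # pixels per tile -- made-up value
PLATFORM = 220     # depth gray for platform bodies (brighter = closer)
LIP = 255          # slightly brighter strip hinting at a ledge on top
LIP_PX = 4         # lip thickness in pixels
BACKGROUND = 0     # far-away background

def render_depth(grid: np.ndarray) -> Image.Image:
    """grid[row, col] is True wherever a platform tile sits."""
    rows, cols = grid.shape
    depth = np.full((rows * TILE, cols * TILE), BACKGROUND, dtype=np.uint8)
    for r in range(rows):
        for c in range(cols):
            if not grid[r, c]:
                continue
            y, x = r * TILE, c * TILE
            depth[y:y + TILE, x:x + TILE] = PLATFORM
            # Only the exposed top edge of a platform gets a lip.
            if r == 0 or not grid[r - 1, c]:
                depth[y:y + LIP_PX, x:x + TILE] = LIP
    return Image.fromarray(depth, mode="L")

# Example: a tiny level with a floor and one floating platform.
level = np.zeros((6, 10), dtype=bool)
level[5, :] = True          # floor
level[3, 4:7] = True        # floating platform
render_depth(level).save("depth_control.png")
```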

 
[Images: depth maps with hand-drawn lips on the platform tops]

Happy little platform toppers

Holy shit it works!

 
[Image: rendered level with clearly readable platforms]

And more importantly: it works pretty much every time. Most
Stable Diffusion workflows involve generating 4-10x more images than
you actually want and choosing the good ones. For my idea to work we
needed all of the levels to be playable (you can tell where the
platforms are) and most of them to be good (beautiful illustration),
because there wouldn't be a human curation step.

Control Image Matters

I've learned that the look of the depth image really changes the
quality of the output. I was working on jungle ruins and kept getting
images like this:

 
[Image: jungle ruins with floating platforms]

(Side note: it's amazing how quickly standards rise. Early on that
image blew me away, but now it looks meh.)

The problem[4] is that there aren't really any reasonable pictures of
jungles with that depth map. Jungles (and really everything else)
don't look like that. Stuff doesn't float in the air; it has stuff
under it holding it up. This leads to breakthrough 2: add supports.

 
[Image: depth map with dark gray support columns]

Each platform block projects a dark gray box down to the floor below
it, and that gives structure to the world. The dark gray has no gameplay
purpose; it just acts as a hint to Stable Diffusion about what the
picture is of. And we get much better images.
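In code, the supports are just one more painting pass over the depth array. Continuing the sketch from the lips section (again with made-up values):

```python
import numpy as np
from PIL import Image

SUPPORT = 80   # dimmer than the platforms, brighter than the far-away background
TILE = 32      # same made-up tile size as in the earlier sketch

def add_supports(depth: np.ndarray, grid: np.ndarray) -> None:
    """Paint a dark-gray pillar from the underside of each platform down to whatever is below."""
    rows, cols = grid.shape
    for r in range(rows):
        for c in range(cols):
            # A platform tile with empty space underneath has an exposed bottom face.
            if grid[r, c] and r + 1 < rows and not grid[r + 1, c]:
                x = c * TILE
                column = depth[(r + 1) * TILE:, x:x + TILE]
                # Only brighten pixels that are still background, so platforms
                # lower down (which are closer) are left untouched.
                column[column < SUPPORT] = SUPPORT

# Continuing the earlier example: paint supports into the array before saving.
depth = np.array(render_depth(level))   # render_depth and level from the previous sketch
add_supports(depth, level)
Image.fromarray(depth).save("depth_with_supports.png")
```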

 
[Images: jungle levels generated with the support columns]

Things were pretty good, but I was still having trouble with caves.
I wondered if that was because Stable Diffusion was trying to match the
straight, sharp edges of the depth map and having a hard time making
the result look organic, so I added adjustable roughness to the depth
images (as well as adjustable background depth, so the sky can be far
away for outdoor scenes and closer for indoor/jungle scenes).
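A toy version of those two knobs, operating on the same depth array, might look like this; the parameter names and ranges are invented, and this only roughens the horizontal edges:

```python
import numpy as np

rng = np.random.default_rng(0)

def roughen(depth: np.ndarray, amplitude: int = 3) -> np.ndarray:
    """Nudge each pixel column up or down a few pixels so hard horizontal edges look organic."""
    out = depth.copy()
    height = depth.shape[0]
    for x in range(depth.shape[1]):
        s = int(rng.integers(-amplitude, amplitude + 1))
        if s > 0:                               # shift the column down a little
            out[s:, x] = depth[:height - s, x]
            out[:s, x] = depth[0, x]
        elif s < 0:                             # shift the column up a little
            out[:height + s, x] = depth[-s:, x]
            out[height + s:, x] = depth[-1, x]
    return out

def set_background(depth: np.ndarray, background_gray: int) -> np.ndarray:
    """0 keeps the backdrop far away (outdoor sky); larger values pull it closer (caves, interiors)."""
    out = depth.copy()
    out[out == 0] = background_gray
    return out
```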

 
[Images: cave levels generated from the straight-edged depth map]

Notice that the bottoms of the platforms are really straight, and it's
invented a bunch of light columns / waterfalls that can have perfectly
straight verticals. This one is actually quite good, but the square
corners are not ideal.

 
[Image: cave level generated from the roughened depth map]

Now everything is believably subterranean, and the underlighting on
the bumpy rocks looks right.

Gems and Characters

 
[Image]

Stable Diffusion doesn't create any transparency.

 
[Image]

But I can cheat with the level image, because I have the depth
information I fed in and can mask based on that, so characters can go
'behind' the platforms when needed. For the gems, I ask for a '<blue
gem/ruby jewel> floating videogame object on a black background' and
then subtract the background in Python.[5]
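The subtraction step is roughly the approach in footnote 5: estimate the background color from the border pixels and knock out everything close to it. A sketch (the threshold and file names are guesses, not the game's actual values):

```python
import numpy as np
from PIL import Image

def knock_out_background(path: str, threshold: float = 40.0) -> Image.Image:
    """Estimate the background color from the border pixels and make it transparent."""
    rgb = np.asarray(Image.open(path).convert("RGB")).astype(np.float32)
    border = np.concatenate([
        rgb[0], rgb[-1], rgb[:, 0], rgb[:, -1]    # top, bottom, left, right edges
    ])
    bg = border.mean(axis=0)                       # average border color
    dist = np.linalg.norm(rgb - bg, axis=-1)       # per-pixel distance from that color
    alpha = np.where(dist < threshold, 0, 255).astype(np.uint8)
    rgba = np.dstack([rgb.astype(np.uint8), alpha])
    return Image.fromarray(rgba, mode="RGBA")

# Hypothetical file names, for illustration only.
knock_out_background("ruby_gem.png").save("ruby_gem_cutout.png")
```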

 
[Image: generated gem sprites with the backgrounds removed]

For characters I found this model, which was really fine-tuned to heck
to make 4-frame walk animations. The creator really wanted left/right
/up/down animations, so they combine this model with a LoRA of whoever
the subject is to create all 4 directions depicting the same character.
As a result, I think it is trained so all the 'walk right' frames just
get the prompt "PixelartRSS".

 
[Image: generated walk-cycle sprite frames]

I'd like to be able to prompt something about the character I want,
and this sometimes kinda works,[6] but not nearly as reliably as I'd
like. My suspicion is that if I had the training data, limited it to
just the sideways walk, and labeled it with actual descriptions of the
characters, it would work better for me. But it does reliably make
people that walk. So as long as you aren't picky, you can get new
player-character sprites all day long.

Wrapping it all up

From there I just had to make it work in my own app. I used the
excellent diffusers library to wrap the generation and make my little
server. Everything is moving so fast that I'm sure I've done some
things in very silly ways in my image_generation.py, and there are
probably 2-3x speedups to be gained by configuring it right, but for
now it all works, and it's fun to play with. Why are you still
reading? Go check it out!

dimensionhopper.com
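For anyone curious about the plumbing, the generation call has roughly this shape with the diffusers ControlNet pipeline. This is a minimal sketch rather than image_generation.py itself; the base model and ControlNet checkpoint below are generic stand-ins for the illustration-style model the game actually uses.

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline, UniPCMultistepScheduler
from PIL import Image

# Stand-in checkpoints: swap in whatever style model you like.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")

# The level's depth image (e.g. the one sketched earlier) drives the composition.
depth = Image.open("depth_control.png").convert("RGB")
image = pipe(
    "mystic stone temple in the jungle, shafts of sunlight, dramatic lighting, "
    "vines and cracks in brown stone, videogame environment",
    image=depth,
    num_inference_steps=25,
    controlnet_conditioning_scale=1.0,
).images[0]
image.save("level_art.png")
```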

And subscribe for more about the development of this game, and more
robotics content after that.

Discuss on Hacker News: https://news.ycombinator.com/item?id=36295227

[1] Who needs a job when you have hobbies?

[2] I'm sure a bunch of other folks have had the same idea, but as far as
my lazy googling can turn up, no one is doing dynamic level rendering
during gameplay. Maybe after this blog post there will be more ;).

[3] Not pictured: several frustrating hours of conda, pip, configuration,
CUDA, torch, nvidia drivers, etc. Why is the world still like this?

[4] I think, though who knows what's going on in the mind of the black box.

[5] The backgrounds end up being mostly, but not exactly, an even color,
so I use the average of the border pixels and subtract everything
within some threshold of that.

[6] I assume some of Stable Diffusion's semantic understanding leaks
through the super-strict fine-tuning.
