I'm a big fan of the “Trade Marks & Symbols” series by Yasaburo Kuwayama.
It's largely a collection of black and white logos.
I think it would make for a good image set. Each of the volumes includes 1000+ high-quality symbols with references.
Pretty sweet.
And what better way to test a set than with whatever the current buzzword is? Maybe I'll throw it into a GAN and see what happens.
Data Extraction and Cleaning
To start, though, we do have to figure out how to actually get uniform images to work with.
Here's an example page spread:
There are a couple of clear things to note:
- There's an apparently uniform grid used across pages: three columns by four rows. Even when a larger logo is on the page, it seems to take up a 2x2 section.
- Each page has an even amount of whitespace below the logo grid.
- There is variance in logo alignment: in centering, in the position of each logo's reference number, and sometimes there is even a little description or title.
- The scans' orientation isn't perfect, so whatever method I use has to be flexible enough to handle slight rotation.
We can use some pretty simple ImageMagick commands to harvest some logos:
Split into Pages
First cut off 616px from the bottom:
mogrify -gravity South -chop 0x616 'page-*'
Split the spread into two halves:
convert 'page-*' -crop 50%x100% 'half-%d.png'
Remove 120px from the right side:
mogrify -gravity East -chop 120x0 'half-*'
Cut Tiles
Cut into 3x4 tiles:
convert 'half-*' -crop 3x4@ 'tile-%d.png'
Normalize each tile to a uniform 616x616 canvas (each tile comes out roughly 84px too wide):
mogrify -resize '616x616>' -gravity West -extent 616x616 'tile-*'
Fuzzy Crop
Trim away the surrounding whitespace, with a 25% fuzz so scan noise and off-white paper don't stop the trim:
convert 'tile-*' \
-fuzz 25% \
-trim +repage 'trim-%d.png'
Then center each trimmed symbol on a uniform 616x616 white canvas:
convert 'trim-*' \
-gravity center \
-background white \
-extent 616x616 \
-monitor 'symbol-%d.png'
Generally this approach worked well enough compared to painstakingly cropping by hand. After a quick review, it looks like I had a ~90% success rate.
Cool.
The success rate is slightly lower if you consider losing the reference number or title a failure; either can be a casualty of the fuzzy cropping or the manual cleanup.
As for results, here are some examples:
Go to /symbols if you want to check out the whole set.
Generative Adversarial Networks
It went as well as expected. Here's a run over many epochs:
There was some cool artifacting, but to actually get decent results I'd need a much larger set and more thorough tuning.
I liked this one though:
Making a simple GAN via Keras is cool and all, but a little trite really. That novelty is long gone.
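For context, "simple" here means something on the order of this sketch: a small DCGAN-style generator and discriminator pair. It's an illustration rather than the exact code I ran; the 64x64 grayscale downscale, layer sizes, and names are assumptions.

```python
# Minimal DCGAN-style sketch in Keras (illustrative; sizes and names are assumptions).
# Assumes the symbols have been downscaled to 64x64 grayscale and arrive as
# float32 batches scaled to [-1, 1].
import tensorflow as tf
from tensorflow.keras import layers

LATENT_DIM = 128

def build_generator():
    # Project noise to an 8x8 feature map, then upsample 8 -> 16 -> 32 -> 64.
    return tf.keras.Sequential([
        layers.Input(shape=(LATENT_DIM,)),
        layers.Dense(8 * 8 * 128),
        layers.Reshape((8, 8, 128)),
        layers.Conv2DTranspose(128, 4, strides=2, padding="same", activation="relu"),
        layers.Conv2DTranspose(64, 4, strides=2, padding="same", activation="relu"),
        layers.Conv2DTranspose(1, 4, strides=2, padding="same", activation="tanh"),
    ])

def build_discriminator():
    # Downsample 64 -> 32 -> 16, then emit a single real/fake logit.
    return tf.keras.Sequential([
        layers.Input(shape=(64, 64, 1)),
        layers.Conv2D(64, 4, strides=2, padding="same"),
        layers.LeakyReLU(0.2),
        layers.Conv2D(128, 4, strides=2, padding="same"),
        layers.LeakyReLU(0.2),
        layers.Flatten(),
        layers.Dense(1),
    ])

generator = build_generator()
discriminator = build_discriminator()
g_opt = tf.keras.optimizers.Adam(1e-4)
d_opt = tf.keras.optimizers.Adam(1e-4)
bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

@tf.function
def train_step(real_images):
    noise = tf.random.normal((tf.shape(real_images)[0], LATENT_DIM))
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fakes = generator(noise, training=True)
        real_logits = discriminator(real_images, training=True)
        fake_logits = discriminator(fakes, training=True)
        # Discriminator: push real toward 1 and fake toward 0; generator tries to fool it.
        d_loss = bce(tf.ones_like(real_logits), real_logits) + \
                 bce(tf.zeros_like(fake_logits), fake_logits)
        g_loss = bce(tf.ones_like(fake_logits), fake_logits)
    d_opt.apply_gradients(zip(d_tape.gradient(d_loss, discriminator.trainable_variables),
                              discriminator.trainable_variables))
    g_opt.apply_gradients(zip(g_tape.gradient(g_loss, generator.trainable_variables),
                              generator.trainable_variables))
    return d_loss, g_loss

# Usage: for each epoch, call train_step(batch) over batches of preprocessed symbols.
```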
Don't get me wrong, there is a ton of bleeding-edge, interesting work with AI in an arts and visual capacity, but not at my level of NN understanding.
Pretty sure decoding as an SVG or using categorical data with StyleGAN are within my capabilities, but I'd have to come up with a better set first.
Stable Diffusion
The previous work was done quite some time ago, and a lot has changed.
Large, generic image-generation models have improved significantly and have overtaken bespoke models trained on specific image domains.
Of these, I've recently been playing with Stable Diffusion (SD).
SD can produce some incredible stuff and it can even be quite clever.
For example, when I was attempting to generate landscapes, I received two very different interpretations of the prompt “The edge of our world has a view of our starry night sky…”
However, I was already mostly aware of how potent these models are at generating highly detailed scenes.
Back to symbols.
After some finagling and research, I've gotten some satisfactory results with the following:
"minimal geometric logo by karl gerstner monochrome centered symetrical bordered"
-C 15
-s 150
-S 1972652511
-i 9
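Those flags come from the SD command-line script I was using. For reference, the same knobs look roughly like this through the diffusers library; the model id and the flag-to-argument mapping (-C ~ guidance scale, -s ~ steps, -S ~ seed) are my assumptions, and I'm leaving -i out of it.

```python
# Rough diffusers equivalent of the settings above (a sketch, not the script I used).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16  # model id is an assumption
).to("cuda")

prompt = "minimal geometric logo by karl gerstner monochrome centered symetrical bordered"
generator = torch.Generator("cuda").manual_seed(1972652511)  # pin the seed for repeatable tuning

image = pipe(
    prompt,
    guidance_scale=15,        # the "CFG" knob
    num_inference_steps=150,  # sampling steps
    generator=generator,
).images[0]
image.save("symbol.png")
```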
The largest takeaways being:
- Using an artist is always easier, but limiting
- Having a high “CFG” leads to sharper edges
- Abusing seeds is required for tuning
Appending animal names to the previous prompt led to some great results:
These do come with a touch of processing to generate the .svgs and clean up the border.
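For the curious, the kind of processing I mean looks something like this: threshold to pure black and white, swap the generated border for clean white, and trace the bitmap into an SVG. It's a sketch rather than the exact script; potrace, the border width, and the folder name are stand-ins.

```python
# Sketch of the SVG cleanup step (illustrative; potrace, border width, and paths are stand-ins).
import subprocess
from pathlib import Path
from PIL import Image, ImageOps

def to_svg(png_path: Path, border_px: int = 24, threshold: int = 128) -> Path:
    img = Image.open(png_path).convert("L")                               # grayscale
    w, h = img.size
    img = img.crop((border_px, border_px, w - border_px, h - border_px))  # drop the messy generated border
    img = img.point(lambda p: 255 if p > threshold else 0, mode="1")      # hard black/white
    img = ImageOps.expand(img, border=border_px, fill=255)                # pad back out with clean white
    pbm_path = png_path.with_suffix(".pbm")
    img.save(pbm_path)                                                    # potrace wants a bitmap format
    svg_path = png_path.with_suffix(".svg")
    subprocess.run(["potrace", "--svg", str(pbm_path), "-o", str(svg_path)], check=True)
    return svg_path

for png in Path("sd-symbols").glob("*.png"):
    to_svg(png)
```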
Ongoing Work
Much of the past year has really solidified an uneasy feeling I've had about the direction of AI.
Put simply, the actual generation of these images doesn't work the way most people would intuitively expect.
It's more likely my mom would imagine the model working like a search engine: searching a space of concepts and matching against the words you've typed.
Which is obviously incorrect, but I think it comes down to the current tooling.
People don't think of images as a combination of labels with a given weight for each word. Despite this, that's largely all we've currently got to steer these things.
The only other option is referencing other images in some fashion, either to infill/outfill or to run some image -> image process.
Really though, images are representations of models in our heads, each model representing many, many concepts at given values. The image in your head is much more likely to exist as vague ideas than as even a poor text description.
So I'd like to hope that, while text is the data most easily exposed for us to use as a stand-in for interacting with the very complex neural networks that will define our lives, we will in the future find more intuitive ways to interface with the representations these networks hold internally.
Something like MIMI, capable of measuring the intuitiveness of arbitrary interfaces, seems like the right idea, even if it comes at a loss of “efficiency”.
Given how many resources are being put into powering these things, I'd sure hope we'd actually figure out how to communicate with them properly.
Anyways, here's a generated wool ferret:
That's better.