Midjourney founder says 'the world needs more imagination'

In April 2022, OpenAI — the artificial intelligence (AI) company cofounded by Elon Musk, Sam Altman, Ilya Sutskever, Greg Brockman, Wojciech Zaremba and John Schulman — debuted DALL-E 2, an AI tool that can create realistic images and art from a description in natural language, like "teddy bears working on new AI research on the moon in the 1980s," for instance.

In an attempt to take a step toward artificial general intelligence (AGI) by giving machines a sense of sight, OpenAI created an internet sensation. In the company’s words, “DALL-E 2 will empower people to express themselves creatively.”

Think of anything as random as “Cookie Monster reacting to his cookie stocks tanking” or “astronaut riding a horse in the style of Andy Warhol” and DALL-E 2 could generate it.

The technology was even recently used to make the first magazine cover generated by AI.

Although OpenAI has only just expanded early access to the tool, it has already inspired the creation of many similar image generators, including Google’s Imagen, Meta’s Make-A-Scene, TikTok’s AI green screen and the fun-yet-horrifying DALL-E mini by Boris Dayma.

As these tech giants battled for AI art supremacy, The Economist featured a new entrant to the game – Midjourney – on its June 2022 issue cover.

David Holz’s version of this technology, known as Midjourney, quickly rose to prominence, and everyone who got their hands on this text-to-image generator was thoroughly impressed. Most recently, the Colorado State Fair’s annual art competition awarded its blue ribbon for emerging digital artists to Jason M. Allen, who had used Midjourney to create an artwork called “Théâtre D’opéra Spatial.”

Midjourney's journey

To understand Midjourney, it's important to look back to 2011 — the year David Holz launched his first AI-based startup, Leap Motion.

“In many ways, I wasn't interested in artificial intelligence (AI) because I did not care much for making machines better,” Holz told VentureBeat. "Coming from the IA [intelligence augmentation] school of thought, I've always been more interested in empowering people and trying to make people better.”

Experts in AI believe in using machines to perform tasks that humans would consider intelligent or smart, while experts in IA place humans at the center of the system and use technology to support and complement human cognitive functions. Holz chose a path that would let him enjoy the best of both worlds.

“Over the years, I’ve realized that we can use AI to empower people and to make people better and those people can make better AI — it’s like coming full circle and everyone wins,” he said.

Leap Motion grew out of this ideology. The company developed an optical hand-tracking module that uses AI to capture the movements of human hands. “The goal wasn't to replace a sign language person, but it was to allow us to literally be embodied in virtual spaces inside of computers. And now, with Midjourney, we are not trying to replace an artist but are giving them tools to explore new mediums of thought and expand their imaginative powers,” Holz explained.

In 2021, Holz started Midjourney as an independent research lab. Around the same time, industry buzzwords like 'diffusion models' and 'contrastive language-image pre-training (CLIP)' were on everyone's lips.

Building on these developments, the lab began offering its text-to-image service in 2022. Like its counterparts, the AI system accepts a design prompt or idea in the form of a phrase and uses it as inspiration to create captivating images. Midjourney stands out because the AI bot can only be accessed through Discord, the voice over Internet Protocol (VoIP) and instant messaging platform, rather than through its own website or mobile app.

When a natural language query is issued, the bot responds with four low-resolution images in about 60 seconds. At this point, users can generate variations or fresh sets of images to get closer to what they have in mind. Users can also change the aspect ratio of the output, up to a maximum resolution of 2048×1280 pixels, much higher than DALL-E 2’s 1024×1024 resolution.

Close-up photographs of discrete objects, pop culture references, charcoal or pencil sketches, paintings in the styles of various renowned artists — Midjourney can do it all. It’s exceptional at creating larger-than-life scenes.

As to the competition, Holz said, “I don't really want to spend too much time comparing ourselves to others. I like to hope that the results speak for themselves. Kind of like how Apple doesn't spend all their time talking about how Android sucks.”

Midjourney on ethical issues

Given the scale at which Midjourney operates, artists and researchers alike have begun expressing concerns about the technology’s collateral damage. Of the many questions raised, three have garnered the most attention.

Holz addressed all three at length:

1. Could Midjourney replace human designers?

No, it cannot. According to Holz, Midjourney is meant to augment our capabilities, not replace us by any means.

“It’s kind of like the moment humans invented cars. Just because cars can go faster than humans, doesn't mean we cut our legs off. You are going to use cars to get someplace faster. It’s basically augmenting our speed," he said. "Similarly, our product involves an iterative, beautiful explorative process, where it becomes an extension of your imagination. And you can wander, explore and figure out what you want on the fly. That's a positive thing.”

2. Does it plagiarize or violate content policies?

This is a particularly interesting and controversial question, as Midjourney pulls its training data from the internet. However, Holz claims that the AI engine is designed to only “take inspiration” from the data and to ensure that the output is entirely novel, that is, unlike any publicly available image. Oddly enough, Holz says he has received multiple requests from artists to double down on Midjourney’s ability to take inspiration from their own work as well as that of others.

“The number one request from artists is to make Midjourney better at copying, to which I don't fully know how to respond yet. They're like, David, ‘let me put all of my art into the system. I want to copy it as well as possible so that it can be part of my artistic flow,’" he explained. "They think that the better they can get at copying their personal art style, the more useful it is. Whereas if it has its own style, they have to kind of meet it halfway and pull their stuff out of it. Which is interesting. It's a little scary for me because I see how it could be used for good and evil.”

3. Will it produce results that demonstrate gender biases, reinforce racial stereotypes — or contain anything explicit?

As Midjourney is intended to be open by default, it has strict policies to keep content PG-13. It automatically blocks text inputs that are inherently disrespectful, aggressive, abusive or sexual, Holz confirmed. Most importantly, the rules are enforced for all content, including interactions in private mode.

For all things artsy, not business

Midjourney currently offers a limited “freemium” model that allows users to submit 20-25 prompts for image generation. After that, users can choose from a range of subscription packages: a basic membership that covers 200 images, a standard membership that includes unlimited images, or a premium corporate membership that includes both unlimited images and complete privacy.

It’s important to note that “corporate membership” does not refer to an enterprise software-as-a-service (SaaS) product. In fact, Holz explicitly said that the company has no interest in building one, even though many of its customers use the product to make commercial video games, concept art and videos.

“Our technology is moving so fast that it makes sense to focus on the consumer side because that's where people can just take things and run. Also, there's something very simple and beautiful about making a cool thing," Holz said. "It only gets better when regular people can pay and have fun with it while professionals pay less than they would for an enterprise product, and still enjoy the product and use it for work. I think this simplicity is worth a lot and we want to keep it.”

What lies ahead: Text-to-3D?

While the world believes that the next phase of text-to-image evolution will move toward full-blown videos or movies, Midjourney begs to differ. In fact, the company might avoid that as much as possible, as incorporating text-to-video capabilities could make the product more expensive, and the output could be a dealbreaker if it’s not thoroughly thought through.

That said, Holz does plan to take things to the next level via text-to-3D. He detailed Midjourney’s quest to make the output more real and move toward augmented and virtual reality. The company aspires to bring this liquid imagination into the real world.

“I care about three things: Reflection, coordination and imagination. To make a better world, we need to be more reflective, more imaginative and we need to be better at coordinating. And I want to build something big in each area, and then bring them together one day,” he said.

That aside, the company does intend to build out the existing product with enhanced features, making the output more realistic and nuanced.

In addition, Midjourney's technology uses a combination of its own models and open-source code to create art. Holz's near-term goal is to stop using open-source products and build the code entirely in-house.

“I feel like there are people in technology who basically act like we have no past, and there's a lot of people in the world in fear of not having a future. But I feel like the truth is we're actually very much mid-journey. We have this beautiful and rich history behind us, and an equally rich wonderful future ahead of us,” Holz concluded on an optimistic note, hinting at AI’s promise of limitless possibilities and the company’s ethos.