Get ready, developers: Today, OpenAI released the hotly anticipated DALL-E API in public beta, which means developers can now integrate DALL-E directly into their apps and products.
With the announcement, DALL-E, a transformer language model that lets users create and edit original images from natural language prompts, joins GPT-3, Embeddings and Codex in OpenAI’s API platform.
Companies such as Cala, a fashion design platform, and Mixtiles, which prints online photos on lightweight decorative tiles, have already implemented and tested the API for their specific use cases.
Meanwhile, Microsoft is bringing DALL-E to its new graphic design app, Designer — and is also integrating DALL-E into Bing and Microsoft Edge with Image Creator, allowing users to create images if web results don’t return what they’re looking for. Stock imagery provider Shutterstock also announced last week that it would use the API to offer DALL-E-generated images to customers.
OpenAI will continue to iterate on the DALL-E API
The API will be available for anybody to use on the OpenAI platform, Luke Miller, product manager at OpenAI, told VentureBeat.
With the API in beta, “we’ll continue to iterate and improve through the end of the year,” he said. “We’re really excited for all the ways that developers can take this technology and customize it for specific needs, specific applications and specific communities, to scale further than we ever could.”
Miller added that the company has taken many of the lessons learned over the past months of deploying the DALL-E beta to millions of users and built it into the API, “so we can feel comfortable sharing this out with the world, but also let developers focus on the fun stuff of building.”
DALL-E’s fast-paced journey to cultural touchstone
The DALL-E API is yet another big move for the text-to-image generator. Since DALL-E 2 was released just six months ago, it has become part of the mainstream pop culture zeitgeist, with millions of views of DALL-E-generated art on social media, a segment about DALL-E on The Today Show, and a recent appearance by OpenAI CTO Mira Murati on The Daily Show.
At the same time, there have been plenty of outcries and fierce debates over issues including the prospect of legal wrangling over copyright ownership of DALL-E images; how DALL-E may reflect bias in its training data; and questions about DALL-E’s accuracy and ability.
But OpenAI claims that 3 million people are already using DALL-E to spur creativity and speed up workflows, generating over 4 million images a day. Developers, the company says, can now start building with DALL-E in minutes.
From side projects to startups
The goal is to make it as easy as possible to get up and running: sign up, get an API key and start building, Miller explained.
“Whether that’s somebody who’s just hacking on a fun side project over the weekend, whether it’s an early-stage startup, an artist working on a creative project, or a large enterprise, all those people are able to come in and use this technology integrated into their product,” he said, echoing what many are predicting — that the DALL-E API debut will open the floodgates of generative AI startups.
“The fun hacking side project will eventually become a startup in some cases,” he said. “Ultimately, if you’re excited to build with this technology, we want you to be able to do it and build it into your product.”
Rowan Curran, AI and ML analyst at Forrester Research, believes that if the DALL-E API does allow image editing and refinement, it will be “tremendously useful” for developers.
“Then you can actually embed it as a full application into whatever enterprise use case you want,” he told VentureBeat.
API price will be per image
The DALL-E API is priced per image output, based on size. A 1024 x 1024 image costs $0.02, with very slight discounts for smaller sizes: $0.018 per image at 512 x 512 and $0.016 per image at 256 x 256.
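To make the per-image pricing concrete, here is a minimal Python sketch that estimates the cost of a batch of images. The prices come from the figures above; the function name `estimate_cost` and the size keys are illustrative assumptions, not part of OpenAI's API.

```python
from decimal import Decimal

# Per-image prices in US dollars, as quoted in this article.
# The "WIDTHxHEIGHT" key format is an assumption made for this sketch.
PRICE_PER_IMAGE = {
    "1024x1024": Decimal("0.020"),
    "512x512": Decimal("0.018"),
    "256x256": Decimal("0.016"),
}


def estimate_cost(size: str, n_images: int) -> Decimal:
    """Return the estimated cost in dollars for n_images of the given size."""
    if size not in PRICE_PER_IMAGE:
        raise ValueError(f"unknown size: {size!r}")
    if n_images < 0:
        raise ValueError("n_images must be non-negative")
    # Decimal avoids the rounding noise floats would add to currency math.
    return PRICE_PER_IMAGE[size] * n_images


if __name__ == "__main__":
    for size in PRICE_PER_IMAGE:
        print(f"1,000 images at {size}: ${estimate_cost(size, 1000):.2f}")
```

At these rates, 1,000 images would cost $20.00 at the largest size, $18.00 at 512 x 512 and $16.00 at 256 x 256, so the discount for smaller images is real but modest.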
The API has three capabilities, Miller explained. Users can generate an image, edit a part of the image, and also generate multiple variations of the image.
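For developers wondering what a call might look like, the following Python sketch builds (but does not send) an image-generation request. The endpoint paths and the `prompt`, `n` and `size` fields are not taken from this article; they reflect OpenAI's public image API reference as best understood here and should be checked against the current documentation. The helper name `build_generation_request` is hypothetical.

```python
import json
import urllib.request

# Assumed base URL and endpoint paths for the three capabilities;
# verify against OpenAI's current API reference before relying on them.
API_BASE = "https://api.openai.com/v1"
ENDPOINTS = {
    "generate": "/images/generations",
    "edit": "/images/edits",
    "variations": "/images/variations",
}


def build_generation_request(prompt: str, api_key: str,
                             n: int = 1, size: str = "1024x1024"):
    """Build a POST request asking for n images of the given size.

    The request is only constructed here, not sent; pass it to
    urllib.request.urlopen() with a real API key to execute it.
    """
    payload = {"prompt": prompt, "n": n, "size": size}
    return urllib.request.Request(
        API_BASE + ENDPOINTS["generate"],
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_generation_request("a watercolor fox at dusk", "YOUR_API_KEY")
    print(req.get_method(), req.full_url)
```

Edits and variations involve uploading an image file rather than sending plain JSON, so they are left out of this sketch.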
“You can think of it as not unlike the creative process, coming up with ideas, picking something and narrowing in and then continuing to iterate and find something that suits your need and the given context,” he said.
Curran pointed out that, historically, one of the limiting factors around large language models overall has been the cost involved in running them. So if the price is right on the DALL-E API, he said it would “open up a whole set of use cases, especially for startups and folks who are getting seed funding.”
That said, he added that large enterprises, especially innovation teams, will likely want to use the DALL-E API as well.
“In addition to that, I expect to see that drive more enterprise-level research and usage and in terms of adopting and fine-tuning their own large language models for various use cases,” he said. “Because I think that ability to take the large language models, add this fine-tuning layer on top for some of these really specific industries is where it’s gonna really start to be very game-changing.”
Questions about trust and safety
Critics continue to question issues related to the trust and safety of generative AI generally, and DALL-E in particular — that fake photos could be used to bully and harass, for example, or spread disinformation and spur violence. In May, researchers said the tool could also reinforce stereotypes against women and people of color.
Those with ethical and legal questions around DALL-E may not be thrilled with the news that images generated with the API will not require a watermark — which was implemented during the DALL-E 2 beta but is optional with the API.
But in a press release, OpenAI maintained that the DALL-E API is “incorporating the trust and safety lessons we’ve learned while deploying DALL-E to 3 million artists and users worldwide.”
With the API, “developers can ship with confidence knowing that built-in mitigations – like filters for hate symbols and gore – will handle the challenging aspects of moderation,” the press release continued. “As a part of OpenAI’s commitment to responsible deployment, we will continue to make trust and safety a top priority so that developers can focus on building.”
Mixtiles uses DALL-E API to make memories
Eytan Levit, co-founder of Tel Aviv-based Mixtiles, says the company immediately saw the potential of DALL-E 2 and signed up for early access.
“We started playing with DALL-E 2 to create framed pictures of childhood memories, ‘spirit animals’ and dreams that our family members and friends described to us,” he told VentureBeat. “We wanted to see if they would hang these pictures on their walls, and they did.”
Levit pointed out that there is a learning curve for the first-time DALL-E user. “For example, you need to know which styles you can use, such as an oil painting, digital art, pencil sketch or watercolor,” he said. “We’ve learned that referencing time of day materially affects your results, while color palettes also help with getting great pictures.”
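Levit's tips translate naturally into a small prompt-building helper. The Python sketch below is purely illustrative and is not Mixtiles' code: the function name `build_prompt`, its parameters and the phrasing of the resulting prompt are all assumptions.

```python
from typing import Optional


def build_prompt(subject: str,
                 style: Optional[str] = None,
                 time_of_day: Optional[str] = None,
                 palette: Optional[str] = None) -> str:
    """Combine a subject with optional style, time-of-day and palette hints."""
    parts = [subject.strip()]
    if style:
        parts.append(style)  # e.g. "oil painting", "pencil sketch"
    if time_of_day:
        parts.append(f"at {time_of_day}")  # e.g. "dusk", "sunrise"
    if palette:
        parts.append(f"{palette} color palette")  # e.g. "warm autumn"
    return ", ".join(parts)


if __name__ == "__main__":
    print(build_prompt("a fox in a forest", style="watercolor",
                       time_of_day="dusk", palette="warm autumn"))
```

A guided flow could collect each of these hints in its own step and assemble the final prompt only at the end, sparing first-time users the learning curve.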
Using the API, Mixtiles’ approach has been to guide the user through a series of steps, each step getting them closer to creating artwork that they emotionally resonate with.
“We think simplicity is key to unlocking this amazing technology to hundreds of millions of people who could use it to decorate their homes,” Levit said.
Ultimately, he added, Mixtiles is betting that generative AI and DALL-E represent a new technological leap, “equivalent to the invention of paper, the picture frame, canvas print or the invention of computer graphics — we think it’s going to fuel an explosion of new use cases, of human creativity and of emotional connection.”
For Mixtiles, that means allowing customers to upload family pictures and portraits and then customize these images.
“Imagine turning a picture of your child into their favorite superhero, or turning your family portrait into a portrait of Simpsons-style characters, or a Van Gogh-style painting,” Levit said. “We’re optimistic generative AI will become an integral part of our value proposition in the near future.”