Commodity artificial intelligence-as-a-Service (AI-aaS) offerings are popping up everywhere. Just as you can whip out a credit card and spin up a virtual data center in Amazon, Microsoft, or Google’s cloud, you can now call on pretrained machine learning models to handle your AI chores.
Using an API, you can upload a photo library to Google Cloud Vision or Amazon Rekognition to have the program scan it for objects, faces, logos, or terms of service violations in seconds, for fractions of a penny per image. Any business can now deploy the same technology used by the Google Photos app and Amazon Prime Photos to automatically categorize and label smartphone snaps based on the people, objects, and landmarks inside them.
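Under the hood, these services return structured results rather than finished decisions. The sketch below assumes a response shaped like Amazon Rekognition’s DetectLabels output, a `Labels` list whose entries carry a `Name` and a `Confidence` score from 0 to 100, and filters it down to the tags worth storing. The API call itself (for example, boto3’s `detect_labels`) is omitted, the helper name `confident_labels` is hypothetical, and the sample response is made up for illustration.

```python
from typing import Any


def confident_labels(response: dict[str, Any], min_confidence: float = 80.0) -> list[str]:
    """Return label names at or above min_confidence, highest confidence first.

    Assumes a DetectLabels-style payload:
    {"Labels": [{"Name": str, "Confidence": float}, ...]}
    """
    labels = response.get("Labels", [])
    kept = [lbl for lbl in labels if lbl.get("Confidence", 0.0) >= min_confidence]
    kept.sort(key=lambda lbl: lbl["Confidence"], reverse=True)
    return [lbl["Name"] for lbl in kept]


# Hypothetical response for illustration only; real output comes from the service.
sample = {
    "Labels": [
        {"Name": "Kitchen", "Confidence": 97.1},
        {"Name": "Sink", "Confidence": 88.4},
        {"Name": "Cat", "Confidence": 41.0},
    ]
}
print(confident_labels(sample))  # ['Kitchen', 'Sink']
```

The confidence cutoff is a business choice: a photo-tagging feature can tolerate a lower threshold than, say, an automated compliance check.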
Real estate companies use image recognition to allow prospective home buyers to search for houses whose appearance pleases them. Car companies like Kia use AI to customize marketing campaigns based on the photos people post to social media. Cities can also use the technology to understand traffic patterns and make better decisions about infrastructure projects. And so on.
This all sounds wonderful, revolutionary, and scalable. But as with other commoditized technologies, the off-the-shelf, one-size-fits-all approach doesn’t work for every company or business goal, which raises the question: For your AI needs, should you choose a commodity cloud AI service or opt for a more comprehensive custom solution? As AI becomes more and more critical to businesses, three basic options have emerged:
- Use a commodity AI-aaS offering such as Amazon AI (including Rekognition), Clarifai, CloudSight, Google Cloud Vision, IBM Watson, or Microsoft Cognitive Services. These offer a relatively narrow range of AI functions, mostly enabled via APIs for text and image recognition, as well as natural language processing (NLP).
- Engage third-party applied AI companies that specialize in a broader and more customized range of vertical AI services. These providers sometimes offer on-premises solutions for companies that don’t want to share their data in the cloud, or focus on a particular vertical, such as finance, health care, marketing, or retail.
- Build out a full-stack machine learning system from scratch, using your own experts and data. This is by far the most complex option and is primarily for organizations where AI is essential to their core value and revenue.
Each of these options makes sense for certain kinds of business users. Exactly which one is your best option depends on how you answer the following questions.
1. What kind of AI jobs do you need to do?
AI is helpful in a wide range of business use cases, including predictive analytics, forecasting, process optimization, personalization, and many others. But while IBM Watson offers some additional analytics and language processing tools, many commodity AI-aaS vendors are focused on the tasks most commonly associated with machine learning: text and image recognition. These serve as out-of-the-box solutions for specific, narrow tasks at organizations whose core business doesn’t center on AI: say, a local law enforcement agency that wants to quickly compare a photo against an image database using facial recognition, or an editorial site that wants to moderate its comment sections (or images) for objectionable content.
If your AI needs go beyond those clearly defined tasks, or you have massive amounts of data (proprietary or otherwise), you’ll likely want to engage an applied AI partner or embark on your own internal full-stack AI setup (more on that later).
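For the content moderation use case, a common pattern is to route each image by the service’s confidence rather than trusting a single yes/no answer. A minimal sketch, assuming a response shaped like Amazon Rekognition’s DetectModerationLabels output (a `ModerationLabels` list whose entries have `Name` and `Confidence` fields on a 0 to 100 scale); the `triage` function and both thresholds are hypothetical and would need tuning against your own content.

```python
from enum import Enum
from typing import Any


class Decision(Enum):
    APPROVE = "approve"
    REVIEW = "review"    # send to a human moderator
    REJECT = "reject"


def triage(response: dict[str, Any],
           reject_at: float = 90.0,
           review_at: float = 50.0) -> Decision:
    """Route an image based on its most confident moderation label (0-100 scale)."""
    scores = [lbl.get("Confidence", 0.0) for lbl in response.get("ModerationLabels", [])]
    top = max(scores, default=0.0)
    if top >= reject_at:
        return Decision.REJECT
    if top >= review_at:
        return Decision.REVIEW
    return Decision.APPROVE


print(triage({"ModerationLabels": []}))  # Decision.APPROVE
print(triage({"ModerationLabels": [{"Name": "Violence", "Confidence": 62.5}]}))  # Decision.REVIEW
```

Keeping a human-review band between the two thresholds trades some moderator time for fewer wrong automatic decisions in either direction.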
2. What kind of volume can you afford?
Image and text recognition services are increasingly commoditized, and sometimes even free at low volumes. But if you’re running them at scale, the costs add up quickly.
Say you’re running a small photo-sharing service and need to scan and analyze 10,000 individual images a month to ensure they don’t contain objectionable content. On Amazon Rekognition that would cost $10; Google Cloud Vision would charge you $13.50, but that also includes label detection (i.e., identifying whether it’s a picture of a cat, a bicycle, a bagel, etc.). Label detection would also be useful for, say, realtors who want to flag kitchens featuring particular types of cabinets or countertops, or doctors who need to identify different types of skin lesions.
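To see where figures like these can come from, here is a sketch of per-feature pricing with a free monthly allowance. The rates are assumptions chosen because they reproduce the $10 and $13.50 above: a flat $1.00 per 1,000 images for Rekognition, and $1.50 per 1,000 for Google Cloud Vision after the first 1,000 images each month. `flat_rate_cost` is a hypothetical helper; check each vendor’s current price list before relying on any of these numbers.

```python
from decimal import Decimal


def flat_rate_cost(images: int, price_per_1000: str, free_images: int = 0) -> Decimal:
    """USD cost for one feature at a flat per-1,000 rate after a free allowance."""
    billable = max(images - free_images, 0)
    cost = billable * Decimal(price_per_1000) / 1000
    return cost.quantize(Decimal("0.01"))  # round to cents


# Assumed rates, not current list prices.
print(flat_rate_cost(10_000, "1.00"))                     # 10.00 (Rekognition)
print(flat_rate_cost(10_000, "1.50", free_images=1_000))  # 13.50 (Cloud Vision)
```

Using `Decimal` instead of floats keeps per-image fractions of a cent from accumulating rounding error.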
If you were operating on the scale of Pinterest, however, whose users upload 14 million images a day, the economics of image safety screening would change significantly. Even at the steeper discounts offered for large volumes of images, it would cost a service of that size just over $5.1 million a year, or roughly $14,000 a day on average, with Google Cloud Vision.* Using Amazon would cost around $2.3 million a year, or about $6,300 a day.
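Volume discounts are usually tiered and reset each billing month, so the per-image price drops as volume grows. A minimal sketch of a tiered calculator follows; the tier table is an assumption modeled on Amazon Rekognition’s per-month image tiers at the time of writing (the first 1 million images at $0.001 each, the next 9 million at $0.0008, the next 90 million at $0.0006, and $0.0004 beyond that), and `AMAZON_TIERS` and `monthly_cost` are illustrative names.

```python
from decimal import Decimal
from typing import Optional

# Assumed tiers: (images in tier, USD per image); None marks the open-ended last tier.
AMAZON_TIERS: list[tuple[Optional[int], Decimal]] = [
    (1_000_000, Decimal("0.0010")),
    (9_000_000, Decimal("0.0008")),
    (90_000_000, Decimal("0.0006")),
    (None, Decimal("0.0004")),
]


def monthly_cost(images: int, tiers: list[tuple[Optional[int], Decimal]]) -> Decimal:
    """Bill one month's volume, filling each price tier in order."""
    remaining = images
    total = Decimal("0")
    for size, price in tiers:
        billable = remaining if size is None else min(remaining, size)
        total += billable * price
        remaining -= billable
        if remaining == 0:
            break
    return total


# A Pinterest-scale service: 14 million images a day over a 30-day month.
month = monthly_cost(14_000_000 * 30, AMAZON_TIERS)
print(month, month * 12)  # roughly $190,000 a month, about $2.3 million a year
```

Because the tiers reset monthly, the calculation should run on a month’s volume (here about 420 million images) and then be annualized; pricing a single day’s 14 million images as if it were a whole month would overstate the daily cost, since most of those images would land in the expensive early tiers.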
Of course, the cost also goes up depending on how much information you’re asking the AI to provide. At its steepest discount, Google Cloud Vision adds another $0.0006 per image for detecting text, and the same amount each for detecting faces, logos, and landmarks. Add all that to labeling and content scanning, and a service on the scale of Pinterest is looking at spending more than $17.6 million annually.
Suddenly those inexpensive commodity cloud services don’t seem so cheap anymore.
3. How good do the results have to be?
Though the machine learning models behind commodity AI-aaS offerings have been trained on very large data sets, that doesn’t mean they’re always going to produce accurate results.
Upwork recently published a comparison of six leading image recognition APIs to gauge how accurate they are at labeling images of animals, people, text, and objects. The test wasn’t rigorously scientific, but the results were fascinating.
Each AI engine’s predictions were on target with some images and far off base with others. For example, all excelled at identifying a parked car on an urban street, but some stumbled when shown two cats, the Grand Canyon, a bottle of wine, or three people standing on a sidewalk.
Shown a realistic portrait of a Western frontiersman leading his pack-laden horse, Google Cloud Vision correctly identified it as a painting, while Watson suggested “camel racing” and Microsoft’s best guess was the surreal “person riding a surfboard on top of a book.”
A big advantage of going with an applied AI solutions provider or consultant (or running your own AI stack) is the ability to train the machine learning models in more customized ways and fine-tune the results to increase accuracy. For example, if you’re building a wine recommendation app, instead of just labeling a bottle as “wine” or “pinot noir,” you might want to drill down into more specifics, such as the vintner, region, or vintage. Or say you’re a brewer who wants to automatically identify your beer’s logos in social media images even when they’re only partially visible and the bottles are tipped over. That is a stiff challenge in facial and image recognition known as occlusion, and you would benefit from an applied AI or DIY full-stack solution.
4. How much flexibility do you require?
Commodity AI-aaS offers far less control and flexibility than an applied AI or in-house full-stack solution in other ways, too. For example, Amazon Rekognition offers thousands of image labels, but not always the ones your business needs. Amazon might be able to tag “kitchen” or “sink,” for example, but not necessarily “Kohler faucet” or “tile backsplash.” To add new labels or change how Amazon flags images for potentially objectionable content, you’ll need to submit a request. Amazon requires six to eight weeks to add new types of moderated content and does not promise to honor all requests.
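Before committing to a vendor’s label vocabulary, one rough check is to run a representative sample of your own images through the service and measure how many of the labels you care about ever come back. The sketch assumes each response is shaped like Rekognition’s DetectLabels output (a `Labels` list of objects with a `Name`); `label_coverage` is a hypothetical helper and the sample responses are made up.

```python
from typing import Any, Iterable


def label_coverage(required: Iterable[str],
                   responses: Iterable[dict[str, Any]]) -> tuple[float, set[str]]:
    """Return (fraction of required labels seen, required labels never returned).

    Matching is case-insensitive on the label name.
    """
    wanted = {name.lower() for name in required}
    seen = {
        lbl["Name"].lower()
        for resp in responses
        for lbl in resp.get("Labels", [])
    }
    missing = wanted - seen
    fraction = 1.0 if not wanted else (len(wanted) - len(missing)) / len(wanted)
    return fraction, missing


# Made-up responses for illustration.
responses = [
    {"Labels": [{"Name": "Kitchen"}, {"Name": "Sink"}]},
    {"Labels": [{"Name": "Kitchen"}, {"Name": "Countertop"}]},
]
fraction, missing = label_coverage(["kitchen", "sink", "faucet", "tile backsplash"], responses)
print(fraction, sorted(missing))  # 0.5 ['faucet', 'tile backsplash']
```

Exact-name matching is deliberately strict; a vendor may return a near-synonym, so review the `missing` set by hand before concluding a label is unsupported.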
Google Cloud Vision places limits on the size and number of images you can feed through the API at a time, and all of these services limit the kinds of files they will accept and the types of data they can recognize. Amazon accepts only PNG and JPEG files, for example. Only three of the six AI-aaS vendors mentioned here offer optical character recognition (OCR) along with image recognition, and only Clarifai accepts video as well as still images. In other words, if all your real-estate images are in RAW format, you may need to convert them first, and if you want a service that reads the labels on images of wine bottles, you’ll want OCR.
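Since accepted formats vary by service, it can pay to check files before uploading them. This sketch sniffs the standard PNG and JPEG file signatures (“magic bytes”) so that anything else, such as camera RAW files, can be routed to a conversion step first. The PNG/JPEG-only policy mirrors the Rekognition restriction mentioned above; `sniff_format` and `needs_conversion` are hypothetical helpers.

```python
from pathlib import Path
from typing import Optional

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # fixed 8-byte PNG header
JPEG_SIGNATURE = b"\xff\xd8\xff"       # JPEG SOI marker plus the next marker's 0xFF


def sniff_format(header: bytes) -> Optional[str]:
    """Identify PNG or JPEG from a file's first bytes; return None for anything else."""
    if header.startswith(PNG_SIGNATURE):
        return "png"
    if header.startswith(JPEG_SIGNATURE):
        return "jpeg"
    return None


def needs_conversion(path: Path,
                     accepted: frozenset[str] = frozenset({"png", "jpeg"})) -> bool:
    """True if the file is not in a format the (assumed) service accepts."""
    with path.open("rb") as f:
        return sniff_format(f.read(8)) not in accepted
```

Checking content rather than file extensions catches mislabeled files, such as a RAW image saved with a `.jpg` suffix, before they fail at the API.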
The old Henry Ford line about how you can have a Model T in any color (as long as it’s black) applies to AI as a service — your options will be limited.
5. What kind of performance do you need?
Latency is the quiet killer for applications that require near-real-time image or text processing. Clarifai notes that its API responds within 200 to 400 milliseconds for a single image sent from inside the United States; add more images or video, or increase your distance, and the latency grows worse. CloudSight, on the other hand, needs from 6 to 12 seconds to respond, possibly because it relies on human crowdsourcing to manually tag some images.
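Published latency figures are a starting point; it is worth measuring from wherever your application actually runs. A minimal sketch that times any zero-argument callable (in practice, a function wrapping one real API request, not shown here) and reports nearest-rank percentiles, so you can compare the typical (p50) and worst-case-ish (p95) round trip against your latency budget. Both helper names are hypothetical.

```python
import math
import time
from typing import Callable


def time_calls(call: Callable[[], object], n: int = 20) -> list[float]:
    """Invoke `call` n times and return each round-trip time in milliseconds."""
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        call()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile (0 < p <= 100) of the samples."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]
```

Tail percentiles matter more than averages here: a service that usually answers in 300 milliseconds but occasionally takes several seconds can still break a near-real-time feature.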
As with all cloud services, reliability is also an issue; your ability to process text or images is entirely dependent on the availability of third-party servers. Anyone who’s suffered through the rare AWS or Google outage can tell you how frustrating that can be. Even one extended outage is one too many.
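Retries cannot fix a sustained outage, but they do smooth over brief network errors and throttling. A minimal sketch of retrying with exponential backoff and jitter; which exceptions count as transient depends on your client library, so this sketch catches everything, which you would narrow in practice. `call_with_retries` is a hypothetical helper, and the injectable `sleep` parameter exists only to make it testable.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(call: Callable[[], T],
                      attempts: int = 4,
                      base_delay: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry a failing call with exponential backoff; re-raise the last error."""
    for attempt in range(attempts):
        try:
            return call()
        except Exception:  # narrow to your client's transient errors in practice
            if attempt == attempts - 1:
                raise
            # Wait base_delay * 2**attempt seconds, plus up to 10% random jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay * (1 + random.uniform(0, 0.1)))
    raise ValueError("attempts must be at least 1")
```

For a longer outage, the caller still needs a fallback, such as queueing images for later processing rather than failing the user’s upload.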
Having an AI stack on-site will largely negate the latency issue and give you more control over availability.
6. How much in-house expertise do you have?
AI engineers are in huge demand. Many organizations simply don’t have the necessary talent on hand, and recruiting that talent means competing for candidates with companies such as Google, Microsoft, Facebook, and Amazon, which are aggressively investing and innovating in the AI arena. And even if you do have the resources to hire top AI engineers, you’ll still have trouble finding ones with domain expertise in your particular business.
If you’re just experimenting with incorporating AI into your business, or you want to offer basic low-volume AI functionality as a service to customers, then cloud-based services can be a good way to get started. But if you need more scale, greater flexibility, domain expertise, data privacy, or services that a commodity cloud service doesn’t offer, and you don’t have the desire or resources to recruit and hire a full AI team in-house, then finding a third-party applied AI provider is probably a better way to go.
While ramping up will be a business and technological challenge, creating your own full AI stack can be significantly advantageous for your organization in the long run, if AI is your core value. But for everyone else, getting on board with an AI-aaS solution or applied AI partner is essential. As noted by Harvard Business Review, AI is poised to be a transformational technology, on a par with the steam engine, electricity, and the internet. Organizations that don’t get ahead of that train are in danger of being run down.
*Google Cloud Vision’s pricing only accounts for volumes up to 20 million images per month. Presumably, there are discounts for higher volumes available upon request, but even then, the expense is considerable.
Ken Weiner is CTO at GumGum, an applied computer vision company.