Presented by CAST AI

Many companies are accelerating their cloud plans right now, and most say that their cloud usage will exceed prior estimates due to the new demands posed by the global pandemic. Cloud computing is becoming a must-have resource, especially for young tech companies. And most of them are migrating to Amazon, Google, or Azure, lured by seemingly attractive offers.

What many companies don’t realize is how dramatically cloud spend can increase, given that those expenses aren’t charged up front. Organizations are often unaware of how easy it is to become locked into a service at hard-to-understand prices, says Laurent Gil, co-founder and Chief Product Officer at CAST AI.

“Vendor lock-in starts whenever you start using a service in a way that serves the purpose of the cloud provider,” he explains. “You have to choose cloud providers carefully and understand that the decision you make today will impact your operation for at least a few years because leaving this service is going to be very hard.”

The biggest challenge in managing cloud costs

Complexity is a real challenge for startups trying to make DevOps work in the cloud. But what they face is complexity by design on the part of cloud service providers, Gil says, simply because making it easy isn’t in the interest of a cloud provider.

“How do you manage your cloud infrastructure with simple tools that will tell you what’s happening at a glance and whether you’re doing a good job managing it when your cloud bill is 80 pages long?” he asks. “By design, cloud bills only tell you how much you spent, not why you’re spending that much.”

It’s an urgent question to tackle, particularly for small companies, which can now use a handful of tools to understand exactly where the money goes, why they spent that much, and why the bill increases every month. And the more a company pays for the cloud as it grows, the more complex and difficult it becomes for humans to make decisions about cost optimization.

“You often don’t realize costs are mounting in the beginning, and then a year or two later, you’re confronted with a technical, financial, or operational debt,” Gil says. “It’s almost as if you inherit this situation. You don’t notice it in the beginning, but it catches up with you in a few months or years.”

To understand cloud costs, you have to go much deeper than the simple ratio of cloud spend to number of customers. Do you need all these virtual machines or services? Can you use a service from another cloud provider? Will it run cheaper or with less compute in a different cloud? Is there a performance-cost tradeoff, and if so, where is it?

None of these questions are easy to answer unless you use some form of automation. And that’s traditionally been difficult — despite the fact that CPUs, memory, and storage are so readily available everywhere and should be extremely commoditized, Gil adds.

“Tackling these dangerously high cloud bills requires automation,” he says. “Machine learning is capable of rightsizing: adding, deleting, and moving machines on the fly, automatically.”

The role of AI in cloud management

Machine learning is a crucial component in cloud cost optimization because of its ability to recognize and act on patterns. For example, if a SaaS provider sees human-driven traffic that rises and falls over the course of 24 hours, an AI engine will recognize the pattern, automatically adding machines during busier parts of the day and deleting them when they’re no longer needed.
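To make the daily-pattern idea concrete, here is a minimal sketch in Python. It is not CAST AI’s engine: it swaps real machine learning for a plain hourly average, and the function name, the headroom factor, and the traffic and capacity figures are all hypothetical values chosen for illustration.

```python
import math
from statistics import mean


def hourly_plan(history, capacity_per_machine, headroom=1.2, min_machines=1):
    """Return a 24-entry list of machine counts, one per hour of the day.

    history: list of days, each a list of 24 hourly request counts.
    capacity_per_machine: requests one machine can serve per hour (assumed).
    headroom: safety multiplier on expected traffic (assumed).
    """
    plan = []
    for hour in range(24):
        # Expected load for this hour is the average across observed days.
        expected = mean(day[hour] for day in history)
        needed = math.ceil(expected * headroom / capacity_per_machine)
        plan.append(max(min_machines, needed))
    return plan


# Made-up traffic: quiet overnight, busy during the working day.
quiet, busy = 200, 1800
day = [quiet] * 8 + [busy] * 10 + [quiet] * 6
plan = hourly_plan([day, day], capacity_per_machine=500)
print(plan)
```

With these invented numbers, the plan keeps a single machine overnight and scales up for the busy hours. A production system would also have to react to traffic that departs from the learned pattern.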

An airline may run a rare deep-discount promotion, and millions of people rush online to buy tickets in a wave so large that it looks like a DDoS attack. Because the AI makes decisions in a split second, it needs only a moment to recognize a swift, large acceleration in traffic and provision immediately, adding virtual machines far faster than a human could, at any time of day.

“This is where machine learning works great,” explains Gil. “It can make these decisions based on independent business elements that determine how busy an application is.”

The AI engine continually checks whether the machines are the right type and provide the amount of compute you need. From a DevOps perspective, if you’re running 100 machines that are busy 80 or 90 percent of the time, you’re doing a great job. But an AI can calculate more precisely, checking whether you need 100 8-core machines or 50 16-core machines, or an ARM processor instead of an Intel processor.
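The comparison described above can be sketched as a simple search over a machine catalog. This is an illustrative Python sketch only: the machine names and hourly prices are hypothetical, not real cloud quotes, and a real optimizer would weigh memory, performance, and availability as well as core count.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class MachineType:
    name: str
    cores: int
    hourly_price: float  # hypothetical price, not a real quote


def cheapest_fleet(required_cores, machine_types):
    """Return (machine_type, count, hourly_cost) for the cheapest uniform fleet."""
    best = None
    for mt in machine_types:
        # Round up so the fleet always covers the required core count.
        count = math.ceil(required_cores / mt.cores)
        cost = count * mt.hourly_price
        if best is None or cost < best[2]:
            best = (mt, count, cost)
    return best


catalog = [
    MachineType("intel-8-core", 8, 0.40),
    MachineType("intel-16-core", 16, 0.70),
    MachineType("arm-16-core", 16, 0.55),
]
# 100 machines with 8 cores each means 800 cores are needed in total.
choice, count, cost = cheapest_fleet(800, catalog)
print(choice.name, count, round(cost, 2))
```

Under these assumed prices, 50 ARM machines cover the same 800 cores for less than 100 8-core Intel machines, which is the kind of swap the paragraph describes.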

“The AI engine is trained to not make any assumptions, but optimize using any means that it has learned,” Gil says. “If the image of this application is compiled for both Intel and ARM, the AI engine can slash your costs by half just by choosing the right machine at a given time.”

Another example is using spot instances, highly discounted VMs that almost all hyperscale cloud providers offer. The discount is usually between 60 and 80 percent, but the tradeoff is that you only get a short warning when the cloud provider takes those machines back. This is impossible for a human to handle, but an AI can quickly spin up another machine and look for any other available spot instances.
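A rough sketch of that fallback decision, in Python, might look like the following. The pool names, prices, and availability counts are hypothetical, and the function only picks a replacement; it does not call any cloud provider’s API.

```python
def replace_interrupted(pools, on_demand_price):
    """Choose a replacement for a reclaimed spot machine.

    pools: dict mapping pool name -> (spot_price, available_count).
    Returns (kind, pool_name, hourly_price).
    """
    # Consider only pools that still have spot capacity.
    candidates = [
        (price, name)
        for name, (price, available) in pools.items()
        if available > 0
    ]
    if candidates:
        price, name = min(candidates)  # cheapest available spot pool
        return ("spot", name, price)
    # No spot capacity anywhere: fall back to a full-price machine.
    return ("on-demand", None, on_demand_price)


# Hypothetical pools; a spot price of 0.10 against 0.40 on demand is a 75 percent discount.
pools = {"zone-a": (0.12, 0), "zone-b": (0.10, 3), "zone-c": (0.08, 0)}
replacement = replace_interrupted(pools, on_demand_price=0.40)
print(replacement)
```

The design choice here is to prefer any remaining spot capacity and pay the on-demand price only as a last resort, so the workload keeps running even when every discounted pool is exhausted.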

The good thing about using AI in cloud automation is that it can make decisions based on somewhat correlated variables with a limited amount of information.

“It’s a bit of a black box in the end, but as humans we see its results clearly,” Gil says. “We can easily judge whether our AI engine is doing a good job based on how much money we save or how much we optimize.”

Cutting customer costs in half

“AI and ML are great tools for reducing the complexity in managing a complex infrastructure for our customers,” Gil says. “If you replace something complex with something else that is also complex, you haven’t done your job.”

A recent CAST AI client, an online grocery store, started using the company’s new product that optimizes EKS applications from Amazon. The forecast report indicated that they could save 50 percent of their time by moving from one type of machine to another.

“Just by doing this, the client reduced their bill from $180,000/month worth of compute to $70,000 after one week, without affecting the performance at all,” Gil says.

“And it’s a good thing for the cloud providers too — whenever you commoditize a resource, customers use more rather than less of it,” he adds. “We’re ensuring that compute capacity is used the right way, democratizing it, and helping companies funnel those costs back into bigger and better projects.”

Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they’re always clearly marked. Content produced by our editorial team is never influenced by advertisers or sponsors in any way.