Presented by TheSequence


If you’re reading this, you’ve probably seen a fair share of AI/ML landscapes in the past. In most cases, these are put together by analysts, think tanks, VC firms and independent researchers. We at TheSequence are a tight-knit community of over 144,000 data scientists, ML engineers and AI enthusiasts alike. So, in wanting to find out first-hand what the ML crowd really thinks, we decided to do our part and extend an invitation to all our members to shape their own version of an ideal landscape for the entire applied ML lifecycle.

The lifecycle was to include every stage — from data collection and annotation to model training, deployment and monitoring. Naturally, we thought it was an ambitious undertaking, but we equally understood the project’s potential benefits, i.e., creating a comprehensive map that would showcase market solutions for the end user.

How we did it

The value chain we used

We started as one would with any other ML project: by collecting relevant data. In this case, that meant analyzing a large body of existing research from CB Insights, MAD and Gartner, among numerous other sources. We dug into a multitude of reports, read case studies, looked at dozens of media listings, and examined many rankings created by commercial and non-commercial entities.

Once we thought we’d obtained all that we could use (here’s the draft version of the landscape), we asked our community members to take a careful look, assess it and remake it to reflect how they’d prefer to see the value chain in practice. Every member could add and remove specific solutions, question and redo every block, offer their own ideas and alternate pathways, and use their experience and expertise to improve the landscape as they saw fit. The whole project took the form of a survey, in which we asked each participant questions to get a clear understanding of their specific ML processes and needs.

After many hours of work, we managed to get our hands on something truly exciting: the first-ever full ML value chain landscape created by ML practitioners themselves. Here’s what it looks like:

The ML Value Chain Landscape courtesy of TheSequence

Our top 5 ML value chain insights

#1

Our research indicates that roughly half of all individuals involved in ML struggle with data processing and model monitoring, with the figures standing at 48% and 44% respectively. These struggles can be observed throughout the preparation and production stages, as well as during and post-deployment, namely as the data is being collected and labeled and then as the engineer proceeds to fine-tune ML pipelines and do maintenance work on the AI product.

#2

Model monitoring in particular seems to be the least popular stage among ML practitioners. The problem appears to take root in the development phase, starting with poor optimization techniques, and continues in much the same way after deployment. As one respondent put it, “Surprisingly, most solutions simply aren’t optimized at all for monitoring models during development.” This is one of the reasons many people working with ML tend to be wary (as well as weary) of this stage, where problems pile up and multiply, often irreversibly.

Additionally, this stage involves a lot of manual labor: project records are often messy because they’re kept by different people in different ways. As one respondent explained, “Monitoring done right is significantly more work than folks claim, especially being able to sift through logs to find intelligent insights quickly.” The common trend is this: the less effort is put into cleaning and tagging the data, and the less transparent the pipelines are initially, the more trouble is likely to arise further down the track.

#3

Data processing is rightfully the largest stage of the whole value chain, which is why many feel that it’s too varied and complex to be considered a single stage. Furthermore, ML engineers and data scientists support the notion of an all-encompassing solution that covers every logical step associated with data processing, which is seldom the case at present. This desire runs parallel to the fact that a lack of user-friendliness and of interoperability with different services and platforms also poses a frustrating obstacle. Most ML engineers who voiced their opinions want something that’s easy to use, easy to configure and ultimately easy to scale.

#4

Interaction between different stages is also a major bottleneck in the ML solutions market. This non-trivial issue is bigger than many realize because it’s vertical as well as horizontal. On the one hand, the techniques and software tools that support different ML stages often don’t work well with each other: there is as yet no universal ecosystem that offers a smooth horizontal transition from one phase to the next. On the other hand, most existing solutions are also convoluted and highly technical, which leaves few opportunities for vertical collaboration between professionals with varying levels of expertise or different specializations.

#5

All in all, none of the solutions seems to cover the entire ML value chain. A few cover five out of six stages (Vertex AI, Scale AI, Toloka AI, Abacus.AI, Appen), some cover four (H2O, Dataiku, Clear ML, and more), and others cover three (Databricks, HuggingFace, and others). This goes to show how scattered the current state of ML is when it comes to a well-balanced infrastructure with interconnected parts that can offer a flexible, all-in-one work environment for ML specialists. Still, there’s some change for the better.

Key takeaway

The ML field of today is quite disjointed. Many ML practitioners either struggle or don’t reach their goals at all, not because of poor code or even noisy data, but because of software incompatibility and poor strategizing within the chain, problems that usually start to take shape right from the get-go.

Is it possible, or does it even make sense, to cover all six stages? We are sure we will find out soon, as the ML industry is developing extremely fast, with so many amazing startups working on it.

Ksenia Se is co-founder and publisher of TheSequence.


Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they’re always clearly marked. Content produced by our editorial team is never influenced by advertisers or sponsors in any way. For more information, contact sales@venturebeat.com.