AI research has long been the domain of universities, public institutions, and large corporations. Thanks to some amazing developments in the field over the past few years (and a whole lot of hype), every startup, VC, agency, and hot dog cart is scrambling to find a way to get on that bandwagon — to be AI-powered, AI-adjacent, or just faking-until-we-make-it-AI.
This is peak AI. AI conferences, AI events, AI podcasts. Flash. Bang. $$$$.
Of the startups doing AI out there, the majority can be put into one of the following buckets:
- AI practitioners building the equivalent of a fancy job application, who often get acqui-hired at a markup before any real product launch.
- Startups that do everything manually (mechanical turks) and hope to one day have the skills and resources to automate, but want to ride the AI wave in the meantime.
- Established companies with clear models and revenue streams, which invest in their own future.
- AI researchers pursuing an ambitious goal, who typically struggle to raise money or must build tailored solutions (vertical implementations, consulting, etc.) to create the revenue streams that keep their business growing.
The typical VC is not set up for long-term research; any research really worth doing is typically a multi-year effort that could last longer than a fund’s lifespan. Raising future rounds requires clear traction on core KPIs, and I’m not sure most VCs count p-values and F-scores as such. Revenue growth is not easy to show when you need years to bring a product to market. However, in rare cases, such as DeepMind, a company breaks through and manages to become a powerhouse.
Having a vision that grand is a huge gamble — one that can only succeed if it’s backed up by a solid business model, has a clear market that it targets, and is built on a clear foundation of targets and KPIs. Granted, that gamble paid off for us at EyeEm, but believe me, there were some terrifying months (read: years) before we started seeing any significant results or impact.
Research challenges every principle you might know on building agile, lean, scalable startups (buzzzzzzz). There is no easy way to build a minimum viable product, and good luck researching in sprints! There’s no such thing as lean academics. It is very difficult to define simple KPIs that we can truly and comprehensively track. Initial funding is readily available (peak AI, remember), but beyond that, it becomes very difficult.
Here are some of the main challenges of running research at a startup.
Machine vision for us started as a means to an end. In order to empower any photographer to find their best photos and earn money with them, we had to move away from a manual review and keywording process to an automated one. While the solution is a highly technical, research-heavy one, it can only generate real value if it aligns with what our business team needs, how our photography team works, and how our product team wants to translate those needs and workflows.
Working cross-functionally in this fashion is very challenging. It requires our researchers to detach from “pure research” and deal with real-world requirements, and our product and business teams to understand and plan around high levels of uncertainty. You need a special set of people on your team who can work in an environment like that, alternating between business and research hats as we move forward.
Surprise. Hiring highly qualified people in technology is hard!
After emerging from their AI winter, machine vision researchers are in high demand, and all the big boys are throwing money and perks at them like it’s Christmas. You only need to go to a conference like CVPR or ICML (I’ve heard 30 to 40 percent of attendees work for Google, Microsoft, or Facebook) to experience that firsthand. Globally, there is a candidate pool of only a few thousand people who fit the profile we are looking for. That’s terrifying.
Fortunately, it’s possible to find researchers with a scientific/research itch who are also intrigued by the entrepreneurial side of things. People with the desire to build something innovative and push the boundaries of what’s possible, and who understand the need to ship quickly and often (as irritating as that might be). People who are comfortable with uncertainty, willing to step outside their comfort zones, and keen to collaborate cross-functionally. It’s a symbiotic relationship. An expensive, symbiotic relationship.
Another major challenge is that the process of training deep learning models is an inherently experimental one, akin to turning and twisting metaphorical knobs and levers. Looking in from the outside, this means that research teams disappear into their world for weeks (read: months) on end. Sometimes they emerge with amazing results after two weeks, sometimes with a failure after three months. Back to the drawing board.
And this is for iterative work that improves on the quality of existing models. Very often, trying to solve a new problem takes months before any initial results are available, and more often than not, we end up solving a different problem than originally planned. Try building a short-term product roadmap with that!
The nondeterministic nature of this beast makes it a veritable limiting step, a black box around which other engineering, product, and marketing processes have to be designed. We generally know what the team needs in order to build or iterate on one of their models, just as we know how that model will be put into production. In between, we (non-researchers) wait and pray. Working at a startup means having to be comfortable with that dark cloud of uncertainty constantly looming over your head.
By the way, while it’s MUCH cheaper than it was a decade ago, training these networks still requires special hardware that isn’t exactly cheap. I shiver at the cost when I read that Facebook conducts 1.4 million experiments a week. Just try to be lean while building AI.
For us, the main challenge of detecting what’s in a photo was addressed as soon as we found a reproducible, scalable method to learn new concepts from an R&D perspective. Beyond that point, a lot of the work became rather iterative. While significant statistical jumps in accuracy still require larger algorithmic improvements, a lot of gains can be made by iterative work (training with larger data sets, for example). This kind of work does not qualify as exciting research.
From a company perspective, on the other hand, “done” means 100 percent finished. Until we had fully automated keywording for every single photo, it was still an open problem. You fix your precision and optimize for recall to solve search, then fix your recall and optimize for precision to keyword individual photos. It’s a vicious cycle.
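To make that cycle a bit more concrete, here is a minimal sketch of picking a confidence threshold for a single keyword while holding one metric at a floor and maximizing the other. Everything in it is illustrative: the scores, labels, floors, and the helper names `precision_recall` and `best_threshold` are made up for this example and are not EyeEm’s actual pipeline.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when we predict the keyword for score >= threshold."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y)
    fp = sum(1 for s, y in pairs if s >= threshold and not y)
    fn = sum(1 for s, y in pairs if s < threshold and y)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def best_threshold(scores, labels, fixed, floor):
    """Hold the `fixed` metric at or above `floor`, maximize the other one.

    Returns (threshold, value of the maximized metric), or None if no
    candidate threshold satisfies the floor.
    """
    best = None
    for t in sorted(set(scores)):  # every observed score is a candidate threshold
        p, r = precision_recall(scores, labels, t)
        held, free = (p, r) if fixed == "precision" else (r, p)
        if held >= floor and (best is None or free > best[1]):
            best = (t, free)
    return best


# Hypothetical model confidences for one keyword, with ground truth (1 = correct).
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3]
labels = [1, 1, 0, 1, 1, 0, 1, 0]

# Search: fix precision, optimize for recall.
search = best_threshold(scores, labels, fixed="precision", floor=0.7)
# Keywording: fix recall, optimize for precision.
keywording = best_threshold(scores, labels, fixed="recall", floor=0.6)
print(search, keywording)
```

On this toy data the two use cases land on different thresholds for the very same model, which is exactly why neither side ever gets to call the problem “done.”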
Innovation vs. process needs
If we spend all our time iterating and improving on known unknowns, we lose out in the long run. I always say that photo classification is a race to the top and the bottom at the same time. Our models regularly outperform those of much larger companies, but as this becomes a commodity, economies of scale kick in, and you don’t want to be competing with AWS when your unit of currency is $/GPU-hour.
Not to mention that once the research problems are tackled, the iterative work will eventually get tedious for a researcher. We believe the solution to this problem is to clearly delineate how we approach projects:
- iterative applied research (3-5 week cycles): improving existing models mostly through more data, but occasionally algorithmically as well.
- new applied research (5-10 week cycles): implementing new algorithms when just adding more data simply doesn’t cut it.
- pure research (3-6 month cycle): working on an open problem that will still be relevant in the future. This is where the magic happens.
Of course there are loose ends to tie up — handing over libraries to engineers, cleaning up code, and working cross-functionally with our photography, product, and business teams to define how our tech is integrated into our products.
Many AI startups also want to make sure they have room to write articles and publish papers on the work they’re doing. This can only work if researchers find a healthy balance between tasks, and it requires people who are comfortable wearing different hats.
Working closely with our research team these past few years has been an inspiring (we can do that?!), humbling (I thought I knew math!), and frustrating (how accurate is it? Just give me a %!) experience. It’s different from anything I’ve had to do before.
In three years, we have taught machines to fully describe the contents of photos, rank them based on their beauty and commercial value, and personalize these ranks for individual tastes. We managed to compress the algorithms so much that they run in real time on mobile devices, and we’ve built technology that lets us train these algorithms in real time and expand their vocabulary as needed. And we’re just getting started.
Startups are hard enough as it is. Doing serious research at startups is as close to the edge as you can get.
This post appeared originally at EyeEm.
Ramzi Rizk is the cofounder and CTO at EyeEm, a stock photo library.