“The core part of our AI strategy is to get as close as possible to having a human-to-human experience,” Duolingo AI and research head Burr Settles told VentureBeat in an interview at London’s AI Summit last month.
Duolingo, for the uninitiated, is a cross-platform app where users can learn languages for free, though they can also cough up $7 each month for a premium service that removes ads, delivers offline access, and more. Through gamification and bite-sized lessons, anyone can learn to read, listen, and speak in dozens of tongues.
People’s reasons for learning a new language vary — perhaps it’s to boost their appeal with prospective employers, to converse with a new partner’s parents, or simply for personal fulfillment. But whatever the motivation, learning a language takes time and effort — all the more so if the learner is not immersed in the language 24/7.
Most people can’t move to another country just to boost their language skills, so companies like Duolingo have capitalized on the rise of smartphones and ubiquitous connectivity to bring lessons to users, wherever they are.
Duolingo already supports many of the world’s most common languages, including Chinese and Hindi, not to mention fictional vernaculars, such as Klingon. Earlier this week, the Pittsburgh-based company finally rolled out support for Arabic — one of the world’s most-spoken languages. Duolingo now claims some 300 million users globally and has raised north of $100 million for a valuation of around $700 million, with big-name backers including Alphabet’s CapitalG and Kleiner Perkins.
The global online language learning market was pegged at $9 billion in 2018, according to Verified Market Research, and could hit more than $20 billion by 2026. Against this backdrop, Duolingo has been investing in AI and machine learning to make lessons more engaging by automatically tailoring them to each individual — kind of the way a human tutor might.
VentureBeat sat down with Settles to get the lowdown on the company’s reliance on AI and related techniques, some of the challenges involved, and where things could go from here.
After a stint as a postdoctoral research scientist at Carnegie Mellon University, Settles joined Duolingo in 2013 as a software engineer, covering everything from the front end to the back end. He said he chose Duolingo over bigger companies because of the potential he saw in the role.
“My interests are at the intersection of language, AI in tech, and cognitive science,” Settles said, noting that there aren’t many jobs that fall at the crossroads of all three. “You can probably count them on your fingers,” he added.
Soon after Settles joined Duolingo, he and the team began identifying ways to transform the building blocks of Duolingo’s learning models, which had been loosely based on flash card scheduling algorithms from the ’70s. One of the challenges, according to Settles, has been that there is very little research on leveraging AI for education at any real scale. “What few publications there are, there’s two main problems with them,” he said. “One is that they’re usually like laboratory studies, with, like, 30 people and mostly 30 American undergraduate students. And that’s a very different population compared to the 300 million people from all over the world with different backgrounds [that use Duolingo].”
What Duolingo did have was a wealth of learning data that could be used to develop new models and algorithms from scratch.
“Part of the reason I took the job is the amount of data and the type of data and the uniqueness of the data,” Settles said. “We’d been using heuristics, and we were collecting data about exercises that students got right, what they got wrong, and how long it had been since they last saw it in the app. And since we were tracking those statistics, we thought ‘Why not create predictive models to do that instead?’”
With that in mind, Duolingo has been developing its own statistical and machine learning models, while also incorporating tried-and-tested learning techniques like spaced repetition to optimize and personalize lessons. The theory behind spaced repetition is that repeating short lessons at intervals is better than cramming the same information within a short time frame. Related to this is what is known as the “lag effect,” whereby users can improve more if the gap between practice sessions is gradually increased.
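To make the idea of gradually widening gaps concrete, here is a minimal Python sketch of an expanding review schedule. It is an illustration only and not Duolingo's scheduler: the function name `expanding_schedule`, the one-day starting gap, and the doubling factor are all assumptions chosen for the example.

```python
from datetime import date, timedelta


def expanding_schedule(start, sessions, first_gap_days=1, growth=2.0):
    """Return review dates whose gaps grow by `growth` after each session.

    Hypothetical helper: the default gap and growth factor are
    illustrative assumptions, not values taken from Duolingo.
    """
    dates = []
    gap = float(first_gap_days)
    current = start
    for _ in range(sessions):
        # Schedule the next review `gap` days after the previous one.
        current = current + timedelta(days=round(gap))
        dates.append(current)
        # Widen the gap before the following session (the "lag effect").
        gap *= growth
    return dates


if __name__ == "__main__":
    for review_day in expanding_schedule(date(2019, 7, 1), 5):
        print(review_day.isoformat())
```

With the defaults above, successive reviews fall 1, 2, 4, 8, and 16 days apart, so practice is frequent at first and then tapers off.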
But the main problem with programs delivered automatically rather than by a human is that people differ widely, depending on their existing knowledge of a language and their personal circumstances or temperament. And machine learning models tend to be binary, rather than taking into account the nuanced nature of the individual. This is where Duolingo’s statistical model, known as “half-life regression,” comes in. It analyzes the error patterns of millions of language learners to predict the “half-life” for each word in an individual’s long-term memory.
“When we put it into production, we saw a 12% boost in user engagement,” Settles said.
For context, the half-life concept is often used in physics to describe the time required for a quantity to fall to half its initial value. In language learning, this could describe vocabulary or grammar knowledge inside your brain — so if a half-life is a day and you go a day without practicing a new language, there is a 50% chance that you will forget the lesson. But it’s not an exact science — half-life regression is all about getting inside a person’s head, figuring out what they know or don’t know, and then targeting course material accordingly.
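The half-life arithmetic can be sketched in a few lines of Python. This assumes the standard exponential-decay form, in which the chance of recall halves with every half-life that passes, and it leaves out the regression step that estimates each half-life from learner data. The function names and the sample French words are hypothetical, chosen only for illustration.

```python
def recall_probability(days_since_practice, half_life_days):
    """Estimated chance of recalling an item, assuming exponential decay.

    The probability halves each time one half-life elapses:
    p = 2 ** (-elapsed / half_life).
    """
    if half_life_days <= 0:
        raise ValueError("half_life_days must be positive")
    return 2.0 ** (-days_since_practice / half_life_days)


def weakest_first(items):
    """Order items so the ones most likely to be forgotten come first.

    `items` maps a word to (days_since_practice, half_life_days).
    Hypothetical helper for illustration only.
    """
    return sorted(items, key=lambda word: recall_probability(*items[word]))


if __name__ == "__main__":
    # Made-up example data: (days since last practice, half-life in days).
    learner = {"chat": (1, 1), "chien": (1, 4), "oiseau": (3, 1)}
    for word in weakest_first(learner):
        print(word, round(recall_probability(*learner[word]), 3))
```

A word with a one-day half-life that has gone one day without practice comes out at 0.5, matching the 50% figure above, while a word with a longer half-life decays more slowly and can wait longer before it needs review.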
“If you have two people, one who has never learned French before and another [who] took four years of high school [French], they’re probably very early on going to exhibit different patterns of what they get right and wrong,” Settles continued. “And so the ‘decay’ patterns will look very different from both of those people. The person who already has a background will make fewer mistakes, and the types of mistakes they will make [will likely be different], meaning that they don’t have to practice those things as often.”
Methods used to target content — like factoring in half-life regression to get inside a student’s head the way a teacher might — are important. But the content itself is just as important, and here Duolingo is also turning to AI — to help its team build the right curriculum.
“There are millions of words in the English language, and maybe 10,000 high frequency words — what order do you teach them? How do you string them together?” Settles said. “So we’ve built systems to help the content creators tailor beginner, intermediate, and advanced level material.”
An additional challenge has been that while only 40% of Duolingo’s users are learning English, most of the pedagogical data the company employs to train its AI systems is developed for English. So Duolingo has effectively had to take its systems and project them onto other languages, in what is known in the AI world as transfer learning.
There is a well-documented AI skills shortage — though the talent pool is slowly growing — and many of the big tech firms have been fighting to acquire promising AI startups. This talent crunch is something Duolingo has found challenging over the past few years, particularly given its focus on specific skillsets. The AI research it is doing crosses a range of disciplines and intersects with psychology and learning science, in addition to language and linguistics.
“We want more people at that intersection of language and AI and cognitive science — those people are not a dime a dozen,” Settles said. “And also our bar is very high. I was recently looking at the numbers for this — less than half a percent of those who apply to our AI jobs make it all the way through.”
Settles added that the company has detected a small uptick in interest from qualified people over the last 18 months or so, including applicants from other tech companies and from academia.
“There are quite a few people from larger tech companies, and we also are hiring a lot of new people straight out of PhD programs — mostly because they’re a little bit more open-minded, and they haven’t been, you know, institutionalized,” Settles added.
One of the biggest challenges with teaching a language remotely is that it can be difficult to create an experience engaging and immersive enough to keep the learner coming back. In an effort to boost engagement, Duolingo in 2016 launched bots to help teach languages through automated text-based conversations inside its app.
Various bot characters were designed to respond differently to a range of possible answers, and users could hit the “help me reply” button if they got stuck. The bots should in theory get smarter the more they are used.
Duolingo’s bots appear to be on a temporary hiatus for now, but this kind of learning, in which automated agents take the place of human tutors, could elevate virtual teaching to the next level. Recent developments in conversational AI assistants, such as Amazon’s Alexa and the Google Assistant, could open a whole new world of opportunities for language learners. Imagine if saying, “Hey Alexa, I’m ready to learn French,” could kick off the next installment of your language education. And what if Google Assistant could correct your pronunciation and grammar just by listening to you?
Throw the possibility of virtual reality (VR) into the mix, with users able to slip on a headset to enter a virtual classroom environment, and it’s easy to imagine how much more engaging learning a new language could soon become.
When pressed on the likelihood of Duolingo expanding into such immersive arenas, Settles did not comment, beyond acknowledging that “it’s possible.” But the company seems well aware of the inherent benefits these emerging technologies offer, and the potential for greater immersion could be huge.
While Duolingo hasn’t divulged any plans around intelligent voice assistant integrations or immersive visual worlds, it has committed to further personalizing its content and delivery as it works to put the human element into automated learning.
“If you think about the way good teachers operate, there’s kind of like three properties that they have,” Settles said. “One is that they know the content really well, and the second is that they have a way of getting inside your head, figuring out what you know and what you don’t know. And the third is being very engaging, and finding good ways of engaging you with that material at the level where you’re at.”
“The half-life regression is one example of getting inside your head, figuring out a mental model of what you know, what you’re struggling with, and targeting that material to [you],” he said.
“There’s a lot of uncharted territory there,” Settles added. “There’s lots of opportunities, I think, for AI to make new and engaging learning experiences.”