This year has brought a fair number of shakeups at some of the world’s biggest tech companies. Google’s AI chief left for Apple, Amazon’s AI research chief left for Google, and Windows chief Terry Myerson left Microsoft as the company shifts its focus towards AI and the cloud.
Around that time, Javier Soltero moved from his role as product lead for the Microsoft Office group to become product lead for Cortana. Soltero will lead the Cortana team with Cortana engineering VP Andrew Shuman, a spokesperson said.
Soltero, who admittedly has no background in voice or conversational AI, takes charge of the Cortana team at a crucial time. A recent survey found that Cortana, an AI assistant used by more than 140 million monthly active users, ranked highest among users in an office setting — but that may be the only bit of recent good news for Microsoft’s assistant.
The Harman Kardon Invoke, the first smart speaker with Cortana inside, hit store shelves in the fall, but its price has since been slashed, and another recent survey found that the speaker has failed to gain traction among consumers in any meaningful way. That should be concerning since Google and Amazon are selling millions of smart speakers, and PCs are still the primary means by which people speak with Cortana.
Outside of a change to Cortana for Windows 10 last month and a quicker iOS app, no major features have been added for Cortana since late 2017. The Cortana Skills Kit doesn’t appear to have gained much traction with companies or developers. The Alexa-Cortana partnership that was scheduled for release at the end of 2017 still hasn’t become available, but in November AWS launched Alexa for Business to compete with Microsoft in the workplace.
Soltero spoke at length with VentureBeat about his plans for Cortana, potential new features, Microsoft’s plan to distinguish itself from competitors, and why he says it’s Amazon, not Microsoft, who has to prove itself in the workplace.
If last year is any indication, new features or changes could be on the way for Cortana as part of the news to be announced at Microsoft’s annual Build conference in Seattle next month.
At Build 2017, Microsoft launched the Cortana Skills Kit, showed off the Invoke smart speaker for the first time, and shared plans to work with HP and Intel to bring more Cortana-powered devices to market.
Soltero came to Microsoft after his email company Acompli was acquired in 2014. He was instrumental in the redesign of the Office mobile app, which is important because email, he said, will be crucial to the future of Cortana.
This interview has been edited for brevity and clarity.
VentureBeat: Cortana’s biggest distinction when compared to other assistants appears to be email, the insights it can draw from email like suggested reminders, so how crucial is email to Cortana? It seems like a core part of Microsoft’s work-life balance pitch.
Soltero: I think it’s super important in that it is both the primary communication vehicle for an enormous amount of people around the world at work and it’s still very much relevant in their personal life as well. The data that email represents in addition to the email itself, it speaks to who is important to Javier across the spectrum. So that intersection of what Outlook is, in that it is the keeper of many of your communications, your contacts, both the explicit address book as well as the inferred address book — in other words, who did I actually correspond [with] whether or not you stored their contact info.
VentureBeat: Right, you’re signaling your priority contacts.
Soltero: Yeah, and also the schedule, right: knowledge about your time. Those are bedrock sources of information and insights that are required. If you really want it to be assistive — and by assistive I mean enable people to get more out of their time — you have to have access to their time, and you also have a sense for who and what they’re communicating about, so yes, it’s a bedrock thing for Cortana.
I think in general for the overall Microsoft story, our approach to the market — both for enterprise customers and even for consumers, as you’ll see over the course of this year — is going to be more about us being able to be stewards of a very specific and important set of information. And having earned that trust at both the individual and the organizational level, I think that’s something we need to continue to embrace.
The set of experiences on top of that that make use of that to proactively tell you that you double-booked for some stuff tomorrow and you probably should pick one or the other, or I can suggest how to reschedule that, or that you sent an email that said ‘Let me send you that tomorrow’ and yet you didn’t do it, so it’s my job to be like, “Hey, nudge nudge.”
I’m here in part because the company didn’t have its compass; it didn’t have a single clear point of view around the human need and the people that we were trying to serve.
We want to delight the end user, but the wedge and the real clarifying part for all of this goes back to the idea of choice. When people have a choice, you can try to cheat and force stuff down their throats, which is no longer practical or appropriate, or you can ground yourself by saying there are multiple options for communications technologies, even email clients.
We knew that at Acompli, and we brought a certain set of opinions to the product that made a big difference, so I think that’s how the company is seeing this and how email and the core Office experience will influence Cortana and products like it.
VentureBeat: I was reading the Axios exclusive on you joining the Cortana team, and in the piece you mentioned the idea that mediocrity won’t cut it in the AI assistant race. What do you think Cortana needs to do to overcome mediocrity?
Soltero: I think I spoke of it in terms of a kind of horizontal approach to delivering useful capabilities, to say “Here is a way for you to interact with email via voice.”
The missing ingredient there is a clear understanding of when you need that. When you’re talking about assistants it’s important to ground yourself very much in who you’re trying to assist, and how, where, and in what situation is that assistance most valuable or has the biggest potential to make an impact in having that person feel like they were rewarded.
Another way of putting this is, and I often get boxed into this a little bit, when people say “Well, Microsoft has so much strength at work so clearly that’s where you’ve got to go nail it.” I mean, we certainly are very invested in winning for Cortana at work and have some unique workplace-specific capabilities that we have either brought to the table or are going to bring to the table, but I have a hard time thinking of a product experience that I had with technology at work that was so awesome that I actually brought it into other parts of my life.
We go from having speakers in our kitchens and homes to saying, “Hey, how will these things really work in a conference room or workplace setting?” It’s important to have that sensibility to understand that it has to be just awesome and useful. And they’re not two different things: There isn’t a “work me” and a “personal me,” and I don’t have a different bar for utility or quality of experience across those two things.
VentureBeat: But is that sort of what Cortana needs to do to rise above that mediocrity? That it has to effectively convince people to bring it home with them from work?
Soltero: No, I mean, it’s home with them already in the form of PCs and Invoke speakers and so forth. We’re not trying to force that movement to happen. We are steering the product experience towards delivering value in each of the three states that people are in, which is, in very generic terms, at work, at home, or somewhere in between — at least for our target user — and really honing in on what can [Cortana] do for that user in that setting that is assistive.
And of course there is always the opportunity to cross-pollinate or bleed over or whatever analogy you want to use from one situation to another, but the difference in approaching this effort from a platform perspective is saying we just want to horizontalize all the things and have people tell us through their use what works and what doesn’t.
There’s alternatives, they’re all great in different ways, and you want to win the Pepsi Challenge, right? You want to win the taste test. And when you don’t win, you want to know why, and you want to potentially be OK with it.
It might be because there will be multiple assistants in this world, there already are, that there are assistants that are more skilled at providing value to users that have a specific set of challenges. The argument that we can just pivot entirely and be great for people at work forsakes the idea that more and more people are wanting to be supported across their day because those lines are far more blurred, so it’s a very interesting problem.
VentureBeat: The entire Cortana-Alexa partnership aside, what does Cortana have to do to compete with Alexa in the workplace?
Soltero: I would in all honesty put it in the other direction and say what does Alexa have to do to compete with Cortana in the workplace for the simple reason that earning the trust of organizations is not a matter of bringing a device into a room and dropping it on a table.
For sure in the spirit of having validated the experience in a home setting and having found it to be useful and compelling, that motivation is informative for us and for Amazon all the same, but manageability? Compliance? Trust? There’s a range of very reasonable and hard-to-satisfy expectations that enterprise customers have that give us — I don’t want to describe it as a unique advantage, it would be at odds with this idea of winning by choice, it’s just not trivial.
It’s not just a matter of lighting up these devices in the workplace, it’s a matter of fitting well with the organization and the way that the organization is managing the data that they provide.
I think as you get into the workplace, you get an even higher bar for truly validating the difference between convenience, which is mostly what these sort of command-oriented experiences that are being delivered via voice speakers are, and real assistance, which is: Are you proactively doing stuff for me? Are you actually helping me stay on top of my day?
Trust and compliance issues aside, that’s what we’re very familiar with. We live this every day, and like anyone else we don’t have a perfect record on this, but we have I guess earned our stripes.
VentureBeat: Qi Lu (who helped create Cortana and is now at Baidu) was talking about how Cortana had fallen behind…
Soltero: Did he? I clearly didn’t see this.
VentureBeat: He was saying something along the lines of “Amazon found the right form factor in the smart speaker for an AI assistant to exist.” I guess I’m asking if you think Cortana has fallen behind in any way, and, more to the point, are there features you want to see included with Cortana going forward that are already popular on other platforms like Alexa or Google Assistant?
Soltero: So Cortana was born in a multi-device world primarily actually on a PC, and the speaker category didn’t exist. That form factor was not known — though we’ve got an incredibly large and very bright research organization, I’m sure we had thought about this.
So if we missed anything it was the fact that there was an opportunity to do things that were easy to dismiss as more basic, more command-oriented, that you didn’t need to solve a lot of the hard problems around conversational language understanding, real conversational interaction, and so forth. We believed then and believe still today what everyone is chasing after requires those steps.
The realization that everyone, including Google and Apple and everyone else, had around the arrival of Alexa was that actually it turns out that there is a journey through which you can start delivering convenience and value and build from that. But no one — certainly not Amazon or Microsoft or even Google — is blind to the significant leaps still yet to be taken around what it will take to build the real assistive experience, both from the way in which you interact with the thing to how it handles errors to how do you actually activate an ecosystem inside of these kinds of devices.
I don’t know what this thing can do (points at Invoke) because it doesn’t actually tell me when I look at it. I have to ask it different questions, and sometimes it’s like, “Well, go see the app,” so we’re still at the very beginning of a promising race, one that we’ve actually been running for a long time.
As far as your second question, which features to add, I think there is certainly plenty of inspiration to get from other assistant experiences. I think the short answer is yes, there’s a lot of things that are very instructive about how different assistants are evolving their experiences. I don’t believe that any of what we’ve seen so far is definitively “you’ve got to have that.”
There’s still a tremendous amount of innovation left to do around even just the skills piece itself. We could have the “who has more skills” battle, but the reality is that the usage of these skills, on balance people still have a narrow set of things they’re dependent on these devices for. I would look towards our opportunity to assist you across the places that you’re already at so here (points to computer), there (points to Invoke), and there (points to smartphone), and specifically the places where you need that the most.
We already have the infrastructure and scale and usage across those three things to be able to deliver on the real assistive experience. The difference is that we were approaching it in kind of a different order and we overpivoted on the platforminess of it, right? We’ve now sort of refocused the organization. We talk in terms internally of promises that Cortana expects to fulfill for its users and the specific situations in which those promises are most valuable.
VentureBeat: When you talk about “platforminess,” are you talking about the Cortana Skills Kit?
Soltero: I mean the Cortana Skills Kit, or even internally with the first-party experiences. It’s the idea that there isn’t enough of an opinion on existing skills we ship with, there isn’t an affinity necessarily for a given skill and the place that it would be most valuable to use.
For example, where do I really need voice email the most? Is it here (points to PC)? No, I have a screen and a keyboard.
Then there’s cross-cutting skills that we’re doing today — that’s unique to us — that extract commitments from your communication and being able to sort of move those along and present you with notifications in an ideally contextually relevant way that says, “Hey, you told Keisha you were going to follow up with her about this; did you do it?”
I think those are part of a real rich landscape. It’s stuff we’ve shipped and is out there; it’s stuff that’s going to get I think significantly better as part of us really pushing to say “Finish the scenario.”
Don’t just ask me questions about “Hey, you mentioned you were going to do it. Did you do it?” I’m like, “You should know, you have access to the rest of this stuff, right?” Completing the scenario for the user — not trivial, but definitely high impact and unique to us in terms of what we can do.
VentureBeat: I remember my immediate thought after the Alexa-Cortana partnership was formed was, “Well, Alexa’s got like 20,000 skills and Cortana’s got maybe 100 at that point.” Do you think there was a damper put on the third-party skills?
Soltero: No, I think that the conversation with Amazon and the work we’re doing with them, which I’m really excited about, is about both companies recognizing that it’s going to be a multi-assistant world and the need for us, given our shared interests in this space, to do the hard work of figuring out: How do those two assistant experiences relate to each other? How and where can they coexist for the benefit of the user?
And part of the reason why we haven’t come back out to the world to say, “Hey, here’s what we came up with” is because arriving at the right solution for users is not just the task of wiring these things together and having them share a space inside the same black cylinder that’s sitting on your counter.
It’s been a very informative journey for both companies and really understanding what each of us brings to the table, how we can help, where is the overlap between our users and the interests that they have, how can we provide experiences that are great, and I think we’re making great progress on that.
I think you’ll see soon that there is proof in the pudding, but we’re not just going to hurry up and ship something just to satisfy the — I wasn’t part of the organization when expectations were set about how hard this would be, but I would say for sure that everyone now understands that the importance to do this right outweighs the need to just show some stuff right that we want for the sake of both companies and their users to do something valuable. And I think we’re on to something.
Correction: An earlier version of this post misspelled Javier Soltero’s name in the headline. We regret the error.