How Microsoft created a virtual assistant that could blow Siri away

BUSINESS INSIDER

Apr 21, 2014

Windows Phone is still a distant third to Apple and Android in the smartphone market, but Microsoft is hoping to change that with the introduction of Windows Phone 8.1 and, more importantly, its personal digital assistant, Cortana.

Microsoft claims that Cortana isn’t like your average virtual assistant. She’s supposed to be a little wittier, more personable, and capable of learning more about you than Siri or Google Now.

After using Cortana for a week and speaking with Microsoft’s Marcus Ash, Partner Group program manager, it’s clear that the company’s got a lot riding on the success of its new virtual assistant.

But there are still several obstacles in Microsoft’s way. Besides the competition from Apple and Google, virtual assistants still haven’t really become a mainstream part of a smartphone user’s everyday routine.

We sat down with Ash to talk about how Microsoft created Cortana, its plans for the future and its strategy in facing the competition.

Business Insider: Microsoft interviewed real-life personal assistants when creating Cortana. What was the strategy behind that?

Marcus Ash: If you want to make a real humanistic connection with that technology, the best thing you can do is find a set of humans that do the job we think this phone should be able to do. We [asked] them, ‘What do you do to really make the person that you work for happy? What types of [tasks] do they ask you to do?’

The other area we were focusing on was how much personality we should attribute to this assistant. These machine learning systems need a lot of data. So if you don’t ask the right questions, then you’re not going to get the right data, and then the system can’t train itself. So it never really gets better.

You need a pleasant sounding voice. You need to make sure that voice sounds as human as you can possibly make it.

We need to make sure that the voice actually has human sounding phrases to say. When you ask a question that you would ask a normal person, the system should respond the same way a person would respond. So we really thought of all those problems.

So we thought let’s go talk to people who have these personal assistant jobs where we could get a flavor for how much of their personality comes through on the job. We examined the dynamic between the person that’s being assisted versus the assistant.

BI: In what other ways did you study real personal assistants?

MA: We interviewed these people that had these high-stress jobs, meaning they were assisting people who were celebrities where it really matters that you’re getting things right. I think it was somewhere between five and seven assistants that we interviewed over the course of one week. And we had them keep a journal, and we looked through those journals and we looked back and did exit interviews.

We asked them to tell us about the relationship with the person [he or she] worked for. We said, tell me the types of things you do for them. Tell me how much they have to ask you to do things for them versus how proactive you are. That’s where we got a lot of insight.

BI: So what was the most important thing you found through that process?

MA: It’s all about trust. This person tells me very private information. And this person expects me to keep this private information between us. They didn’t go into details, but you can imagine the kinds of things that an assistant that follows that type of person around might see or hear. If the person doesn’t trust me, then I can’t do my job effectively because I’ll be limited in the information that I get from that person.

I just had this idea that the personal assistant knows the person [they’re assisting] so well that they can anticipate things. I just wouldn’t have guessed that it was so rigorous. One person pulled out her journal. She called it her bible of this particular person.

And she wrote down this extensive set of notes ranging from people that this person had met, and what this person was wearing the first time they had met, and very detailed work around what they think they would need to refer to at a later time. And she had this great quote about how there’s a difference between what the person says they like and what they really like.

BI: I can see how trust and personality would be very important for a personal assistant. But how do those characteristics translate to a phone?

MA: The notebook in our case belongs to Cortana. It’s actually her view of you. It’s based on what you say, and it’s based on you giving access and having this trust relationship build. She’ll never put anything in your notebook that you’re not aware of, or that you don’t trust. But she’ll make inferences about you based on the information you tell her.

She works for you, so you can take that and say that’s wrong, and ‘why would you assume that I like this particular restaurant?’ And she’ll say ‘well because I observed this particular behavior about you.’ And if you say ‘well that’s not correct,’ she’ll respond ‘okay got it.’ So we thought that metaphor really worked for us and it would be a great way to translate the design of what Cortana knows about you.

This is a very personal thing for people. Especially as we look forward to the future at all the types of things that our devices are going to know about us. Knowing that you can trust this device to do the right things with that data is really one of the key points we honed in on early by talking to these assistants. Personality, I thought, was a real breakthrough for us. If you don’t have a personality, it’s really hard for people to trust you.

BI: In my personal experience, it doesn’t seem like people really use Siri or Google Now too often. I never see people talking out loud to their phones in public, and I personally don’t use Siri much on my iPhone unless I’m setting an alarm or reminder. Can Cortana make the virtual assistant more valuable than just a shortcut to setting an alarm?

MA: People have been working on speech systems for years. They’re very complex. They’re very difficult to get right. The thing that gives me a lot of hope on these systems is that we’ve reached a point where we’re collecting so much data about speech that the speech systems are improving at such a rapid rate. It’s much better than it was five years ago, and it’s growing at such an exponential rate because there’s so much data being poured into smartphones. So that makes me feel great.

These conversational systems are on this natural progression curve where they’re going to get really good. You’re going to see less false positives and problems with recognition. One of the main reasons we had to go into beta actually is because the system is only as good as the amount of data you have. And it just takes a certain amount of time to train the system.

BI: The smartphone market share in the U.S. is largely dominated by Apple and Android. Could Cortana and Windows Phone 8.1 influence smartphone shoppers to turn to Windows Phone rather than Apple or Android?

MA: We think it’s [the virtual assistant] going to be one of the next big things that distinguishes these platforms. How good is the assistant and the contextual learning technology on this phone? So I think that’s a longer-term vision. Even in the shorter term, the idea that Windows Phone has done something that’s interesting and unique and that we’ve got a distinct point of view about, it feels like we’re getting a lot of pickup.

BI: Google Now also develops and changes the more it learns about you. It also seems like Motorola’s Moto X is very contextually aware and is sort of based on this idea of being your personal assistant. What makes Cortana different from or better than Google Now?

MA: We focus a little bit more on contextual triggers that we think people will actually understand. We always talk about three triggers: we talk about time, we talk about location and we talk about people. Those are the triggers people get. Let’s just focus on a set of things that are going to be of high utility and limit the number of triggers so that people can understand what the system is capable of.

For us, it is a notion of personality. When we look at Google, they’ve made some pretty clear decisions. It’s about getting you quickly and efficiently to Google’s services. It’s not about personality. There’s just something really delightful that makes people smile about having an anthropomorphic personality inside this assistant. We studied this a lot and looked at people’s reaction in labs; it just makes people smile. It also opens up this type of trust relationship we talk a lot about.

Google has got a decision to make around how they’re going to create a personality.

They have to really think hard about what the future of these assistants is going to be, and whether or not people are going to get used to talking to a search engine.

People love the fact that they can talk to this like they’re talking to a person.

It’s ultimately a question of who can get the most information in a privacy-controlled way that’s still personal. The more you know, the better these systems are going to get. And if you can’t do that, I just have a hard time seeing how you can evolve this in a way that could be truly revolutionary.