Michelle Zhou gave me a shock recently when she showed me a graph of my personality, based on a sampling of my tweets. The IBM researcher can make an educated guess about anyone’s personality based on looking at 200 tweets by that person.
The research at IBM’s Almaden Research Center in San Jose, Calif., came from a new Accelerated Discovery Lab aimed at gaining insights from big data. Zhou can do this personality analysis in a disciplined and automated fashion, measuring 52 different personality traits for each subject. This “psycholinguistic” analysis could prove useful for companies that want to understand their customers in a more intimate way. But it also raises privacy issues and other concerns.
We talked those over in a full conversation with Zhou. Here’s our edited transcript of that talk.
VentureBeat: How did you get started with this in the first place?
Michelle Zhou: I’ve always been working in this general area. Before, I was dealing with intelligent user interaction – how you better understand a user’s behavior and a user’s intentions, so you can adjust a computer to adapt to that. Now, we’re thinking past user behavior modeling – clickthroughs, menu movements, window interaction. I was thinking, ‘Can’t we do a little bit more than this?’ I came across this idea of psycholinguistic analysis, and I saw that we could extend that to develop our own technologies and gain a deeper understanding of individuals.
Three or four years ago, we started a series in many people-centric operations, from marketing to customer service to product development to HR. A ubiquitous challenge there is that they want to gain an understanding of people as individuals. Understanding an individual has to go beyond behavior analysis. Amazon.com is a quintessential example of behavior analysis. You read something or buy something.
Many human studies in psychology, social science, and behavioral economics show that people’s behavior and their decision-making processes in the real world are heavily influenced by what’s called their intrinsic traits. Those intrinsic traits include what motivates you, what you believe, your fundamental needs. Thinking about it, it’s very hard to imagine – in a traditional way – how you could learn someone’s intrinsic traits, aside from standard psychometric tests. You can’t ask a customer something like, ‘Welcome to my store, would you like to take a personality test?’ Let alone scale out to literally hundreds of millions of customers in the real world.
VentureBeat: Are we talking about the personalities of companies or individual consumers at this point?
Zhou: Like I say, it’s any people-centric operation. You want to understand the people involved in a process. It could be the employees of a company. For example, somebody wants to advance or develop their talent. You want to understand what this person can do. Also, for a consumer company – retail, consumer products, services like airlines or hospitality – they want to understand their customers as individuals so they can serve them as individuals.
The challenge here is, how do we gain this understanding of individuals and their intrinsic traits at scale? Then we found two things that could work to our advantage. Part of it is psycholinguistic analysis. Computers can derive people’s traits from linguistic footprints. That hasn’t been widely applicable before, because where do you get those linguistic footprints? Now, you can do that with social media and digital communications. Those are readily available, so we saw an opportunity there.
Before, psycholinguistics has only been applied to the Big Five personality types – not many other traits like basic human values, the things that describe your beliefs and motivations and needs. Marketing studies and behavioral economics have shown evidence that people who have different needs buy totally different things. Idealistic people often go for organic foods, organic skin care. They buy clothes for their pets. They go for brands that support that.
The research has been done at the University of Colorado and UT Austin. A psychologist developed a dictionary of categories to predict the words people use for certain things. People who are happy or depressed use different word categories. You can use more extensive word categories to try to predict where someone sits in the Big Five. However, that’s not sufficient. For example, no word categories exist to predict values or beliefs.
Zhou: Right. Our method comes from psychometric studies. You design these item-based psychometric studies and you get many people to take them. In the meantime, you also collect their linguistic footprints. You might say, ‘Describe your needs,’ and then we use statistical modeling to correlate the words in their linguistic footprints with their psychometric scores. That statistical model can become a predictive model, with the dictionary we have already, and we can apply it to other people who haven’t taken the tests.
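To make that description concrete, here is a minimal sketch of the general idea, not IBM's actual system. It assumes a tiny made-up word-category dictionary (`CATEGORIES`), invented test scores, and a single-predictor least-squares fit standing in for whatever statistical model the researchers use. All names and numbers are hypothetical.

```python
from collections import Counter

# Hypothetical word categories; the real dictionary is not given in the interview.
CATEGORIES = {
    "social": {"family", "friends", "together", "we"},
    "achievement": {"win", "goal", "best", "success"},
}

def category_rates(text):
    """Return the fraction of words in the text that fall in each category."""
    words = text.lower().split()
    counts = Counter()
    for word in words:
        for name, vocab in CATEGORIES.items():
            if word in vocab:
                counts[name] += 1
    total = max(len(words), 1)
    return {name: counts[name] / total for name in CATEGORIES}

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return mean_y - slope * mean_x, slope

def train(samples, category):
    """samples: (text, psychometric_score) pairs from people who took the test."""
    xs = [category_rates(text)[category] for text, _ in samples]
    ys = [score for _, score in samples]
    return fit_line(xs, ys)

def predict(model, text, category):
    """Estimate a score for someone who has not taken the test."""
    intercept, slope = model
    return intercept + slope * category_rates(text)[category]

# Invented training data: writing samples paired with a "closeness" test score.
samples = [
    ("family friends together dinner", 0.8),
    ("win the goal today", 0.2),
    ("we win together now", 0.6),
]
model = train(samples, "social")
estimate = predict(model, "friends and family rock", "social")
```

A real model would use many categories at once and far more people; the single predictor here only shows the train-then-predict shape of the method.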
Another thing we want to do, a business can ask their customers to opt in. Using myself as an example – maybe this is a bit extreme – I subscribe to e-mail lists. Sometimes I get something I really think about, but 99 percent of the time I get junk mail that has nothing to do with what I want. In this case, I may be willing to tell a company more about me – this is who I am, this is what I want – and that can help them send me more about what I really want. Or when I book a trip and check out hotels, I might want to know more about what hotels have been chosen by people who are like me. I can look at Yelp, but who are those people? Are they like me at all? Do they like the same things I like? With this, you can learn more based on the behavior of people who are like you. That’s the essence of what we’re working on.
We’re still at the research stage. We’re working with some of our customers to validate our work in their field. The reason we’re very excited is, first of all, IBM gives us an environment to work with many customers that other companies might not have access to. Second, it’s a broad range of work, from psychology to social science to computer science. We need that interdisciplinary background to put this together. Having that real world validation will be great.
VentureBeat: Is this a kind of big data effort? Do you have to look at thousands of people, thousands of messages?
Zhou: We don’t actually need a lot of messages. We’ve done extensive experiments and found that about 200 tweets from a given person – about 2500 or 3000 words – is a good representative sample. That gets us within about 10 percentiles of using thousands of tweets. So that makes a good estimation. The big data aspect, though, is that you have to look at hundreds of millions of people to make this valuable. You have so many types of people, so much data, and you can’t just go one by one.
I took the liberty of searching you on Twitter and created a profile of you—
VentureBeat: I was going to ask about that. I share our stories on Twitter, mostly. That’s the main function of what I do there. Every now and then, I have an opinion I throw out there. Is it more valuable for you to search for my opinion, instead of every single story that I’m sharing?
Zhou: I see. That’s a good point. In your profession, that’s unique. In our case studies, we’re looking at word usage, unless—When you write a story, you probably intentionally mask your normal usage of words.
Zhou: An interesting experiment to do with someone like you would be to separate your professional and personal writing and see whether those profiles are similar. We haven’t done that particular study yet. But what we’ve done is, we use a Twitter ID, and we also have an internal social media platform called IBM Connections. Those two profiles are similar. One is all about work, but it’s informal. It’s not like a paper I’d publish in a journal.
So here’s what we got from those 200 tweets of yours. You can tell us what’s accurate and what’s not. I should explain this a bit. The Big Five personality traits are sometimes called the OCEAN personality profile. O represents your Openness to experience. C represents Conscientiousness. You’re very dutiful, it seems. That’s very high. E is Extroversion versus introversion. You came up an introvert. A is Agreeableness. You also have a high level of agreeableness. N is Neuroticism, emotional stability. You’re a bit below average on this one.
These here are fundamental needs, based on work by Maslow in 1943. Currently we’ve identified about six needs. First, self-expression. Then, closeness, how much you value your relationship with family and friends. Then, harmony – whether you want to be in confrontational situations, or if you’d rather everybody gets along. Curiosity, how curious you are about new things. Idealism – are you a perfectionist? Do you want everything done in a very precise, perfect way? Excitement – do you go out and seek exciting things?
On the other side are what we call human basic values. These measure a person’s beliefs and motivators, what drives people. Conservation means you value tradition. Another is self-enhancement, a desire to develop and improve yourself. Hedonism involves seeking pleasure. Self-transcendence means working for the good of the world, a high level of altruism. Open to change means that you seek out different things to try. So you’ll have to tell us how accurate this is.
When we try to figure this out going by a person’s tweets, it’s simply by the histogram. There’s no numerical analysis. These things were derived from your 200 tweets, or maybe a few more, since you have about 18,000 altogether.
VentureBeat: Did you say something about depression?
Zhou: Self-expression, you mean. [laughs] It means you’re assertive, that you tend to build your own brand with the words you use. You maintain a self-identity. It seems like you’d do that in your work as a reporter. Closeness means you value family and close friends.
This analysis came together in about 20 seconds. If someone has only, say, 60 tweets, we can use the same analysis, but we know it won’t be accurate.
VentureBeat: Would something like Facebook be more useful, because people aren’t so limited by the number of characters?
Zhou: Like I said, we’re also doing this with IBM Connections, the internal enterprise social platform. That has longer blocks of text. We haven’t found any really huge differences when we examine people who use both that and Twitter. Even though Twitter messages have a lower number of characters, the words people use are consistent, so they’re a good indicator. We’ve run studies where people take psychometric tests, a 26-item or 52-item test, and compared the results to our Twitter-derived scores. We’ve found that they’re highly correlated. Not for 100 percent of people, certainly, but around 80 percent of the people in our test population.
Also, the way a lot of people use Facebook, they may not output very many words. They’ll like other posts and post pictures, but that’s it. We need 2,500 or 3,000 words.
VentureBeat: What about the idea that somebody might be using their Twitter to play some kind of role, or project a different personality for people to see?
Zhou: Yeah. In that case, we wouldn’t know. Another good example, yesterday I ran a profile for a customer of ours on a CEO of a company in Asia. I knew him very well, the CEO, and I said, ‘Listen, I want to warn you. He has about 20,000 tweets. Are you sure he wrote all of these? He might have a publicity staff doing it for him. You might want to take these with a grain of salt.’ Same with a celebrity, like Kobe Bryant. Someone like that may be trying to establish a certain type of image. With our technology, we can go by what they say. You might even see something like that with the President, depending on different presidential speechwriters.
It’s interesting. With a company like this one, people sometimes have a negative feeling about this work at first – ‘Oh my God, these people know so much about me.’ We want to avoid that. When we ask people to look at their own portraits, we actually tend to get very positive feedback. People say things like, ‘I really discovered another side of me, something I didn’t think I have.’ One person said, ‘I didn’t realize I was so open on this scale.’ Another said, ‘I want my colleagues and my management to understand who I am and appreciate me, to give me assignments that fit me the best.’ Others have said that they can use it to see where they can improve themselves in certain ways.
Of course, you can’t always avoid a certain amount of manipulation. People are people.
VentureBeat: It’s interesting to consider why some people would want to know the personality of another businessperson they’re dealing with. A negotiator would love this stuff. Are there some other business purposes that make this a good thing to know about people?
Zhou: I was talking to hospitals in San Francisco. For patients with, for example, diabetes and weight problems, the number of people who adhere to a wellness program, or even treatment, is very low. It can be around 30 percent. Hospitals and health care providers really want to be able to engage their customers to make sure they stay on their wellness program. It’s good for the patient and good for the business.
If they found someone who couldn’t stick to the wellness program because they’re not very organized, because they always forget things, they could intervene to deal with that specific problem. If you know that a person values family and friends, you can leverage that – “Do this with your family members together so you can stay fit and spend time with them.” Once you know someone’s motivations, you know how you can encourage them – or at least that’s the hypothesis that we wanted to test out.
VentureBeat: Inevitably people are going to be concerned about privacy at this stage. Is all you can tell them, “Well, don’t tweet as much”?
Zhou: [Laughs] No, we have to be very open about that. Let’s use health care as an example. If I’m a customer of this company, when I log in to my account, I can see what doctor’s visits I’ve gone to, the vitals from my last checkup, and so on and so forth. What we can tell people using that kind of service is, ‘You can use this tool to learn more about yourself and about how other people like you are staying healthy.’ Someone like me, I’m very busy, but I also like going out and being social. What kind of wellness program works for me? A health care provider can let me know that if I share some of this data – say, a Yelp profile and a Twitter feed.
My take on this is that for regular people like us, we can’t afford our own exclusive personal physician. We can’t afford a personal financial analyst or personal trainer. But if other people like us have had a good experience taking a certain path, we can learn more and benefit from that.
Of course, an enterprise should be very up front with the customer. ‘Would you like to share this?’ The sharing should also be largely anonymous. It’s not like I should hear, ‘Dean likes this, so you might too.’
We’ve talked to lots of companies about the pilot program, and their comments have been very similar. It’s not necessarily that they want to use this to make more money from their customers, but it comes as a side effect. If they don’t know enough about their customers, they just bombard them with inaccurate messages, and that loses them their investment in the end, because it drives customers away.
VentureBeat: It does seem like there’s an area where you definitely want to limit your capability to identify things. California is moving to restrict employers from being able to use Facebook analysis to reject somebody’s job or college application. People are just way too honest on those platforms. It almost seems like the anonymous route is a safer direction to move in a lot of ways. You can draw a lot of anonymous data about just types of people.
Zhou: Another good example, people have talked about using this for recruitment. My thinking about recruitment is that it’s not about rejection. It’s really about better placement. If I go to a company and interview, I’m always wondering, ‘Would I fit into this company?’ Every company has a very different culture. Google has a very different culture from IBM or Facebook. If you can externalize that culture to people, people will understand and say, ‘Maybe I wouldn’t really like to work here.’ I think a very transparent way of doing things will help both the individual and the company.
VentureBeat: It seems like you could do a kind of pre-screening. ‘Send me all the people who are viable candidates based on their personality.’ Then you could know how to talk to those people, know how to engage with them in a certain way.
Zhou: A somewhat extreme example, say I’m running a charity organization. I’m going to ask people to apply for a particular job in a particular cause. How weird is it if this person actually didn’t care about this cause? You’d want someone who had high levels of self-transcendence, someone interested in helping others. Someone more interested in optimizing themselves, I might not reject them outright, but I’d have to say, ‘These are the people in my organization. This is what motivates them. If your motivation isn’t in line with theirs, I don’t think you’d be happy here either.’
VentureBeat: For us, at VentureBeat, we stand on that line between journalism and blogging, where on one side you promote an opinion and on the other you don’t. If you’re in the middle, that’s not bad. But if you tilt too far to either extreme, it doesn’t really work for us. That’s probably how we would make use of it.
Zhou: In traditional social networking analysis, they talk about influencers. Someone said that everyone is an influencer in this era now, because everyone has a voice. Certain voices are bigger or smaller, but they’re still voices. We derive what we call the network of potential. People influence people who are like them – similar people, like-minded people. If you look at different celebrities or different ordinary people, you might see someone who has only 200 or 300 Twitter followers versus someone who has as many as two million followers. If you look at the makeup of their network, even though someone might have only 200 followers, those followers could all be very alike. That means their potential influence could be much bigger, as opposed to someone with two million followers who are nothing alike. In what area does a person have influence? That’s important to think about.
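One way to picture how "alike" a set of followers is: score each follower on a few traits, then average the similarity over every pair of followers. The sketch below is only an illustration under assumptions of our own. The trait scores are invented, they are assumed to lie between 0 and 1, and the similarity measure (1 minus the mean absolute difference) is a simple stand-in, not the measure Zhou's team uses.

```python
from itertools import combinations

def similarity(a, b):
    """1 minus the mean absolute difference; 1.0 means identical trait scores."""
    return 1.0 - sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def homogeneity(followers):
    """Average pairwise similarity across a list of follower trait vectors."""
    pairs = list(combinations(followers, 2))
    if not pairs:
        return 1.0  # zero or one follower: trivially alike
    return sum(similarity(a, b) for a, b in pairs) / len(pairs)

# Invented trait vectors (for example openness and conservation), scaled 0 to 1.
like_minded = [[0.8, 0.2], [0.8, 0.2], [0.7, 0.3]]
scattered = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]

tight_score = homogeneity(like_minded)
loose_score = homogeneity(scattered)
```

Under this toy measure the like-minded group scores higher than the scattered one, which is the property the "network of potential" idea relies on, whatever the follower count.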
VentureBeat: We’re interested in our net influence score as well, because we usually find that while our traffic is not among the highest, the influence that we have reaches into a lot of other publications. So does that kind of analysis fit well with other stuff that you’re doing, about personality and who your friends are and the like?
Zhou: This is very similar to something customers ask us for, about store branding. You want to influence people who haven’t been influenced yet, right? That’s key. Someone like Amazon.com does it in a way that’s very different. They say, ‘You just viewed this. Someone who viewed that also bought this other thing.’ It’s all behavior-driven. In our case, we can see that you just bought something, and even if other people haven’t bought that, I can see who shares similar traits with you. Then I can reach a population for which I don’t have any behavioral data yet.
Likewise, if I know who’s reading your articles, they’re probably very open, very adventurous, wanting to see something new on the technology side. But in the meantime, they’re traditional. They don’t want something completely out of the blue sky. Once you know all that, you can go out and find those people and send a recommendation. “You share these attributes with the people who read this. You might want to take a look at it.”
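The contrast with behavior-driven recommendation can be sketched as a nearest-neighbor lookup over trait scores: start from one person whose behavior is known, then rank people with no purchase history by how close their traits are. This is a hedged illustration only. The names, the two-trait vectors, and the choice of Euclidean distance are all assumptions made for the example.

```python
import math

def nearest_by_traits(seed, candidates, k=2):
    """Return the names of the k candidates whose trait vectors are closest to the seed."""
    ranked = sorted(candidates, key=lambda name: math.dist(seed, candidates[name]))
    return ranked[:k]

# Invented data: a known reader's traits (openness, conservation) and three
# people for whom no behavioral data exists yet.
known_reader = [0.9, 0.3]
no_history = {
    "ana": [0.85, 0.35],
    "ben": [0.2, 0.9],
    "cy": [0.8, 0.2],
}

targets = nearest_by_traits(known_reader, no_history)
```

With 52 attributes instead of two the same lookup applies unchanged, since `math.dist` accepts vectors of any equal length.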
VentureBeat: When you do this, how many different personality types can you identify?
Zhou: In this case, what I showed you is based on 52 attributes, so there are all kinds of combinations. So far we’ve gone through 500,000 people and not found anyone who’s exactly the same. It’s highly unlikely.
VentureBeat: One of the things a brand has to do is understand itself, then, to find a match.
Zhou: Yes, I agree. Also, to find what motivates them. You find matches in different ways. At IBM we talk about finding matches with your mentors. IBM is a huge place. When you join IBM, everyone has a different creative ambition. Someone wants to be a businessman. Someone wants to be part of the tech elite. You really want to know the people that are there. The nice thing about it is that if you can show that there’s a diversity of people in a given position, it makes those new people feel better.
I was reading a recent book, The Power of Introverts, which talks about how American culture tends to promote extroverts. If you give this kind of data to your employees, though, it provides a lot of encouragement. People don’t just look at their colleagues who are very vocal, very social. Some people who do very well are very solitary, very thoughtful.
I don’t want to use technology to pigeonhole people into something. That’s the opposite of what we’re trying to do. We want to show that the creative world is made up of people who are very different.
VentureBeat: This is in the research stage right now. Do you envision any particular path to commercialization?
Zhou: We’re working with customers to pilot our work in a customer environment. They’re working with our product teams to build out the customer experience. Some customers just want the software as a service, almost like what you said. You could have a customer simply say, “Find the people most likely to be a new audience target.” They don’t want to deal with the machinery or the data. Other companies, ones with a huge IT arm, might take the technology and integrate it with their own software and solutions. We’re trying to figure out how to commercialize this in a way that will satisfy the diverse needs of all our customers.
VentureBeat: I’m thinking of one category—I don’t know that anybody would actually go here, but a suicide alert?
Zhou: No, absolutely. We’re talking to the government from a couple of sides. One is about cyber-bullying. What’s unique about our work is that you get to know more about the deeper sides of a person, but more important, it’s not just about gathering that understanding. It’s about acting on it. We have something here that we call emotional style, emotional resilience, how well someone can bounce back from adverse situations. Also, your overall emotional outlook. Before bullying happens, a person might have a very positive emotional outlook, but afterward, it’s always very negative. There can be depression and fear. Suicide has come up in relation to the military, especially with people in very high-stress situations. You want to detect changes in a person’s emotional style and intervene.
That intervention can be shaped because you know what motivates someone. A very simple example might be, with someone who cares about their family a lot – ‘Your family really loves you. You can’t just go on your own. You need to stay with them.’ Based on different motivations, you can find different ways to convince someone to not go down a negative path. We’re working with a number of people in this area.
VentureBeat: There are all those benefits, but again, you run into that issue of privacy, where people don’t want this kind of thing publicized.
Zhou: True. But this approach can do even more to preserve privacy. Say a school counselor can detect the signs of something in advance, they can take action with parents to prevent a more public problem. If you’re trying to see if someone has become very depressed, it’s not like you can say, ‘Hey, come here and take a personality test.’ But if you look at this information, you can tell that you need to pay attention to someone because they may have a problem. It gives you cues to act upon.
VentureBeat: Part of what this might say is that social media hasn’t figured out the distinction between public and private so well. People who get in trouble with these things don’t understand that it’s both public and private.
Zhou: I agree. As scientists, we want things to be transparent. One of the things we say at IBM is that whenever you design a system technology, it needs to be simple, prescriptive, and open. Open means that other people can come to critique it and see how to improve upon it. Prescriptive is also important – it refers to the transparency of the system. Hiding the system doesn’t help. We need to tell people, “If you want to discover more about yourself, or discover other people who are like you, we offer these benefits, and the risk is that you may be disclosing some private information, although we will protect it.” Essentially, we’ll tell them how their information will be shared and used. I think people will understand and applaud that effort. We certainly can’t implement this in any kind of sneaky way. That doesn’t help.
VentureBeat: This seems like a more efficient analysis than other kinds could be. If you took a billion measurements of one person, you’d find out a lot about them, but it would cost way too much. Analyzing 200 tweets seems pretty inexpensive in terms of processing.
Zhou: We also take advantage of big data, as you implied earlier. If we were sitting here five or six years ago, I wouldn’t be very confident, because I could only analyze a few people. I’d have no way of validating any of that. Now you have literally hundreds of millions of people out there. We keep improving our system. The dictionary of personality for Big Five was a very small one. We extended that greatly, because we’ve seen so many samples come in.
It’s a little like Google Suggest. It’s the power of the cloud. Google has done a great job of creating a spellchecker. You can spell Britney Spears’ name in hundreds of ways and still get hits for Britney Spears. [laughs] This is the same thing. You analyze so many people that you can beef up your technology to make it more and more accurate and cover more and more people. Now that we’ve hopefully covered western culture, we need to move on to eastern culture as well. That’s another adventure.