Michelle Zhou gave me a shock recently when she showed me a graph of my personality, based on a sampling of my tweets. The IBM researcher can make an educated guess about anyone’s personality based on looking at 200 tweets by that person.
The research at IBM’s Almaden Research Center in San Jose, Calif., came from a new Accelerated Discovery Lab aimed at gaining insights from big data. Zhou can do this personality analysis in a disciplined and automated fashion, measuring 52 different personality traits for each subject. This “psycholinguistic” analysis could prove useful for companies that want to understand their customers in a more intimate way. But it also raises privacy issues and other concerns.
We talked those over in a full conversation with Zhou. Here’s our edited transcript of that talk.
VentureBeat: How did you get started with this in the first place?
Michelle Zhou: I’ve always been working in this general area. Before, I was dealing with intelligent user interaction – how you better understand a user’s behavior and a user’s intentions, so you can adjust a computer to adapt to that. Now, we’re thinking past user behavior modeling – clickthroughs, menu movements, window interaction. I was thinking, ‘Can’t we do a little bit more than this?’ I came across this idea of psycholinguistic analysis, and I saw that we could extend that to develop our own technologies and gain a deeper understanding of individuals.
Three or four years ago, we started a series in many people-centric operations, from marketing to customer service to product development to HR. A ubiquitous challenge there is that they want to gain an understanding of people as individuals. Understanding an individual has to go beyond behavior analysis. Amazon.com is a quintessential example of behavior analysis. You read something or buy something.
Many human studies in psychology, social science, and behavioral economics show that people’s behavior and their decision-making processes in the real world are heavily influenced by what’s called their intrinsic traits. Those intrinsic traits include what motivates you, what you believe, your fundamental needs. Thinking about it, it’s very hard to imagine – in a traditional way – how you could learn someone’s intrinsic traits, aside from standard psychometric tests. You can’t ask a customer something like, ‘Welcome to my store, would you like to take a personality test?’ Let alone scale that out to literally hundreds of millions of customers in the real world.
VentureBeat: Are we talking about the personalities of companies or individual consumers at this point?
Zhou: Like I say, it’s any people-centric operation. You want to understand the people involved in a process. It could be the employees of a company. For example, somebody wants to advance or develop their talent. You want to understand what this person can do. Also, for a consumer company – retail, consumer products, services like airlines or hospitality – they want to understand their customers as individuals so they can serve them as individuals.
The challenge here is, how do we gain this understanding of individuals and their intrinsic traits at scale? Then we found two things that could work to our advantage. Part of it is psycholinguistic analysis. Computers can derive people’s traits from linguistic footprints. That hasn’t been widely applicable before, because where do you get those linguistic footprints? Now, you can do that with social media and digital communications. Those are readily available, so we saw an opportunity there.
Before, psycholinguistics has only been applied to the Big Five personality types – not many other traits like basic human values, the things that describe your beliefs and motivations and needs. Marketing studies and behavioral economics have shown evidence that people who have different needs buy totally different things. Idealistic people often go for organic foods, organic skin care. They buy clothes for their pets. They go for brands that support that.
The research has been done at the University of Colorado and UT Austin. A psychologist developed a dictionary of categories to predict the words people use for certain things. People who are happy or depressed use different word categories. You can use more extensive word categories to try to predict where someone sits in the Big Five. However, that’s not sufficient. For example, no word categories exist to predict values or beliefs.
Zhou: Right. Our method comes from psychometric studies. You design these item-based psychometric studies and you get many people to take them. In the meantime, you also collect their linguistic footprints. You might say, ‘Describe your needs,’ and then we use statistical modeling to correlate the words in their linguistic footprints with their psychometric scores. That statistical model can become a predictive model, with the dictionary we have already, and we can apply it to other people who haven’t taken the tests.
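The correlate-then-predict step Zhou describes can be sketched in a few lines of Python. This is only an illustration under loud assumptions, not IBM's model: the word category, the training texts, and the scores are all made up, and a single-feature least-squares line stands in for whatever statistical modeling the team actually uses.

```python
import re

# Hypothetical word category; a real psycholinguistic dictionary is far larger.
CATEGORY_WORDS = {"help", "care", "share", "together"}


def category_rate(text):
    """Return the fraction of words in `text` that belong to the category."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return sum(w in CATEGORY_WORDS for w in words) / len(words)


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up training data: (writing sample, score from a psychometric test).
training = [
    ("we share and care and help together", 90.0),
    ("i want to win the game today", 20.0),
    ("help me share this with friends", 60.0),
    ("nothing to see here at all", 15.0),
]

# Correlate word usage with test scores for people who took the test.
rates = [category_rate(text) for text, _ in training]
scores = [score for _, score in training]
slope, intercept = fit_line(rates, scores)


def predict_score(text):
    """Estimate a score for someone who never took the test."""
    return slope * category_rate(text) + intercept
```

Calling `predict_score` on a new writing sample then gives an estimated score for that person. A realistic version would use many word categories, many more participants, and held-out data to check how well the predictions match real test results.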
Another thing we want to do is let a business ask its customers to opt in. Using myself as an example – maybe this is a bit extreme – I subscribe to e-mail lists. Sometimes I get something I really think about, but 99 percent of the time I get junk mail that has nothing to do with what I want. In this case, I may be willing to tell a company more about me – this is who I am, this is what I want – and that can help them send me more about what I really want. Or when I book a trip and check out hotels, I might want to know more about what hotels have been chosen by people who are like me. I can look at Yelp, but who are those people? Are they like me at all? Do they like the same things I like? With this, you can learn more based on the behavior of people who are like you. That’s the essence of what we’re working on.
We’re still at the research stage. We’re working with some of our customers to validate our work in their field. The reason we’re very excited is, first of all, IBM gives us an environment to work with many customers that other companies might not have access to. Second, it’s a broad range of work, from psychology to social science to computer science. We need that interdisciplinary background to put this together. Having that real world validation will be great.
VentureBeat: Is this a kind of big data effort? Do you have to look at thousands of people, thousands of messages?
Zhou: We don’t actually need a lot of messages. We’ve done extensive experiments and found that about 200 tweets from a given person – about 2,500 or 3,000 words – is a good representative sample. That gets us within about 10 percentiles of what we’d get using thousands of tweets. So that makes a good estimate. The big data aspect, though, is that you have to look at hundreds of millions of people to make this valuable. You have so many types of people, so much data, and you can’t just go one by one.
I took the liberty of searching you on Twitter and created a profile of you—