VentureBeat: There are all those benefits, but again, you run into that issue of privacy, where people don’t want this kind of thing publicized.
Zhou: True. But this approach can do even more to preserve privacy. Say a school counselor can detect the signs of a problem in advance; they can then work with the parents to head off something more public. If you're trying to see whether someone has become very depressed, it's not like you can say, 'Hey, come here and take a personality test.' But if you look at this information, you can tell that you need to pay attention to someone because they may have a problem. It gives you cues to act upon.
VentureBeat: Part of what this might say is that social media hasn’t figured out the distinction between public and private so well. People who get in trouble with these things don’t understand that it’s both public and private.
Zhou: I agree. As scientists, we want things to be transparent. One of the things we say at IBM is that whenever you design a system or a technology, it needs to be simple, prescriptive, and open. Open means that other people can come and critique it and see how to improve upon it. Prescriptive is also important: it refers to the transparency of the system. Hiding the system doesn't help. We need to tell people, "If you want to discover more about yourself, or discover other people who are like you, we offer these benefits, and the risk is that you may be disclosing some private information, although we will protect it." Essentially, we'll tell them how their information will be shared and used. I think people will understand and applaud that effort. We certainly can't implement this in any kind of sneaky way. That doesn't help.
VentureBeat: This seems like a more efficient analysis than other kinds could be. If you took a billion measurements of one person, you’d find out a lot about them, but it would cost way too much. Analyzing 200 tweets seems pretty inexpensive in terms of processing.
Zhou: We also take advantage of big data, as you implied earlier. If we were sitting here five or six years ago, I wouldn't be very confident, because I could only analyze a few people and I'd have no way of validating any of the results. Now you have literally hundreds of millions of people out there, and we keep improving our system. The original personality dictionary for the Big Five was a very small one. We extended it greatly, because we've seen so many samples come in.
It's a little like Google Suggest. It's the power of the cloud. Google has done a great job of creating a spellchecker. You can spell Britney Spears' name in hundreds of ways and still get hits for Britney Spears. [laughs] This is the same thing. You analyze so many people that you can beef up your technology to make it more and more accurate and cover more and more people. Now that we've hopefully covered Western culture, we need to move on to Eastern culture as well. That's another adventure.