The Personas Project out of the MIT Media Lab's Sociable Media Group has been making the rounds lately. It uses data mining and natural language processing (NLP) to gather data about you from the web and distill it into a pretty infographic that is supposed to represent 'how the web sees you.' Here's mine:

My Results from the Personas Project

Like any art project, this one seems intended to make us think about how we're portrayed on the web and about the data that's out there floating around. It aims to inspire an 'OMG, is this how the web sees me?' reaction. It's meant to shock and awe us by getting things right, and perturb us by getting things wrong. It's one of many projects that try to do these sorts of things, and I think it's getting attention right now partly because it comes from MIT and partly because it's very pretty. The authors are due a lot of credit, however, for recognizing the limits of their tool:

[The Personas Project] is meant for the viewer to reflect on our current and future world, where digital histories are as important if not more important than oral histories, and computational methods of condensing our digital traces are opaque and socially ignorant.

I love that last clause. Well put. I think data mining is particularly opaque and socially ignorant when it's employed for abstract purposes. There are lots of tightly scoped questions that we can answer with data mining techniques. I use data mining as a tool myself, but I use it to gather evidence of behavior in support of very narrow, specific claims. As a tool for telling us how we're viewed on the web, for example, these techniques stink. I'd imagine that most researchers know they stink. But we're still using them as a primary tool to talk about social processes, even though social processes require context and timelines that data mining can't begin to capture. Why is that? Why haven't we seen more backlash against these context-blind methods? Why haven't we seen more studies that mix data mining with qualitative methods, for example, to lend context where it's lacking? I suppose it's because data mining is easy. Computation and storage are cheap, and talking to people is hard.

I hope that this sort of tool will reveal the danger of these methods and encourage researchers to advance the state of the field. We've dwelled too long on the digital traces / privacy meme. At this point it's just getting tired, and it exploits the digital paranoia that's rampant in the media right now. We need to get past it.