Simply by looking at these different rates of word usage, Schnoebelen and his colleagues, David Bamman of Carnegie Mellon University and Jacob Eisenstein of Georgia Tech, can predict the gender of an author on Twitter with 88 percent accuracy.
But Schnoebelen, Bamman, and Eisenstein didn’t stop there, even if such a high level of accuracy in pinpointing gender would be good enough for, say, L’Oréal. They wanted to go beyond the standard binary stereotypes of “Men Are from Mars, Women Are from Venus” to understand how “male” and “female” linguistic markers actually work in the world, at least online.
They found that even though you can categorize certain words as having a higher male or female probability, it’s easy to find large swaths of Twitter users who go against these trends. By grouping people by their style of usage, they could find, for example, a cluster of authors that is 72 percent male but nonetheless favors the nonstandard spellings that are supposedly a hallmark of “female” language.
Digging deeper, the researchers looked at the social networks that people create on Twitter, making connections by “following” and replying to other users. When you take these networks into account, the gender picture gets even more complex. It turns out that the statistical outliers (men who use language that’s associated with women, and vice versa) are more likely to have networks that skew toward the other gender.