Note: The following is a re-post of AFOG member Shreeharsh Kelkar’s September 25, 2017 post on Scatterplot responding to the controversy over Wang and Kosinski’s (2018) paper about using deep neural networks to recognize “gay” or “straight” faces. At the time of Shreeharsh’s post, Wang and Kosinski’s paper had been accepted for publication but not yet published. The final version of the paper is now published in the Journal of Personality and Social Psychology and can be found here. Shreeharsh argues in this post that at least a part of the opacity of algorithms comes from the ways in which their technical mechanisms and social meanings co-exist side-by-side.

On this blog, and elsewhere, Greggor Mattson, Phil Cohen, and many others have written thoughtful, principled critiques of the recent “gaydar” study by Yilun Wang and Michal Kosinski (henceforth I’ll refer to the authors as just Kosinski, since he seems to be the primary spokesperson). I fully agree with them: the study both does too much and too little. It purports to “advance our understanding of the origins of sexual orientation and the limits of human perception” (!) through a paltry analysis of 35,326 images (and responses to these images by anonymous humans on Amazon Mechanical Turk). And it aims, vaguely, to warn us about rapacious corporations using machine learning programs to surreptitiously identify sexual orientation, but the warning seems almost like an afterthought: if the authors were really serious about this warning, they could have dug deeper with a feasibility study rather than sliding quickly into thinking about the biological underpinnings of sexuality.

As someone who follows and studies the history of artificial intelligence, I see some striking parallels between the current argument between Kosinski and his critics and the early controversies over AI in the 1960s-80s, and, I will also argue, some lessons to be learnt. Early AI was premised on the notion that when human beings did putatively “intelligent” things, they were processing information, a sort of “plan” that was worked out in their heads and then executed. When philosopher Hubert Dreyfus wrote his famous “Alchemy and Artificial Intelligence” paper for the RAND Corporation in 1965 (later expanded into his book What Computers Can’t Do), he drew on the work of post-foundationalist philosophers like Heidegger, Wittgenstein, and Merleau-Ponty to argue that human action could not be reduced to rule-following or information processing, and that once AI systems were taken out of their toy “micro-worlds,” they would fail. For their part, AI researchers argued that critics like Dreyfus moved the “intelligence” goalposts when it suited them. When programs worked (as did the chess- and checkers-playing programs of the 1960s and 1970s), the particular tasks they performed were just moved out of the realm of intelligence.

Figure 1: The canon of artificial intelligence. Source: Flickr, Creative Commons License.

One way to understand this debate—the way that participants often talked right past each other—is to understand the different contexts in which the AI researchers and their critics approached what they did.  In what I have found to be one of the best descriptions of what it means to do technical work, Phil Agre, who worked both as an AI researcher and a social scientist, points out that AI researchers rarely care about ideas by themselves.  Rather, an idea is only important if it can be built into a technical mechanism, i.e. if it can be formalized either in mathematics or in machinery.   Agre calls this the “work ethic”:

Computer people believe only what they can build, and this policy imposes a strong intellectual conservatism on the field. Intellectual trends might run in all directions at any speed, but computationalists mistrust anything unless they can nail down all four corners of it; they would, by and large, rather get it precise and wrong than vague and right. They often disagree about how much precision is required, and what kind of precision, but they require ideas that can be assimilated to computational demonstrations that actually get built. This is sometimes called the work ethic: it has to work (p13).

But the “work ethic” is often not something outsiders—and especially outside researchers—get.  To them, the exercise seems intellectually shoddy and perhaps even dangerous.  Here is Agre again:

To get anything nailed down in enough detail to run on a computer requires considerable effort; in particular, it requires that one make all manner of arbitrary commitments on issues that may be tangential to the current focus of theoretical interest. It is no wonder, then, that AI work can seem outrageous to people whose training has instilled different priorities—for example, conceptual coherence, ethnographic adequacy, political relevance, mathematical depth, or experimental support. And indeed it is often totally mysterious to outsiders what canons of progress and good research do govern such a seemingly disheveled enterprise. The answer is that good computational research is an evolving conversation with its own practical reality; a new result gets the pulse of this practical reality by suggesting the outlines of a computational explanation of some aspect of human life. The computationalist’s sense of bumping up against reality itself—of being compelled to some unexpected outcome by the facts of physical realizability as they manifest themselves in the lab late at night—is deeply impressive to those who have gotten hold of it. Other details—conceptual, empirical, political, and so forth—can wait. That, at least, is how it feels. [p13, my emphasis].

Figure 2: Courses required to complete a graduate certificate in artificial intelligence. Source: Flickr, Creative Commons License.

This logic of technical work manifests itself even more strangely in a field like AI, which is about building “intelligent” technical mechanisms and which therefore has to perform a delicate two-step between the “social” and the “technical” domains—a two-step that is nevertheless also key to its work and its politics. Agre argues that the work of AI researchers can be described as a series of moves done together, a process he calls “formalization”: taking a metaphor, often from an intentionalist vocabulary (e.g., “thinking,” “planning,” “problem-solving”), attaching some mathematics and machinery to it, and then narrating the workings of that machinery in that same intentional vocabulary. This process of formalization has a slightly schizophrenic character: the mechanism is precise in its mathematical form and imprecise in its lay form; but being able to move fluidly between the precise and the imprecise is the key to its power. This is not perhaps very different from the contortions that quantitative social science papers perform to hint at causation without really saying it openly (which Dan has called the correlation-causation two-step on this blog). The struggle in quantitative social science is between a formal definition of causation and a more narrative one. AI researchers, of course, perform their two-step with fewer caveats because their goal is to realize their mathematical machinery in actual “working” programs, rather than to explain a phenomenon.

To switch abruptly to the present, we can see the same two-step at work in the Kosinski paper. There is the use of social categories (“gay,” “straight”), the precise reduction of these categories to self-labeled photos of faces, the also-precise realization of a feature-set and a standard algorithm to derive the labels for these photos, and then the switch back into narrating the workings of the system in terms of broader social categories (gender, sexuality, grooming, recognizing). The oddest thing in the paper is the reference to the “widely accepted prenatal hormone theory (PHT) of sexual orientation,” but a closer reading shows that the theory is invoked essentially to provide a “scientific” justification for choices in the design of what is a conventional machine learning classifier. (My suspicion is that the classifier came first, and the theory came later because of the decision to submit to a psychology journal. Alternatively, it may have evolved out of the peer review process.)
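
To make the shape of that two-step concrete, here is a minimal, purely illustrative sketch of this kind of pipeline in Python: binary labels stand in for a social category, synthetic numeric vectors stand in for features extracted from face images, and an off-the-shelf classifier joins the two. Everything here (the sample sizes, the feature dimension, the random data) is a hypothetical stand-in; it is not a reconstruction of Wang and Kosinski’s actual models, only an illustration of the generic form of such a classifier.

```python
# A purely illustrative sketch of the "formalization" two-step: a social category
# reduced to binary labels, face images reduced to numeric feature vectors (here,
# random noise standing in for deep face embeddings), and a standard classifier
# fit to map one onto the other. All sizes and data are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Step 1: the "precise reduction" -- stand-in embeddings and self-reported labels.
n_samples, n_features = 2000, 128             # hypothetical sizes
X = rng.normal(size=(n_samples, n_features))  # synthetic stand-in for face embeddings
y = rng.integers(0, 2, size=n_samples)        # synthetic stand-in for self-labels

# Step 2: a conventional classifier attached to those features.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Step 3: the switch back into social vocabulary happens outside the code, when a
# number like this AUC gets narrated as "recognizing sexual orientation."
auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
print(f"AUC on synthetic data: {auc:.2f}")    # ~0.5 here, since the features are pure noise
```

The point of the sketch is only that the technical object in the middle is a thoroughly conventional classifier; the social meaning enters at the reduction and at the narration, not in the mathematics.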

But if the two-step remains the same, the world of AI today is starkly different. As I have written before, today’s artificial intelligence is steeped far more in the art of making (real-world) classifications, rather than in the abstract concepts of planning and state-space searching. Moreover, far from operating in “micro-worlds” as they did before, contemporary AI programs are all too realizable in the massive infrastructures of Facebook and Google. (Indeed, one of Dreyfus’ criticisms of early AI was that it would not work in the real world. No one would make that argument today.) Not surprisingly, the debates over AI have shifted as well: they are much more about questions of bias and discrimination; there’s also far more talk of how “algorithms”—the classifying recipes of the new AI—sometimes seem similar to the discredited sciences of phrenology and physiognomy.

There have been three angles of critique of the Kosinski study. The first has been over the researchers’ notion of “informed consent”: as Greggor Mattson points out (see also this Mary Gray post on the old Facebook contagion study), researchers, whether corporate or academic, need to be more cognizant of community norms around anonymity and privacy (especially for marginalized communities) when they scrape what they see as “public” data. The second has been from quantitative social scientists who find the Kosinski study lacking by the standards of rigorous social science. Again, you’ll find no argument from me on that score. But it bears mentioning that AI researchers are not quantitative social scientists: they are not so much interested in explaining phenomena as they are in building technical systems. Should quantitative social scientists take the logic of technical work into account when they criticize the big claims of the Kosinski study? Maybe so, maybe not; there are certainly grounds to think that the dialogue between quantitative social scientists (accustomed to the correlation-causation two-step) and AI classifier-builders will be productive, given that the use of correlations is now emerging as central to both fields.

My own angle on the study is from the third perspective, that of interpretive social science. When we social scientists find the use of social categories in the Kosinski study dubious (and even outright wrong), we are reacting to what we see as the irresponsible use of a socially meaningful vocabulary to describe the working of an arcane technical mechanism. On this score, though, the history of the older debates over AI is worrying. If my reading of the history of AI is right (I’m open to other interpretations), those debates went nowhere because people were talking past each other. Much ink was spilled, feuds were born, but everything went right on as it did before: AI was still AI, the social sciences were still the social sciences, and the differences remained stark and deep. (Indeed, the work of people like Agre and Lucy Suchman got taken up more in the computer science sub-field of human-computer interaction (HCI) than in AI proper.)

Could we do better this time?  I don’t know.  I might start by asking the AI researchers to be careful with their use of metaphors and socially meaningful categories.  As the AI researcher Drew McDermott put it in his marvelously titled “Artificial Intelligence Meets Natural Stupidity” article written in the 1970s, some of the feuds over early AI really could have been avoided if the AI researchers had used more technical names for their systems rather than “wishful mnemonics.”

Many instructive examples of wishful mnemonics by AI researchers come to mind once you see the point.  Remember GPS? (Ernst and Newell 1969).  By now, “GPS” is a colorless term denoting a particularly stupid program to solve puzzles.  But it originally meant “General Problem Solver,” which caused everybody a lot of needless excitement and distraction.  It should have been called LFGNS–“Local-Feature-Guided Network Searcher.”

For our part, we may want to collaborate with AI researchers to think about social categories relationally and historically rather than through an essentialist lens. But successful collaborations require care and at least a sense of the other’s culture. First, we may want to keep in mind throughout our collaborations that there is an inner logic to technical work. To put it in Agre’s terms, technical work evolves in conversation with its own practical reality and does not necessarily aim at conceptual coherence. Second, when they do draw on the social sciences, AI researchers tend to look to psychology and economics (and philosophy), rather than, say, sociology, history, or anthropology. (And not surprisingly, it is also in psychology and economics that machine learning has been taken up enthusiastically. Kosinski, for instance, has a PhD in psychology but seems to describe himself as a “data scientist.”) This is not a coincidence: computer science, psychology, and economics were all transformed by the cognitive revolution and took up, in various ways, the idea of information processing that was central to that revolution. They are, all of them, in Philip Mirowski’s words, “cyborg sciences,” and as such, concepts can travel more easily between them. So interpretive social scientists have their work cut out for them. But even if our effort is doomed to fail, it should be our responsibility to open a dialogue with AI researchers and push what we might call a non-essentialist understanding of social categories into AI.

1 Comment

Greggor Mattson · March 21, 2018 at 8:47 pm

Really enjoyed your take on the interpretive social sciences… says this cultural sociologist of sexuality.
