This chapter is a draft. The final version of Data Feminism will be published by the MIT Press in 2019. Please email Catherine and/or Lauren for permission to cite this manuscript draft.

In 1970, the Detroit Geographic Expedition and Institute released a provocative map titled "Where Commuters Run Over Black Children on the Pointes-Downtown Track". The map starkly shows where many Black children were killed. On one corner alone, six children were killed by white drivers over the course of six months. Just gathering the data that the community already knew to be true posed a difficult problem. No one was keeping detailed records of these deaths, nor making them publicly available. The only reason the data were collected and published was an unlikely collaboration between low-income, urban Black youth, led by Gwendolyn Warren, and white male academic geographers.


Gwendolyn Warren was the Administrative Director of the Detroit Geographic Expedition and Institute, a collaboration between Black youth in Detroit and white academic geographers that lasted from 1968-1971. The group worked together to map aspects of the urban environment related to children and education. Warren also worked to set up a free school where youth could take college classes in geography for credit.

Credit: The Detroit Geographic Expedition and Institute

Source: Gwendolyn Warren, “Field Notes III: The Geography of Children” (1970)

Permission: Pending

Contrast this map with a map made thirty years prior by the (all white and male) Detroit Chamber of Commerce and the (all white and male) Federal Home Loan Bank Board. This map set the stage for "redlining", a discriminatory practice of rating the risk of home loans in particular neighborhoods based on residents' demographics (their race, not their creditworthiness). Redlining began as a visual technique of shading in red all the neighborhoods in a city that were deemed "undesirable" for granting loans. All of Detroit's Black neighborhoods in 1940 fall in red areas on this map. Denying loans to Black residents set the stage for the decades of structural racism and blight that were to follow.


This is a "redlining" map of Detroit published in 1939. Created as a collaboration between the (all white and male) Detroit Chamber of Commerce and the (all white and male) Federal Home Loan Bank Board, it uses red to mark the neighborhoods that these institutions deemed "high-risk" for bank loans. Paul Szewczyk, a local historian, has demonstrated how all of Detroit's majority African American neighborhoods were colored red. Detroit was not an isolated case: redlining was a standard practice in virtually all of America's major cities. It was a scalable, "big data" approach to systematic discrimination under the guise of data and objectivity.

Credit: The Detroit Chamber of Commerce and the Federal Home Loan Bank Board


Permissions: Pending

Both of these maps use straightforward cartographic techniques (aerial view, legends and keys, color and shading) to indicate different characteristics. But what is starkly, undeniably different about the two maps is the worldview of the makers and their communities. In the second map you have racist, male-dominated city and federal institutions seeking to further institutionalize segregation and secure white wealth. Black neighborhoods were deemed to pose a "high risk" to the financial solvency of white institutions, so redlining maps became a way to systematically and "scientifically" protect white resources. These institutions succeeded, in no small part because of maps like this one. In contrast, in the first map you have a community that had recently learned the cutting-edge geographic techniques of their era and decided to take action against the same structures of power that created the second map. One is a map securing power and the other is a map contesting power.

Who makes maps and who gets mapped? The DGEI map is, unfortunately, a rare instance in which communities of color, led by a young Black woman, determined what they wanted to map. It is more frequently the case that communities of color are mapped by institutions in power, whose worldviews and value systems may differ vastly from those of the community. One of the most dangerous outcomes of this imbalance of power – in evidence in this example of harm that was inflicted on people systematically for decades using maps and data – is when those institutions in power obscure their political agendas behind a veil of objectivity and technology. 

This veil is not just a historical phenomenon. One can make a direct comparison between yesterday's redlining maps and today's risk assessment algorithms. The latter are used in many locales to inform whether a person who has been detained should be considered at low or high risk of committing a future crime. Risk assessment scores can affect whether a person is let out on bail and what kind of sentence they receive – they have the power to set you free or lighten your sentence. 

The issue is that different bodies are differently weighted by the risk assessment algorithm. For example, in 2016 Julia Angwin led a team at ProPublica to investigate one of the most widely used risk assessment algorithms created by the company Northpointe (now Equivant). Her team found that white defendants were more often mislabeled as low risk than Black defendants, and conversely, that Black defendants were mislabeled as high risk more often than white defendants. Digging further into the details, the journalists uncovered a 137-question worksheet that detainees fill out. Their answers feed into the software and are compared with other data in order to spit out the risk assessment score for the individual. While the questionnaire does not ask directly about race, it asks questions that are direct proxies for race, like whether you were raised by a single mother, whether you have friends or family that have been arrested, and whether you have ever been suspended from school. In the US context, each of those data points has been demonstrated to have disproportionate occurrences for Black people – 67% of Black kids grow up in single parent households, for example, whereas the rate is only 25% for white kids. So, while the algorithm creators claim that it isn't considering race, it is considering race by proxy and using that information to systematically disadvantage Black and brown people.   
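One way to make this kind of finding concrete is to compare error rates across groups. What follows is a minimal sketch in Python, not ProPublica's analysis or Northpointe's software: the records are invented for illustration, the group labels "A" and "B" are placeholders, and the function name `error_rates` is our own. For each group it computes the false positive rate (people labeled high risk who did not re-offend) and the false negative rate (people labeled low risk who did).

```python
from collections import defaultdict

# Hypothetical toy records, invented for illustration only:
# (group, labeled_high_risk, reoffended)
records = [
    ("A", True, False), ("A", True, True), ("A", False, False), ("A", True, False),
    ("B", False, True), ("B", False, False), ("B", True, True), ("B", False, False),
]


def error_rates(rows):
    """Return false positive and false negative rates for each group."""
    counts = defaultdict(lambda: {"fp": 0, "tn": 0, "fn": 0, "tp": 0})
    for group, high_risk, reoffended in rows:
        if high_risk and not reoffended:
            counts[group]["fp"] += 1  # wrongly labeled high risk
        elif high_risk and reoffended:
            counts[group]["tp"] += 1
        elif not high_risk and reoffended:
            counts[group]["fn"] += 1  # wrongly labeled low risk
        else:
            counts[group]["tn"] += 1

    rates = {}
    for group, c in counts.items():
        did_not_reoffend = c["fp"] + c["tn"]
        did_reoffend = c["fn"] + c["tp"]
        rates[group] = {
            "false_positive_rate": c["fp"] / did_not_reoffend if did_not_reoffend else None,
            "false_negative_rate": c["fn"] / did_reoffend if did_reoffend else None,
        }
    return rates


rates = error_rates(records)
for group, group_rates in sorted(rates.items()):
    print(group, group_rates)
```

In this invented data the mistakes fall unevenly: group A is more often wrongly labeled high risk, and group B is more often wrongly labeled low risk. That is the shape of the disparity described above, though the real analysis involved far more records and many more checks.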


Northpointe’s risk assessment algorithm is called COMPAS (Correctional Offender Management Profiling for Alternative Sanctions). Its score is derived from a defendant's answers to a 137-question survey about their upbringing, personality, family and friends, including many questions that can be considered proxies for race, such as whether they were raised by a single mother. Note that evidence of family criminality would not be admissible in a court case for a crime committed by an individual, but here it is used as a factor in making important decisions about their freedom.

Credit: The Northpointe risk-assessment survey, sourced by ProPublica.


The redlining map and the Northpointe risk assessment algorithm have a lot of similarities. Both use the cutting-edge technologies of their day and aggregated data about social groups so that institutions can make decisions about individuals – Should we grant a loan to this person? What's the risk that this person will re-offend? Both use past data to predict and constrain future individual behaviors. Note that the past data in question (like segregated housing patterns or single parentage) are a product of structurally unequal conditions amongst social groups, and yet the technology treats those data as a causal element that will influence an individual's future behavior. Effectively this constitutes a demographic penalty that tracks individuals through their lives and limits their future potential – Live in a Black neighborhood? Then you don't get a loan. Raised by a single mom? Then you can't be freed on bail because you are a flight risk. And the kicker is that because of their use of tech and data, both of these racist data products have the appearance of neutrality. Scholar Ruha Benjamin has a term for this – "the New Jim Code" – a situation which combines software code and imagined objectivity to contain and control Black and brown people.

What's the alternative? Let us for a moment imagine a completely different set of values to encode in our data products. The values in evidence in redlining maps and risk assessment algorithms are about preserving a race- and class-based status quo. White, wealthy men working in powerful institutions adopt a focus on risk – a single loan in default threatens to decrease the wealth of their institution and the data and computational systems are mobilized to avoid this possibility at all costs. But instead of penalizing people for their statistical affiliation with specific race, gender and class demographics, we could imagine an alternative approach grounded in equity and demographic healing. A system could mobilize the same data – say, zip code and neighborhood demographics – to determine where more strategic investment was needed to counteract the toxic effects of structural inequality. And when people applied for loans, the red color in certain neighborhoods would indicate their higher need and place them higher up in the priority line for individual loans.

Likewise, risk assessment algorithms would be called "needs assessment algorithms." They would be banned for any kind of punitive application and be used exclusively to prioritize which individuals were in need of more services and support in the re-entry process. This is the recommendation made by a coalition of more than 100 civil rights, digital justice, and community-based organizations in 2018 in their jointly-released statement on algorithms: "If pretrial risk assessment instruments are utilized at all, the only purpose they can meaningfully serve is to identify which people can be released immediately and which people are in need of non-punitive or restrictive services."

The values in our alternate world are not about preserving the dominance of certain institutions and elite people but about equalizing the effects of structural inequality. Sharing power and wealth could easily be hardcoded into the computational systems of the future. The data and technology would remain almost the same but the values driving their use (and the people who derive benefit from their use) would be almost exactly opposite. But this alternate world won't happen of its own accord. As Frederick Douglass stated in 1857, and as Yeshimabeit Milner recently reminded Data for Black Lives members: "Power concedes nothing without a demand."

What is the demand? Which demands and on behalf of whom? In order to formulate those demands it is important to do two things: examine how power is currently wielded with and through data and, in parallel, imagine and model how things could be different. Examining intersecting dimensions of power has long been part of a feminist toolkit. Back in 1977, the Combahee River Collective, the famed Black lesbian activist group out of Boston, urgently advocated for "the development of integrated analysis and practice based upon the fact that the major systems of oppression are interlocking."

Examining how power is wielded through data means doing projects that wield it back, like Warren's map and ProPublica's Machine Bias story. These projects deal openly and explicitly with who has power and who doesn't, and they name the structural conditions, like racism and sexism, that underlie those facts. It involves lifting the veil of what Benjamin calls the "imagined objectivity" of code and exposing the differential harms and benefits resulting from the deployment of data science. Good work in this vein is emerging from spaces like activism, journalism, machine learning, and law. In journalism, Nick Diakopoulos and others have advanced the notion of algorithmic accountability beats, and organizations like ProPublica are demonstrating what these look like in action. In machine learning, the FATML conference examines fairness, accountability and transparency for machine learning systems. This growing community of technical researchers looks at how to measure bias in data sets, how to make visible the workings of machine learning algorithms, and how to align system recommendations with equity and policy goals, among other things.

But data science and visualization work that examines power still mostly happens around the margins of the field, for three reasons. First, unless you work in an accountability field (such as journalism or law), there typically isn't funding or other professional incentives for such work. Corporations typically want to visualize their supply chain, not their sexism.

Second, the people who have access to data and to the technical skills to work with it are those who have the most stake in reproducing the status quo. The elephant in the server room, only very occasionally acknowledged, goes back to one of the issues that we raised in Bring Back the Bodies: that women and people of color are not well-represented in the fields of data science and visualization, and the problem is getting worse. In the graphic below, you can see that female graduates in Computer/Information Science in the US peaked in the mid-1980s at 37%. We have seen a slow decline in the years since then. The rate of female graduates in 2010-11 fell below the rate of female graduates in 1974-5. What this means is that the most highly-touted methods of producing knowledge and deriving insight in the age of big data and artificial intelligence are being designed and deployed primarily by the people with the most privilege.


One would expect female graduates in Computer/Information Science to be around 50%, but actual rates have never come close to half. Female graduates in the US peaked in the mid-1980s at 37% and we have seen a slow decline in the years since then. The rate of female graduates in 2010-11 (17.6%) is now below the rate of female graduates in 1974-5.

Credit: Graphic by Catherine D’Ignazio

Source: Catherine D’Ignazio


[ DRAFT IMAGE: A redesigned version of a pie chart included in the AAUW report, "COMPUTING WORKFORCE, BY GENDER AND RACE/ETHNICITY, 2006–2010" which juxtaposes the statistics on women in computing with women in the overall population ]

Credit: Graphic by Catherine D’Ignazio

Source: Catherine D’Ignazio

Relatedly, the third and final reason that examining power is the exception rather than the norm is that, as feminist sociologist Michael Kimmel says, "privilege is blind to those who have it."

What does this mean? Recall the exchange from Chapter X. Kimmel's African-American colleague said, "When I look in the mirror I see a Black woman. When a white woman looks in the mirror she sees a woman." And Kimmel, a white man, rejoined, "And when I look in the mirror, I see a human being." For people in the dominant group, their gender, race, sexuality or class is so normalized that it is invisible. It is not seen as a marker of difference, but rather simply "the way things are". Take enough of those privileged individuals and put them together at the helm of data science and algorithm development and you have a major structural deficiency. This basic imbalance of power remains mostly unacknowledged – except when it reveals itself in surprising and uncomfortable ways.

For example, Joy Buolamwini, a Ghanaian-American graduate student at MIT, was working on a class project using facial analysis technology. These are software packages that detect a face in an image, similar to when your phone camera draws outlines around the faces that it "sees" in the picture. But there was a problem – the software couldn't "see" Buolamwini's dark-skinned face. It had no problem seeing her lighter-skinned collaborators. When she drew a face on her hand and put it in front of the camera, it detected that. And then when Buolamwini put on a white mask, essentially going in "white face," the system detected the mask's facial features perfectly. Digging deeper into the code and benchmarking data behind these systems, Buolamwini discovered that the data set on which many of the facial recognition algorithms are tested contains 77.5% male faces and 83.5% white faces. When she did an intersectional breakdown of a separate test dataset – looking at gender and skin type together – only 4.4% of the faces in that data set were female and dark-skinned. In their evaluation of three commercial systems, Buolamwini and Timnit Gebru showed that darker-skinned females were up to forty-four times more likely to be misclassified than lighter-skinned males. No wonder the software was failing on faces like Buolamwini's if both the training data and the benchmarking data relegate women of color to a tiny fraction of the overall data set.
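The difference between looking at gender and skin type separately and looking at them together can be sketched in a few lines of Python. This is an illustrative example only: the ten records below are made up, the category labels are placeholders, and the helper `shares` is a hypothetical name rather than anything from Buolamwini's own code.

```python
from collections import Counter

# Hypothetical benchmark, invented for illustration only.
# Each record is (gender, skin_type).
faces = (
    [("male", "lighter")] * 5
    + [("female", "lighter")] * 2
    + [("male", "darker")] * 2
    + [("female", "darker")] * 1
)


def shares(labels):
    """Return each label's share of the total as a fraction between 0 and 1."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Single-axis breakdowns: gender alone, then skin type alone.
by_gender = shares(gender for gender, _ in faces)
by_skin = shares(skin for _, skin in faces)

# Intersectional breakdown: gender and skin type together.
by_both = shares(faces)

print("female:", by_gender["female"])
print("darker:", by_skin["darker"])
print("female and darker:", by_both[("female", "darker")])
```

In this toy data, women make up three in ten records and darker-skinned faces make up three in ten, but darker-skinned women are only one in ten. Each single-axis number overstates how well the intersection is represented, which is why the breakdown has to be done on both attributes at once.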


Joy Buolamwini found that she had to put on a white mask in order for the facial detection program to "see" her face. Buolamwini is now founder of the Algorithmic Justice League (AJL).

Credit: Photo by Joy Buolamwini

Source: Photo by Joy Buolamwini

Permissions: Pending

As she tells it, "I didn't start out on a mission for social justice," but after seeing the need for more fairness and accountability, Buolamwini has now gone on to launch the Algorithmic Justice League (AJL) – an organization that works to highlight and address algorithmic bias. Buolamwini and the AJL have done art projects, written research papers, and taken to the media to call for a moratorium on facial analysis in policing, and they are even advising on legislation and professional standards for the field of computer vision.

But imagine, for a moment, a world where female, Black and brown engineers are the ones designing the computer vision training data sets and algorithms in the first place. A world where people like Joy Buolamwini and Gwendolyn Warren are the norm, not the exception. Would such a system have worked from the start? Not necessarily, says Buolamwini. "No technologist works in isolation – we rely on the libraries and datasets developed by the community over time." And these datasets have what she has termed "power shadows" – they reflect the structural inequality of the world they draw from. So when it is easiest to collect faces of powerful public figures for your benchmarking data, those datasets will contain power shadows – disproportionate male and white representation.

So what does "working" mean if you want to make data products that are anti-racist and anti-sexist? On the one hand, the software did "work". It was pretty good at detecting the male faces (77.5%) and white faces (83.5%) that dominated the benchmarking data set. But Buolamwini likes to remind her audiences that Europeans are less than 10% of the world's population, so it didn't work for the majority of the global population. And even so, "it's not just about creating accurate algorithms but creating equitable systems," she says. We can't just build more precise surveillance apparatuses; we also need to look at the deployment, governance, use and impacts of these technologies: "Communities, not companies, should determine whether and how this technology is used by law enforcement."

Where we might say that the technology did "work" is that it accurately reflected back to Buolamwini the biases of the people in power towards Black women. In that sense, it faithfully reinforced the racist messages Black people receive all the time that their lives as well as their voices, bodies, and representations do not matter. bell hooks referred to this phenomenon as "representational harms." Specifically writing about data, artist Mimi Onuoha has called this phenomenon "algorithmic violence" and data ethicist Anna Lauren Hoffmann has used the term "data violence" for the way in which it participates in (and legitimates) the circulation of damaging narratives and ideas about particular groups of people. This is the harm that occurs with imagined objectivity – when software engineers wield data "neutrally" (in an attempt to wiggle out of having to deal with squishy things like values) they build things that support the existing status quo. And that status quo is ugly – it is racist, patriarchal, heteronormative and more.

In fact, one of the structural forces that software engineers and data scientists need to contend with is that data is by and large a tool of management, wielded by those institutions in power, like the Detroit Chamber of Commerce in the 1940s, who have a vested interest in maintaining the ugly status quo because they benefit from it. Joseph Weizenbaum, artificial intelligence trailblazer and creator of the famous ELIZA experiment in the 1960s, looked back on the history of computing and said it like this: "What the coming of the computer did, 'just in time,' was to make it unnecessary to create social inventions, to change the system in any way. So in that sense, the computer has acted as fundamentally a conservative force, a force which kept power or even solidified power where it already existed."

The first step to pushing back against this fundamentally conservative force is to understand that the single most damaging thing people can do to uphold the oppressive order of the world is to claim that they have no values, no politics, and that their work with data is neutral. This is Haraway's god trick and Benjamin's imagined objectivity – the veil at work to obscure power differentials. This neutrality narrative would be item #1 in the BuzzFeed listicle "Things Straight White Men Tell Themselves to Stay on Top."

The second step is to begin to understand the ways that privilege – and oppression, its counterpoint – manifest themselves in data science. Privilege and oppression are complicated and there are "few pure victims and oppressors," as sociologist Patricia Hill Collins notes. A helpful way to start to grasp how they function is through Collins' concept of the matrix of domination. As we described at the outset of this book, a core distinguishing feature of contemporary feminism is its insistence on intersectionality – the idea that we must take into account not only gender but also race, class, sexuality and other aspects of identity in order to fully understand and resist how power operates to maintain an unjust status quo. Collins' matrix of domination describes the overall social organization of those intersecting oppressions. She outlines four major domains in which the matrix of domination operates: the structural domain, the disciplinary domain, the hegemonic domain, and the interpersonal domain. "Each domain serves a particular purpose," writes Collins.

The structural domain is that of laws and policies and schools and institutions – it organizes and codifies oppression. If we take the example of voting, most US states prohibited women from voting in elections until the 1910s. Even after the passage of the Nineteenth Amendment in 1920, many state voting laws included literacy tests and other ways to specifically exclude women of color, so it wasn't until the Voting Rights Act in 1965 that all Black and brown women were enfranchised. Other disenfranchisement methods devised by white people to prevent Black women from exercising their right to vote included making them wait in line for twelve hours to register, pay a tax, or take a test in which they had to read and interpret the Constitution. In many places they faced threats of bodily harm up until the 1960s, simply for trying to vote. Note that the history of voter suppression perpetrated by white people on people of color is not over. Voter suppression based on race continues today in the form of photo ID laws, dropping people from voter rolls (so they have to re-register), limiting early voting, felon disenfranchisement, and more.

The disciplinary domain administers and manages oppression through bureaucracy and hierarchy (rather than explicit laws). In our voting example, this might take the shape of a company prohibiting factory workers from leaving early to vote or penalizing workers who distribute information about voting.

Neither of these domains is possible without the hegemonic domain, which deals with culture, media, and ideas. Discriminatory policies and practices in voting can only happen in a world that widely circulates oppressive ideas about who "counts" as a citizen. For example, an anti-suffrage pamphlet from the 1910s proclaimed that "You do not need a ballot to clean out your sink spout." This and other such memes of the era reinforced pre-existing societal notions that a woman's place is in the domestic arena, outside of public life. And the final part of the matrix of domination is the interpersonal domain, which influences the everyday lived experience of individuals. For example, what would it feel like to be the butt of jokes made by the men in your family as they read that pamphlet? What did it feel like to wait in line for twelve hours to cast your vote, knowing that the system was deliberately trying to screw you out of a voice?

If you are a Black woman in the US, you are intimately familiar with the matrix of domination because you brush up against it in everyday encounters. Writes Collins, "Oppression is not simply understood in the mind—it is felt in the body in myriad ways. Moreover, because oppression is constantly changing, different aspects of an individual U.S. Black woman’s self-definitions intermingle and become more salient: Her gender may be more prominent when she becomes a mother, her race when she searches for housing, her social class when she applies for credit, her sexual orientation when she is walking with her lover, and her citizenship status when she applies for a job. In all of these contexts, her position in relation to and within intersecting oppressions shifts." In each of these cases, the woman is made aware of her differences and her subjugated position in relation to a dominant norm. This experience is an essential form of data – lived experience as primary source knowledge.

But let's imagine for a moment you are a straight, white, middle-class, cisgender male U.S. citizen. Your body doesn't change in childbirth and breastfeeding so you don't think about workplace accommodations. You look for a home or apply for a credit card and people are eager for your business. People smile or don't look twice when you hold your girlfriend's hand in public. You present your social security number in jobs as a formality, but it never hinders an application from being processed or brings unwanted attention. The ease with which you traverse the world is invisible to you because it is quite simply the way things are and you imagine they are the same for everyone else. This is what it means to be blind to your own privilege – despite having the best education, the most elite among us are pathetically deficient when it comes to recognizing injustice, across all of the domains in the matrix of domination. They lack the lived experience – the undeniable data of lived experience – that reminds them every day that their bodies, their sexuality, and/or their race depart from a desired norm.

Projects that reveal those norms often focus on the absences and silences – those who are purposefully omitted or simply forgotten because of who has consolidated privilege and power. We’ve already introduced you to the work of artist and designer Mimi Onuoha in Chapter One. Her project, Missing Data Sets, if you recall, is a list she maintains of issues and events that go uncounted. Her missing data sets name important phenomena about which you would expect institutions to collect systematic information: topics such as police killings, hate crimes, sexual harassment, and Caucasian children adopted by people of color.

At the time that she created it, no authority was collecting information on these topics or, as in the case of hate crimes reported to the FBI, the data were extremely poorly reported and of poor quality. Since 2015, two ongoing databases of police killings have been compiled, by the Guardian and the Washington Post respectively. ProPublica has amassed a database of hate crimes.


Onuoha exhibits Missing Data Sets as an empty set of tabbed file folders in art exhibitions. The viewer can browse the files and open the folders to reveal that there are no papers inside. What should be there, in the form of paper records, is "missing" – absent not because the topics are unimportant, but because of bias, lack of social and political will, and structural disregard. As Onuoha says, "That which we ignore reveals more than what we give our attention to. It’s in these things that we find cultural and colloquial hints of what is deemed important. Spots that we've left blank reveal our hidden social biases and indifferences."

What is to be done about missing data sets? Taking a feminist perspective in this unequal ecosystem can mean pointing at their absence, as in the case of Onuoha. Or, sometimes, it means walking right straight ahead into the unequal playing field and collecting the missing data yourself, because somebody has to do it. 

This is exactly what pioneering data journalist and civil rights advocate Ida B. Wells did as early as 1895, when she assembled a set of statistics on the epidemic of lynching that was sweeping the United States at the time; or what Princesa, the anonymous Mexican woman whom we introduced in Bring Back the Bodies, has been doing for the past three years. She has logged 2,355 cases of femicide since 2016 (as of this writing in 2018), and her work provides the most accessible information on the subject for journalists, activists and victims' families seeking justice.

Femicide is a term first used publicly by feminist writer and activist Diana Russell in 1976 while testifying before the first International Tribunal on Crimes Against Women. Her goal was to situate the murders of women in a context of unequal gender relations. In this context, men use violence to systematically dominate and exert power over women. Indeed, the research bears this out. While male victims of homicide are more likely to have been killed by strangers, a 2008 report notes a “universal finding in all regions” that women and femmes are far more likely to have been murdered by someone they know. Femicide includes a range of gender-related crimes, including intimate and interpersonal violence, political violence, gang activity, and female infanticide. While such deaths are often depicted as isolated incidents, and treated as such by authorities, those who study femicides characterize them as a pattern of underrecognized and under-addressed systemic violence.

Femicides in Mexico rose to global visibility in the mid-2000s with widespread media coverage about the deaths of poor and working-class women in Ciudad Juárez. A border town, located across the Río Grande from El Paso, Juárez is home to more than 300 maquiladoras – factories that employ many women to assemble goods and electronics, often for low wages and in substandard working conditions. Between 1993 and 2005, nearly four hundred women were murdered in the city, around a third of them in a brutal or sexual manner. Convictions were obtained in only three of those deaths. When alleged perpetrators were arrested, they were often tortured into confessions by police, casting doubt on the investigations. Activist groups like Ni Una Más (Not One More) and Nuestras Hijas de Regreso a Casa (Our Daughters Back Home) were formed in large part by mothers who demanded justice for their daughters, often at great personal risk to themselves.

Indeed, Marisela Escobedo Ortiz, the mother of one such victim, was herself shot at point blank range while demonstrating in front of the Governor's Palace in Chihuahua in 2010.

These groups succeeded in gaining the attention of the Mexican State, which established a Special Commission on Femicide chaired by politician Marcela Lagarde. After three years of investigating, the Commission found in 2006 that femicide was indeed occurring and that the Mexican State was systematically failing to protect women and girls from being killed. Moreover, Lagarde suggested that femicide be considered "a crime of the state which tolerates the murders of women and neither vigorously investigates the crimes nor holds the killers accountable."

Despite the Commission's work and the fourteen volumes of detailed accounts and statistics about femicide; despite a 2009 ruling against the Mexican state by the Inter-American Court of Human Rights; despite a United Nations Symposium on Femicide in 2012; and despite the fact that sixteen Latin American countries have now passed laws defining femicide – despite all this, deaths in Juárez have continued to rise and the toll is now more than 1,500. Three hundred women were killed in Juárez in 2011 alone, and only a tiny fraction of those cases have been investigated. The problem extends beyond Ciudad Juárez in the state of Chihuahua to other states in the nation such as Chiapas and Veracruz.

While there is increasingly a legal and analytical basis for characterizing deaths as femicides, there is still a great deal of missing data. In a report titled Strengthening Understanding of Femicide, the authors state that "instances of missing, incorrect, or incomplete data mean that femicide is significantly underreported in every region." In the case of femicides, as in so many cases of data collected (or not) about women and marginalized groups, the collection environment is compromised. Lagarde's very definition of femicide includes the fact that the State – composed mainly of privileged men who have a vested interest in maintaining a gendered order – is complicit through indifference and impunity, so how could data be reliably collected?

This circles us back to a point we first made in Chapter One, and elaborated in Unicorns, Ninjas, Janitors, and Rock Stars, about how collecting large amounts of data is costly and resource-intensive. Only government states, corporations, and some elite institutions have those resources, so data collection efforts tend to be driven by their values and priorities. Not surprisingly, those institutional actors can be compromised by their own privilege, and their interest in maintaining the status quo. In the case where a government state is itself the bad actor, there can be no other authority with enough resources or channels of influence to shift collection practices. This is especially true in the case of femicides, in which collecting high-quality data would rely on shifting policy for local law enforcement and medical examiners, the entities that log homicide information.

But as data journalist Jonathan Stray asserts, "Quantification is representation." Looking at U.S. census data prior to 1970, he explains, you might come to the conclusion that there were no Latinx people living in the United States. This is not true, of course. There were actually already millions of Latinx people living in the U.S. But 1970 was the first year that “Hispanic” was included as an ethnic category on the census. Prior to that, it would have been hard to know anything about Latinx people as a group because the federal government was simply not collecting any information about them. So when the category was added to the census, most Latinx people were pleased to see it. It meant that they mattered.

But the inverse of the “quantification is representation” equation is also true: if data is not collected on a particular group, or on a particular issue, then institutions in power can pretend that the issue doesn't exist. Similar to the case of universities and sexual assault statistics, as discussed in The Numbers Don't Speak for Themselves, no Mexican state wants to have high rates of femicide. It is into this lack of government will that Princesa, who recently has spoken out in public under her given name María Salguero, has inserted her map of femicides. Salguero studied Geophysical Engineering at Mexico's Instituto Politécnico Nacional. She learned her mapping and journalism skills from attending trainings with Chicas Poderosas, a Latin American feminist group that focuses on training cis and trans women in data storytelling. The femicides map takes two forms – one, depicted in Figure 7.08a, is a point map where Salguero manually plots a pin for every femicide that she collects through media reports or through crowdsourced contributions. The other visualization, seen in Figure 7.08b, consists of the same data in a dashboard format, with gender-related killings grouped as smaller or larger bubbles for different geographies depending on their incidence. One of her goals is to "show that these victims had a name and that they had a life. They weren't statistics," so Salguero logs as many details as she can about each death. These include name, age, relationship with the perpetrator, mode and place of death, whether the victim identified as transgender, as well as the full content of the news report which served as the source. It can take her three to four hours a day to do this unpaid work (see Show Your Work for a further discussion on labor, gender, and data). She takes breaks to preserve her mental health, and she typically has a backlog of a month's worth of femicides to add to the map.

<p>María Salguero's map of femicides in Mexico 2016-present. Map extent along with a detail of Ciudad Juárez with a focus on a single report of an anonymous transgender femicide. She crowdsources points on the map based on reports in the press and reports from citizens to her. </p><p>Credit: María Salguero. </p><p>Source: and;ll=21.347609098250942%2C-102.05467709375&amp;z=5. </p>

While media reports and crowdsourcing are imperfect ways of collecting information, this map – created and maintained by an individual – fills a vacuum created by the government's deflection of responsibility. Mexico's National Health Information System (SINAIS) logs national homicide data, but only records the name, location, and how the person died. To count a death as a femicide, however, you must know the circumstances of the death as well as the relationship between the perpetrator and the victim. Various federal agencies point fingers in different directions regarding femicide data collection. In 2017, the Federal Institute for Access to Public Information and Data Protection (INAI) – led by Commissioner Ximena Puente de la Mora – ordered the National Commission for Human Rights (CNDH) to turn over statistics about femicides for 2015 and 2016. The CNDH declared itself unable to provide such information and referred the request to two other federal agencies, neither of which collects data about femicides.

In the meantime, Salguero's femicides map provides the most authoritative source of data on femicides at the national level. It has been featured in national Mexican media outlets and used to help find missing people. Salguero herself has testified before the Mexican Senate. Though Salguero is not affiliated with a specific group, she makes the data available to activist groups for their efforts. And parents of victims have called her to give their thanks for making their daughters visible. The urgency of the problem makes the labor worthwhile. Princesa affirms, "this map seeks to make visible the sites where they are killing us, to find patterns, to bolster arguments about the problem, to georeference aid, to promote prevention and try to avoid femicides." 

How might we explain the missing data around femicides in relation to the four domains of power that constitute Collins' matrix of domination? The most grave and urgent manifestation is in the interpersonal domain, where women are victims of extreme violence and murder at the hands of men. And although the structural domain – law and policy – has recognized femicide, there are no specific policies implemented in order to ensure adequate information collection, either by federal agencies or local authorities. Thus, the disciplinary domain, where law and policy are enacted, is characterized by deferral of responsibility, failure to investigate, and victim blaming, precisely because there are no consequences in the structural domain.

And none of this would be possible without the hegemonic domain – the realm of media and culture – that presents men as dominant and women as subservient; men as public, women as private; with any challenge to this gendered order of operations perceived as a grave transgression, deserving of punishment. Indeed, government agencies have used their position to publicly blame victims. Following the femicide of 22-year-old Mexican student Lesvy Osorio in 2017, as Maria Rodriguez-Dominguez reports, the Public Prosecutor's Office of Mexico City shared on social media that the victim was an alcoholic and drug user who had been living out of wedlock with her boyfriend. Here was the office that was supposed to be investigating the murder, and instead of doing their job they turned to social media to imply that Osorio was a degenerate. This led to public backlash and the hashtag "#SiMeMatan (If they kill me)" and tweets such as "#SiMeMatan it’s because I liked to go out at night and drink a lot of beer."

This is the data collection environment for femicide information and it is characterized by extremely asymmetrical power relations, where those with power and privilege are the only ones who can actually collect the data but they have overwhelming incentives to ignore the problem, precisely because addressing it poses a threat to their dominance. Here it is important to note that data on femicides is not an isolated case. It is an expected outcome and regular feature of an unequal society, in which a gendered, racialized order is maintained through willful disregard, deferral of responsibility and organized neglect for data and statistics about those bodies who do not hold power. For example, doctoral student Annita Lucchesi has created "The Missing and Murdered Indigenous Women Database" which tracks indigenous women who are killed or disappear under suspicious circumstances in the US and Canada. She thinks approximately 300 indigenous women per year are killed but the exact number is unknown because nobody (other than Lucchesi) is actually counting. Other examples in the US context include police killings of unarmed Black and brown people, maternal mortality statistics, and people killed by US drones.

The FBI has announced an initiative – called the National Use-of-Force Data Collection – to collect federal data about police shootings, and its first year of data collection was slated to be 2018. At the time of this writing, there was no data available for download and the most comprehensive statistics remain those collected by the Washington Post and the Guardian.

What is to be done? It's important to remember that asymmetrical power relations don't mean absolute power. And it's also important to remember that States and entities with power are not monolithic. There are plenty of public servants – women and men and others – in Mexico advocating internally for better data collection around femicides, like Ximena Puente de la Mora from INAI who initiated the femicides data request.

Crowdsourced data collection efforts that count and measure the extent of structural oppression can be a first step towards demanding public accountability. This is an important, urgent role for data journalism in the 21st century.

An excellent report called "Changing What Counts" by Jonathan Gray, Danny Lämmerhirt, and Liliana Bounegru goes into detail about global case studies where crowdsourced data collection efforts are leading to measurable institutional change.

As we discussed in Bring Back the Bodies, ProPublica has an ongoing investigative series about "Lost Mothers" – mothers in the US who lose their lives in childbirth due to poor care and preventable causes. One of their findings was that there was no comprehensive federal data on maternal mortality, so ProPublica began crowdsourcing stories of individuals to attempt to count the phenomenon. Their database and their reporting have spurred the creation of more than 35 state-level review committees that are investigating maternal mortality in their states, as well as a proposed bill in Congress to allocate $12.5 million to the Centers for Disease Control and Prevention to undertake better data collection.

But, at the same time, we also have to work on dismantling the consolidated power and privilege that organize the matrix of domination.

Could we statistically model oppression? It's a provocative question and one that Google researcher Margaret Mitchell has been investigating at the level of collective human speech. She describes how, in speech patterns, people use unqualified nouns for the "default" case of something. For example, bananas that are green are modified with "green bananas" or "unripe bananas" to indicate that they depart from the ready-to-eat yellow banana. But nobody needs to say "yellow banana" because it is implied by our shared concept of banana. This is called "reporting bias" in artificial intelligence research. So, studying the adjectives that modify "banana" in large data sets can actually tell us a lot about what people's default idea of bananas is in a particular culture. And when applied to humans, the "default case" reveals a lot about our collective norms and biases. For example, a doctor who is female is more typically qualified as a "female doctor" in human speech because it represents a departure from a perceived norm of doctors being male. So if "female doctor" is used in speech patterns for a particular culture, we might be able to infer that the social norms for that culture are patriarchal and thus pay special attention to the ways in which women are oppressed. Of course, this only works with those ideas that make it into human speech. As we have already outlined, there are many important issues related to cis and trans women, such as sexual assault, about which people are almost completely silent.
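
To make the "default case" idea concrete, here is a minimal sketch in Python. It is not Mitchell's method, which we have not described in detail; it is a deliberately crude illustration that counts the word immediately before a noun in a handful of sentences. The function name `modifier_counts`, the `DETERMINERS` list, and the toy sentences are all our own assumptions, invented for illustration only.

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical list of words we do not treat as meaningful modifiers.
DETERMINERS = {"a", "an", "the", "my", "her", "his", "our", "their", "this", "that"}


def modifier_counts(sentences: Iterable[str], noun: str) -> Counter:
    """Count the word directly preceding `noun` (singular or plural -s form)."""
    pattern = re.compile(r"\b(\w+)\s+" + re.escape(noun) + r"s?\b", re.IGNORECASE)
    counts: Counter = Counter()
    for sentence in sentences:
        for match in pattern.finditer(sentence):
            word = match.group(1).lower()
            if word not in DETERMINERS:
                counts[word] += 1
    return counts


# Toy sentences, made up for this example (not real corpus data).
toy_corpus = [
    "She bought a green banana at the market.",
    "The unripe bananas need a few more days.",
    "He ate a banana for breakfast.",
    "Our female doctor was on call.",
    "The doctor reviewed the chart.",
    "A female doctor led the study.",
]

banana_modifiers = modifier_counts(toy_corpus, "banana")
doctor_modifiers = modifier_counts(toy_corpus, "doctor")
```

In this toy corpus, "green" and "unripe" each modify "banana" once while "yellow" never appears, and "female" modifies "doctor" twice while "male" never does. The unmarked cases are the defaults. A real analysis would need a large corpus and proper linguistic parsing rather than a one-word lookbehind.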

Or perhaps we need to start looking at privilege as an ethical and legal liability and start quantifying it. Anti-racist feminists have long opposed quantifying privilege at the scale of the individual body (which can lead to something Roxane Gay calls "the oppression olympics" - competition for who is most oppressed). However, building off of recent calls for monitoring Big Tech with things like Sasha Costanza-Chock's "Intersectional Media Equity Index," one could fairly easily quantify the collective privilege of an organization and then create a prediction score for just how likely that institution is to create racist, sexist data products that lead to harmful impacts for users as well as legal and public relations disasters for the firm. Such a score could incorporate demographic information for firm ownership, leadership, employees (with a special focus on the demographics of those who are producers of data products for the company) and users. It could consist of a grade from 0-100, where 0 signifies perfect alignment between the firm and its users and 100 signifies a high risk of discrimination because of misalignment between the firm and its users. This privilege hazard score would measure just how much or how little the firm was influenced by those who already have the most privilege and power, and conversely, just how likely it would be to produce discriminatory "mistakes" and oversights. Consequently, the media might be less surprised when Google, whose board consists of 82% white men, creates image classification algorithms that only show white men in image searches for "CEO". Or when the Mexican State, comprised of X% rich men, is complicit in the murders of its working class women and girls. Such discriminatory outputs would have been entirely expected based on their privilege hazard score. As discussed in What Gets Counted Counts, there is an explicit politics of being counted here. 
Quantification can operate as a kind of sousveillance – "watching from below" – where the Great Quantifiers like Google and Amazon and even whole nation-states are quantified and predicted right back.
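
What might such a score look like in practice? The sketch below is one possible reading, not a validated instrument. We assume, purely for illustration, that each part of the firm (leadership, producers of data products, and so on) and its users can be described as shares across demographic categories, that misalignment is measured by total variation distance, and that the parts are combined with hand-picked weights. The function names, category labels, weights, and numbers are all hypothetical.

```python
from typing import Dict

# A distribution maps a demographic category to its share; shares sum to 1.0.
Distribution = Dict[str, float]


def total_variation(p: Distribution, q: Distribution) -> float:
    """Half the summed absolute difference between two distributions (0 to 1)."""
    categories = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in categories)


def privilege_hazard_score(
    firm_groups: Dict[str, Distribution],
    users: Distribution,
    weights: Dict[str, float],
) -> float:
    """Weighted firm-versus-user misalignment on a 0-100 scale.

    0 means every firm group mirrors the users exactly;
    100 means no firm group shares any category with the users.
    """
    total_weight = sum(weights[name] for name in firm_groups)
    weighted = sum(
        weights[name] * total_variation(dist, users)
        for name, dist in firm_groups.items()
    )
    return 100.0 * weighted / total_weight


# Made-up example: two abstract categories, two firm groups, equal weights.
users = {"group_a": 0.5, "group_b": 0.5}
firm_groups = {
    "leadership": {"group_a": 1.0},
    "data_producers": {"group_a": 0.5, "group_b": 0.5},
}
weights = {"leadership": 1.0, "data_producers": 1.0}

score = privilege_hazard_score(firm_groups, users, weights)
```

Here the leadership is entirely drawn from one category while the data producers mirror the users, so the example firm lands at 25 out of 100. The choice of categories, distance measure, and weights is itself a political decision, which is exactly the kind of decision that should be made with the people most affected.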

But let us return to the Frederick Douglass quote, "Power concedes nothing without demand." So far, we have discussed the feminist project of examining power – interrogating how power works through data to prioritize some bodies over other bodies and to secure the wealth and status of the dominant group. Buolamwini's algorithmic auditing quantified exactly how much facial recognition software was failing women of color. Princesa's map exposes the fact that gender-based killings are rampant and going untracked by the powers that be. Many important efforts to redress data discrimination and algorithmic bias are working in this mode of examining power. But in order to truly formulate demands, a feminist approach additionally requires imagining and modeling power differently to achieve equity. Equity is justice of a specific flavor, and it is slightly different than equality. Fighting for a world which treats everyone equally means that those who start out with more privilege will get further, achieve more and stay on top. Fighting for a world which treats everyone equitably means taking into account present power differentials and distributing resources accordingly. More simply said, equality upholds patriarchy and white supremacy. Equity dismantles them.

So how might data be used not only to examine power but also to transform gendered power relations? To support self-determination of marginalized groups? What does a society that values data and equity look like and feel like?

Let's circle back to present day Detroit. At the end of 2017, the Detroit Digital Justice Coalition and the Detroit Community Technology Project published a collaborative report entitled Recommendations for Equitable Open Data. It was the result of two years of research, conversations and explorations about the city's Open Data Portal. The report is specific about what equitable open data is and who it benefits: "[W]e mean accountable, ethical uses of public information for social good that actively resist the criminalization and surveillance of low income communities, people of color and other targeted communities." Note here how the authors named and made explicit whose perspectives they were centering and why – these communities have been historical targets of discriminatory institutional practices. We saw this targeting explicitly in the redlining map introduced earlier in the chapter. The report goes on to outline seven recommendations for the City of Detroit to adopt to make their open data practices more equitable and more likely to benefit people of color and low-income communities. These include "Protect the people represented by the numbers", "Engage residents offline about open data" and "Prioritize the release of new datasets based on community interest." These are concrete demands, offered to improve the use and benefits of open data for the people who are most often left out of open data conversations.

So, following Collins, there is a matrix of domination with four different domains of power. Examining that power using data-driven methods is an important step towards challenging that matrix, particularly in egregious cases like femicides where there is a violent, unjust status quo. Additionally, we have a responsibility to create space for women, people of color, queer and trans folks and others to imagine and dream power differently – to model better and beautiful futures where all can thrive – something which we will address further in Teach Data Like an Intersectional Feminist! But it's hard to see the contours of the matrix of domination, let alone empower others or imagine things differently, when you are the recipient of a lot of benefits from it. When the system works for you, you are able to set racism and sexism and other oppressive forces aside and you will experience little penalty for such ignorance.

If you want a viscerally enlightening reading about privilege, check out White Privilege: Unpacking the Invisible Knapsack by Peggy McIntosh from 1989. Written in the first-person perspective of a white person in the US, it lists fifty ways that white privilege manifests in everyday life, including "My culture gives me little fear about ignoring the perspectives and powers of people of other races."

So what is to be done when you are in a position of power and privilege? Most people working in data science, visualization, machine learning and statistics have significant privilege and power accumulated through their education and their institutional connections, as well as their race, gender and ability. Can you use your power and privilege for "good", even though we have explored how much of a hazard it is for your ability to accurately apprehend the injustice of the world? Emphatically, unequivocally "Yes!", with some caveats and elaborations.

The feminist grounding for navigating this quandary is called an "ethics of care", which we introduced in Show Your Work. While there are many contemporary discussions about data ethics, most derive from a version of moral reasoning introduced by Immanuel Kant in the 18th century, which prioritizes abstract dilemmas, rules and obligations, and universal application. In these conceptions, the focus is on an individual, independent human actor, and their relationships with others are conceived as contractual, business-like negotiations among equals. It is important to note that Kant based morality on reasoning, believed women to be incapable of reason, and thus concluded that women could never be full moral persons, i.e. were not fully human.

We bring this up so that you might question the wisdom of uncritically pulling 18th century philosophers into 21st century ethics conversations, i.e. so you can drop some knowledge on the next person who sings the praises of the categorical imperative in a machine learning ethics discussion.
This relates back to the "master narrative" we described in On Rational, Scientific, Objective Viewpoints from Mythical, Impossible, Imaginary Standpoints, which valorizes reason and (supposed) impartiality over all other ways of knowing and asserts the superiority of males in that capacity. More recently, technical folk are digging this approach because this kind of blanket ethical logic is easy to code into large systems. But it's important to note that this approach was explicitly designed to exclude half of humanity.

On the other hand, a feminist ethics of care prioritizes responsibilities, issues in context, and, above all else, relationships. Feminist philosopher Alison Jaggar has detailed numerous ways that traditional ethics has failed women. Masculine ethical approaches have systematically shown less concern for women's issues as opposed to men's issues, have devalued ethical quandaries in the "private" realm (the realm of housework, family and children), and have valued traditionally masculine traits like independence, autonomy, universality and impartiality, over traditionally feminine traits like interdependence, community, relationships, and responsibility. While the central unit of analysis in Kantian ethics is the individual human, an ethics of care focuses on the relationship between two or more things (possibly human, possibly not), and the ways that they are bound together by that relationship. Which is to say that rather than valuing impartiality, an ethics of care prioritizes intimacy and honors the deep, emotional, personal investment that comes with being responsible for the well-being of another, whether that is a child or the environment. And vice versa – the ways that your own well-being is tied up in how a child or the environment cares back towards you and nurtures you. This kind of situated ethics is not as easy to encode into large computational systems, but we shouldn't rule it out as impossible until someone has actually tried.

What does a feminist ethics of care mean for those of us who work every day with data science, journalism or visualization and enjoy some relatively high degree of privilege? First, accept that your privilege and power are not just an asset, but also a liability. They structure what you and your institutions see in the world and also what (and who) you and your institutions disregard about the world. The antidote to your privilege deficiency is to establish meaningful, authentic, ongoing relationships across power differentials (whether based on gender, race, class, technical knowledge, ability, etc.) – and to listen deeply to those new friends. This sounds simple, but it is hard, both at the individual level and at the institutional level, because it involves a reorganization of priorities and revaluation of the metrics of success.

Relationships in an ethics of care are a two-way street. For this reason, it's also important to reframe "doing good" with data as something more akin to "doing equity" or "doing co-liberation" with data to remove some of its paternalistic overtones. All too often, well meaning "help" is conceived as saving unfortunate victims from their own technological ignorance. In presenting the origin story of the Detroit Geographic Expedition and Institute, Gwendolyn Warren reflected on the ignorance of the white male academics her community worked with, "We had this group of geographers, one of whom lived in the neighborhood, who decided that they were going to 'discover us'. They were going to go and explore the 'hood and discover us. And show us how to make change[...] There was no way in hell they were going to save us, but they didn't know it."

Whereas an act of data service performed by a technical organization for a community-based group is often framed as charity, an ethics of care would frame it as one step in deeper relationship building and broader demographic healing. There is a famous saying from aboriginal activists that goes like this,

If you have come here to help me, you are wasting your time. But if you have come because your liberation is bound up with mine, then let us work together. 

Following a logic of co-liberation leads to different metrics of success. The success of a single project would not only rest on whether the database was organized according to spec or whether the algorithm was able to classify things properly, but also on how much trust was built between institutions and communities, how effectively those with power and resources shared their power and resources, how much learning happened in both directions, how much the people and organizations were transformed in the process, and how much inspiration for future work, together, was co-conspired.

Likewise, data projects undertaken by technical folk with an ethics of care would openly acknowledge and account for power differentials by explicitly prioritizing whose voices matter most in the process as input. We saw this in the Detroit Equitable Open Data Report – the authors prioritized the needs of communities that are targeted for surveillance – those who stand to experience the least benefits and the most harm from open data. By prioritizing the needs of those at the margins, we create a system that works for everyone. In some situations, this means working your absolute hardest to establish authentic relationships that did not previously exist. For example, for the past five years, Catherine has been co-leading a feminist hackathon project called Make the Breast Pump Not Suck. The first version of the hackathon took place in 2014 and focused primarily on the product design and experience of using a breast pump.


A breast pump is a device used to extract milk when a breastfeeding mom is separated from her baby or cannot/does not want to nurse them at the breast. Despite the fact that the medical establishment sees breastfeeding as a public health issue, breastfeeding is socially stigmatized and has been neglected as a space of innovation. The lack of paid family leave policy in the US means that nursing mothers (and trans dads) often end up back at work secretly pumping in closets, bathrooms, and cars, if they are able to pump at all.

But after a couple of years, it was clear that the innovations emerging in the breast pump space were primarily for white knowledge workers – the smart pumps were coming in at $400, $500 and $1000, not covered by insurance and thus only accessible to those with disposable income. So, in organizing the second Make the Breast Pump Not Suck Hackathon in 2018, our leadership team decided to center the voices of mothers of color, low-wage workers, and queer parents because those are the groups that face the most barriers to breastfeeding in the US context. We invited members of those groups as hackers – and we also put into place an Advisory Board composed primarily of high-profile advocates of color who work directly with community organizations. This Board caught multiple oversights of the majority-white leadership team, and shifted the project in significant ways. Everyone was paid for their time. In the chapter "On Rational, Scientific, Objective Viewpoints from Mythical, Impossibly, Imaginary Standpoints," we discussed "design from the margins" as an underlying principle of feminist human-computer interaction. The Advisory Board added a further layer that might be characterized as "governance from the margins." It functioned as an accountability mechanism, both to check the leadership team's privilege and prevent us from doing harm, and to deepen emerging relationships across race, social capital and technical knowledge.

But for this to work, those who are doing co-liberation with data have to trust that the people who experience the most harms from a social issue have the best ideas for reimagining it. As Kimberly Seals Allers, one of our Advisory Board members, said in her keynote, "whatever the question, the answer is in the community." And while the emphasis of data projects is often to develop a time-bounded thing – a database, an algorithm, a model, a visualization – it's important to remember that the longer-term goal is to build meaningful, authentic, ongoing relationships across differences in power and privilege in order to transform yourself, your institution, and the world.