Chapter Six: Show Your Work

The products of data science are the work of many hands. Unfortunately, though, we tend not to credit the many hands who perform this work. Sometimes, it's because we can't see the people who performed it, but other times, it's because the work itself is invisible to the eye.

Published on Nov 05, 2018
This chapter is a draft. The final version of Data Feminism will be published by the MIT Press in 2019. Please email Catherine and/or Lauren for permission to cite this manuscript draft.

If you work in software development, chances are that you have a GitHub account. As of June 2018, the online code management platform had over 28 million users worldwide. By allowing users to create web-based repositories of source code (among other forms of content) to which project teams of any size can then contribute, GitHub makes collaborating on a single piece of software, or a website, or even a book, much easier than it’s ever been before.

Well, easier if you're a man. A 2016 study found that female GitHub users were less likely to have their contributions accepted if they identified themselves in their user profiles as women. Critics of GitHub’s commitment to inclusivity (or lack thereof) also point to the company’s internal politics. In 2014, GitHub’s co-founder was forced to resign after allegations of sexual harassment were brought to light. But problematic gender politics do not necessarily preclude other feminist interventions. And here, GitHub makes an important one: the platform helps show the work of writing collaborative code. In addition to basic project management tools, like bug tracking and feature requests, the GitHub website also generates visualizations of each team member’s contributions to a project’s codebase. Area charts, arranged in small multiples, allow viewers to compare the quantity, frequency, and duration of any particular member’s contributions. A virtual “punch-card” reveals patterns in the time of day when those contributions took place. And a flowchart-like diagram of the relationships between various branches of the project’s code helps to acknowledge any sources for the project that might otherwise go uncredited, as well as any additional projects that might build upon the project’s initial work.

Caption: Two visualizations of the code commits associated with a project from Lauren’s research group, the Digital Humanities Lab, showing the significant contributions of her student researchers.

Credit: Screenshot by Lauren Klein of GitHub data

Source: https://github.com/GeorgiaTechDHLab/speculative/graphs/contributors and https://github.com/GeorgiaTechDHLab/speculative/graphs/punch-card

Coding is work, as anyone who’s ever programmed anything knows well. But it’s not always work that is easy to see. The same is true for collecting, analyzing, and visualizing data. We tend to marvel at the scale and complexity of an interactive visualization like the Ship Map, which we first discussed in Bring Back the Bodies as an example of the view from nowhere. That view, as it turns out, presents the path of every ship in the global merchant fleet over the course of the 2012 calendar year. By plotting every single trip, the Ship Map exposes the network of waterways that constitute our global product supply chain. But we are less often exposed to the network of processes and people that help constitute the visualization itself. From the seventy-five corporate researchers at Clarksons Research UK who assembled and validated the underlying dataset, to the academic research team at University College London’s Energy Institute that developed the data model, to the design team at Kiln that transformed the data model into the visualization that we see-- and that is to say nothing of the tens of thousands of commercial ships that served as the source of data in the first place--visualizations like the Ship Map involve the work of many hands.

Time-based visualization of global shipping routes designed by Kiln based on data from the UCL Energy Institute.

Credit: Website created by Duncan Clark & Robin Houston from Kiln. Data compiled by Julia Schaumeier & Tristan Smith from the UCL EI. The website also includes a soundtrack: Bach’s Goldberg Variations played by Kimiko Ishizaka.

Source: https://www.shipmap.org/

Unfortunately, though, when releasing a visualization to the public, we tend not to credit the many hands who perform this work. We often cite the source of a dataset, and the names of the people who designed and implemented the visualization. But we rarely dig deeper to discover who collected our data, who processed it for use, and who else might have labored to make our visualizations possible. Admittedly, this information is sometimes hard to find. At other times, it can’t be found at all. But the difficulty we encounter when trying to acknowledge this work reflects a larger problem in our data supply chain, as Miriam Posner explains. Like the contents of the ships visualized on the Ship Map, about which we only know vague details--the map can tell us if a shipping container was loaded onto the boat, but not what the shipping container contains--the invisible labor involved in data work is something that, Posner argues, we willfully see with “partial sight.”

To put it more simply, it’s not a coincidence that much of the work that goes into designing a data visualization remains invisible and uncredited. In our capitalist society, we tend to value labor that we can see. When, in the early 1970s, the International Feminist Collective launched the Wages for Housework campaign, it was this phenomenon of invisible labor that they were trying to bring to light. By demanding wages for housework, the group was attempting to erase the distinction between the paid labor of traditional jobs, like office or factory work, and the unpaid labor of household tasks, like cooking or cleaning or child-rearing. Housework might be invisible, these women insisted, performed out of sight and away from the marketplace, but it’s certainly not without value. On the contrary, the invisible labor performed inside the home is precisely what enables those who work outside the home to continue to do so.

A Wages for Housework march, 1977.

Source: https://hollisarchives.lib.harvard.edu/repositories/8/archival_objects/1438878

Credit: Schlesinger Library, Radcliffe Institute / Bettye Lane

Permissions: Pending

Unlike washing dishes, however, data work doesn’t require that you get your hands wet. (Unless, of course, you’re a citizen scientist associated with Public Lab and you’re actually collecting data on water). But invisible labor is what sustains the world of data science as well. When was the last time you saw an analysis of census data list the names of any Federal Census Workers, those people outfitted in orange safety vests who knock on your door to remind you to fill out your census form? Or what about the pool of typists who hand-keyed the text of the historical newspapers that you used to train your neural network? Or the metadata librarian who created the fields for the collections database that you visualized? Or the archivists (or, more likely, student employees) who entered all of the actual records into those fields? This work is not always performed for free, nor is it always performed by women. But we can still view it as invisible labor for the way that it remains invisible to the public eye, and uncredited in the end result.

When looking at the various forms of invisible labor that characterize our present moment, information studies scholars tend to focus on the forms of labor that are not only uncredited, but also unpaid. Visit WagesforFacebook.com and you’ll find a version of the Wages for Housework argument, updated for the present. “They call it sharing. We call it stealing,” is one of the lines that scrolls down the screen in large black type. The “it” refers to a form of invisible labor that most of us perform every day, in the form of our Facebook likes, Instagram posts, and Twitter tweets. We might do it because it’s fun, and we might not expect to be paid for it, but the point made by Laurel Ptak, the artist behind Wages for Facebook, which is the same made by theorists of digital labor, most notably by Tiziana Terranova, is that the invisible unpaid labor of our likes and tweets is precisely what enables the Facebooks and Twitters of the world to profit and thrive.

The world of data science is able to profit and thrive because of unpaid invisible labor as well. How did Netflix improve their movie recommendation algorithm? They crowdsourced it. How did the Guardian, the British newspaper, determine which among two million leaked documents might contain incriminating information about government misspending? They crowdsourced it. The error correction performed on the dataset of early modern books that you downloaded for your text analysis project? That was crowdsourced, too.

“But crowdsourcing is fun,” its proponents might say. “People wouldn’t do it otherwise!” (And in the case of Netflix, they’d be quick to point out that the winning team was paid a million-dollar prize). But someone like Ashe Dryden, the software developer behind Programming Diversity, would point out that people can only help crowdsource if they have the inclination and the time. Think back to the example of GitHub. If you were a woman, and you knew your contributions to a programming project were less likely to be accepted than if you were a man, would that motivate you to contribute to the project? Or, for another example, consider Wikipedia. While the exact demographics of Wikipedia contributors are unknown, numerous surveys have indicated that those who contribute content to the crowdsourced encyclopedia are between 84% and 91.5% male. Why? It could be that there, too, edits are less likely to be accepted if they come from female editors. It could also go back to the housework argument. A 2011 study showed that women spend more than twice as much time on household tasks as men do, even when controlling for women who hold full-time jobs. Women simply don’t have as much time.

No one would argue with the fact that time is money, but it’s important to remember to ask whose time is being spent, and whose money is being saved. The premise behind Amazon’s Mechanical Turk, or MTurk, as the crowd-sourcing platform is more commonly known, is that data scientists want to save their own time, and their own bottom line. The MTurk website touts its access to a “global marketplace” of “on-demand Workers,” who are advertised as being more “scalable and cost-effective” than the “time consuming [and] expensive” process of hiring actual employees. But the data entry and data processing tasks performed by these workers earn them less than minimum wage, even as a recent study by the Pew Research Center showed that 51% of U.S.-based Turkers, as they are known, hold college degrees; and 88% are below the age of 50, among other metrics that would otherwise rank them among the most desired demographic for salaried employees.  

This underwaged work, as feminist labor theorists would call it, is also increasingly outsourced to countries with fewer (or worse) labor laws, and fewer (or worse) opportunities for economic advancement. A 2010 University of California-Irvine study measured a 20% drop in the number of U.S.-based Turkers over the eighteen months that it monitored. This trend has continued, the real-time MTurk Tracker shows, with workers from India alone now comprising roughly 20% of the total MTurk workforce. (The gender split, interestingly, has evened out over time).

But even in the United States-- and even at companies like Amazon and Google-- the work of data entry is profoundly undervalued in proportion to the knowledge it helps to create. Andrew Norman Wilson’s 2011 documentary, Workers Leaving the Googleplex, exposes how the workers tasked with scanning the books for the Google Books database are hired as a separate but unequal class of employee, with ID cards that restrict their access to most of the Google campus, and that prevent them from enjoying the company’s famed employee perks. (Evidently, working overtime to preserve the world’s cultural heritage still does not entitle you to a free lunch, let alone a free class on how to cook Pad Kee Mao.)

Andrew Norman Wilson’s “Workers Leaving the Googleplex” (2011) documents the hidden inequities at Google’s Mountain View headquarters.

Credit: Andrew Norman Wilson

Source: http://www.andrewnormanwilson.com/WorkersGoogleplex.html

Wilson also observes that Google’s book-scanning workers are disproportionately women and people of color--a fact that would not surprise the long line of women of color scholar-activists, including Angela Davis, Patricia Hill Collins, and Evelyn Nakano Glenn, who have insisted that economic oppression be recognized as a vector that cuts across the matrix of domination as a whole. (See The Power Chapter where we talk more about this idea). Information studies scholar Lilly Irani confirms that “today’s hierarchy of data labor echoes older gendered, classed, and raced technology hierarchies.” Here, Irani is referring to the underwaged contributions of the first generation of female computers like Christine Darden, whom we met in this book’s introduction, who had to resort to NASA’s Equal Opportunity Office in order to receive her long overdue raise; or, for another example, the below-minimum-wage pay of the Navajo women who, in the early days of digital computing, were tapped to assemble integrated circuits for the largest electronics supplier in the country, Fairchild Semiconductor--a story that Lisa Nakamura has recently exposed.

Irani’s own research focuses on Mechanical Turk, the people it employs, and the people it exploits. As part of this work, Irani built a web tool, the Turkopticon, which enables Turkers to anonymously report unfair labor conditions, as well as any additional information that might help them decide whether to accept any future task.

But the people who perform this “cultural data work,” as Irani terms it, are not only found at Amazon; they’re increasingly the people on whom the entire information economy depends. Cultural data workers are responsible for everything from transcribing audio clips to fine-tuning search algorithms. Even Google relies upon people to confirm the quality of its search results, as official job postings for “Ads Quality Raters” confirm. Among more specific skills like a college degree and “excellent written communications skills,” the job ad specifies that “a deep understanding of the culture is required.” 

Cultural data workers are also responsible for the invisible labor involved in moderating the veritable deluge of content produced online every day, ensuring that your Facebook feed is free of dick pics--and, much more disturbingly, videos of beheadings. When a recent exposé in Wired Magazine documented the emotional costs of this labor, performed by some of the least empowered of these workers--women in the Global South--it was met with an outpouring of shock and outrage. But those who study global capitalism for a living would be quick to point out that this exploitation of racialized labor, as they’d term it, has a long and sordid history, one that has its roots in the original form of human exploitation: slavery.

There is an infamous story that is often told in order to illustrate the close connection between capitalism and slavery: in 1781, the British slave ship, Zong, made a series of navigational errors while crossing the Atlantic, resulting in a shortage of drinking water for the 17 crew members and 133 captives on board. After performing a cost-benefit analysis, the crew decided to throw their enslaved human “cargo” overboard, calculating that they could collect enough insurance money on that loss of life to come out ahead. For scholars such as Ian Baucom and, more recently, Fred Moten and Stefano Harney, there is no clearer example of the all-too-easy exchange between people and profit that capitalism enabled then, and still enables today.  

Not by coincidence does our present global technological infrastructure follow this same pattern of exploitation, as Miriam Posner, along with Robert Mejia and Safiya Noble, among others, have shown. The cobalt required to produce the lithium-ion batteries that power our cell phones and laptops, for example, may no longer be mined by people in physical shackles, but its extraction is still associated with significant human rights violations, including coercing labor from Congolese children as young as seven. The unregulated disposal of this and other minerals, as well as the electronics that house them, has resulted in entire cities along the west coast of Africa, as well as in China, becoming toxic “e-waste” sites. The humanitarian and ecological stakes couldn’t be higher, nor could their source be any more clear: the capitalist forces that encourage the exploitation of Black bodies so that white bodies can thrive.

The weight of these forces can seem overwhelming. But, as any activist would remind you, any global resistance must begin at home. It follows, then, that we can each start by working harder to acknowledge the range of people who have contributed to our own projects, as well as those whose labor we might inadvertently exploit.

How, more specifically, can we go about this? For one model, we might look to an existing subfield of information visualization, known as data provenance. Data provenance typically refers to the practice of visualizing the history of the changes to a dataset that take place over the course of a project. Because the practice derives, ironically, from supply chain management, most extant data provenance visualizations focus solely on the data itself. They document which fields from which databases were combined with which data from which API, and which technical processes were performed on those data. But just as the ships on the Ship Map are piloted by people--people making real-world decisions like whether to steer around pirates in Somalia; or whether to wait on line for the Suez Canal--we might improve upon existing data provenance diagrams to include human processes as well. Who first collected the data, or processed it for use? What was the workflow employed by the design team, and how could it be plotted in relation to the data provenance chart? Was there a point at which the data analysis phase shifted from exploration to confirmation; or were there other significant conceptual shifts that could be rendered visible? These are only some of the questions that might be answered in such a diagram so that everyone involved in the data analysis process could receive credit for their work, and any dependencies--both intellectual and interpersonal--could be acknowledged. 

A data provenance chart.

Credit: Dlineage. Screenshot by Catherine D’Ignazio.

Source: dlineage.com
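To make the suggestion above more concrete, we might sketch what a provenance record that includes people could look like. The following is only a minimal illustration in Python, not the data model of Dlineage or of any existing provenance tool: the `ProvenanceStep` structure, the `credits` and `uncredited` helpers, and every step and contributor name are hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ProvenanceStep:
    """One step in a dataset's history, including the people who did the work."""
    name: str                                               # e.g. "collect", "clean"
    description: str
    contributors: List[str] = field(default_factory=list)   # people or teams
    depends_on: List[str] = field(default_factory=list)     # names of earlier steps


def credits(steps: List[ProvenanceStep]) -> Dict[str, List[str]]:
    """Map each contributor to the steps they worked on, in step order."""
    result: Dict[str, List[str]] = {}
    for step in steps:
        for person in step.contributors:
            result.setdefault(person, []).append(step.name)
    return result


def uncredited(steps: List[ProvenanceStep]) -> List[str]:
    """Return the names of steps for which no contributor has been recorded."""
    return [step.name for step in steps if not step.contributors]


# A hypothetical project history; every name below is a placeholder.
steps = [
    ProvenanceStep("collect", "Field survey of households", ["Survey team"]),
    ProvenanceStep("digitize", "Hand-keying of paper forms", [], ["collect"]),
    ProvenanceStep("clean", "Deduplicate and validate records",
                   ["Analyst A"], ["digitize"]),
    ProvenanceStep("visualize", "Design and build the chart",
                   ["Analyst A", "Designer B"], ["clean"]),
]

print(credits(steps))     # who contributed to which steps
print(uncredited(steps))  # steps whose labor is still invisible
```

The second helper is the more telling one: in this placeholder history it surfaces the "digitize" step, the kind of hand-keying work that a provenance chart focused solely on the data would pass over without naming anyone.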




One level up, at the level of labor itself, we might take an additional cue from the Next System Project, a research group aimed at documenting and visualizing alternative economic systems. In one report, the group compiled information on the diversity of community economies operating in locations as far-ranging as Negros Island, in the Philippines, Quebec province, in Canada, and the state of Kerala, in India. Their report employs the visual metaphor of an iceberg, in which “wage labor” is positioned at the tip of the iceberg, floating above the water, while dozens of other forms of labor– “informal lending,” “consumer cooperatives,” and work “within families,” among others– are positioned below the water, providing essential economic ballast, but remaining out of sight.

The Next System Project, “Cultivating Community Economies”

Credit: J.K. Gibson-Graham, Jenny Cameron, Kelly Dombroski, Stephen Healy, and Ethan Miller for the Next System Project

Source: https://thenextsystem.org/cultivating-community-economies

Permissions: Pending

With the idea of underwater labor in mind, we might return to the example of GitHub, which begins this chapter, in order to ask what additional forms of labor might contribute to the production of code, but that cannot be represented by the visualization scheme that GitHub currently employs. We might also think of the work of the project manager, which is not directly expressed in a particular number, or size, or frequency, of contributions, but nevertheless ensures the quality and consistency of all project code. We might wonder about the work of the designer on a project, or of the technical writer– both of whom might have helped to shape the project in its initial phases, but who have likely moved on to other tasks. We might additionally consider the contributions of the user experience specialist, or the quality assurance tester, who might enter the development process at a later phase of the project, but whose work is no less essential to the project’s ultimate success. In the case of a consumer-facing project, we might also consider the contributions of the sales or customer support teams. These forms of labor, both productive and reproductive, are of course essential to the success of the project, but are not currently rendered visible, nor could they ever be easily visualized, by a scheme that considers project contributions to consist of code alone.

When designing data products from feminist perspectives, we must aspire to show the work involved in the entire lifecycle of the project, even if it can be difficult to do. Whether it be a team of software developers working on GitHub, a team of visualization designers at Kiln, or a group of sugarcane farmers in the Philippines, a feminist approach would insist on recognizing the range of communities that produce the data. It would include the people who then collect, digitize, and transform it into a dataset; those who are subsequently enlisted to process the dataset; those who then work to analyze and/or visualize the dataset; and finally, those who interpret the images or interactions that are produced, or otherwise experience their effects. Each of these roles is essential to the process of producing knowledge, but relies upon a variety of forms of labor– some visible, some not– in order to take place.

Showing all of this work is a tall order, and as designers and data analysts ourselves, we’ll be the first to admit that it’s not always one that can be fully achieved. But showing the work, as this chapter is named, begins with a commitment to acknowledging the range of forms of work that have been performed, even if they can’t be ascribed to a specific person or credited by a single name.

In more instances than you might think, this work can be surfaced from the data themselves. For instance, Benjamin Schmidt, whose research centers on the role of government agencies in shaping public knowledge, decided to visualize the metadata associated with the digital catalog of the U.S. Library of Congress, the largest library in the world. Schmidt’s initial goal was to understand the collection and the classification system that structured the catalog. But in the process of visualizing the catalog records, he discovered something else: a record of the labor of the cataloguers themselves. When he plotted the year that each book’s record was created against the year that the book was published, he saw some unusual patterns in the image: shaded vertical lines, step-like structures, and dark vertical bands that didn’t match up with what one might otherwise assume would be a basic two-step process of 1) acquire a book; and 2) enter it in.

A visualization of when books at the Library of Congress entered their digital catalog.

Credit: Benjamin M. Schmidt

Source: http://sappingattention.blogspot.com/2017/05/a-brief-visual-history-of-marc.html

Permissions: pending

The shaded vertical lines, Schmidt soon realized, showed the point at which the cataloguers began to turn back to the books that had been published before the library went digital, filling in the online catalogue with older books. The step-like patterns indicated the periods of time, later in the process, when the cataloguers returned to specific subcollections of the library, entering in the data for the entire set of books in a short period of time. And the horizontal lines? Well, given that they appear only in the years 1800 and 1900, Schmidt inferred that they indicated missing publication information, as best practices for library cataloguing dictate that the first year of the century be entered when the exact publication date is unknown.
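The kinds of patterns described above can also be surfaced with a simple tally, before any image is drawn. The sketch below is an illustration in Python and not Schmidt's own method: the records are invented, and the threshold used to flag a century-start year as a likely placeholder (`min_count`) is an assumption chosen only for this toy example.

```python
from collections import Counter

# Hypothetical catalog records as (year_published, year_record_created).
records = [
    (1900, 1985), (1900, 1991), (1900, 1996), (1900, 2003),
    (1923, 1996), (1954, 1996), (1987, 1988), (1800, 1996),
    (1800, 1979), (1800, 2001), (1995, 1996), (2001, 2001),
]


def placeholder_years(records, min_count=3):
    """Century-start publication years that recur often enough to suggest
    a placeholder for an unknown date (a horizontal line in the plot)."""
    by_pub_year = Counter(pub for pub, _ in records)
    return sorted(year for year, n in by_pub_year.items()
                  if year % 100 == 0 and n >= min_count)


def busiest_entry_year(records):
    """The record-creation year with the most records, together with its
    count (a dark vertical band in the plot)."""
    by_created = Counter(created for _, created in records)
    return by_created.most_common(1)[0]


print(placeholder_years(records))   # [1800, 1900]
print(busiest_entry_year(records))  # (1996, 5)
```

A count like this can tell us where the cataloguers' work is concentrated, but not why; that still requires knowing something about the people and the setting in which the records were made.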

With an emphasis on showing the work, these visual artifacts should also prompt us to consider just how much physical work was involved in converting the library’s paper records to digital form. The darker areas of the chart don’t just indicate a larger number of books entered into the catalog, after all. They also indicate the people who typed in all of those records--millions and millions of them. (Schmidt estimates the total number of records at ten million and growing). Similarly, the step-like formations don’t just indicate a higher volume of data entry. They indicate strategic decisions made by library staff to return to specific parts of the collection, and reflect those staff members’ prior knowledge of the gaps that needed to be filled. In other words, their intellectual labor as well.

There is also a political dimension to this work. For instance, in 1996, we see a dark vertical line that Schmidt tells us indicates “an especially furious year of digitizing older records.” Could it possibly be that, in the lead-up to the presidential election that would result in Bill Clinton’s second term, the federal government granted additional funding to the Library of Congress that enabled them to hire additional staff? Or did it indicate the fear that the Republican candidate, Bob Dole, would win the election and reduce the amount of funding for federal agencies, leading the existing cataloguers to redouble their efforts? We can’t know the answer without additional research, but these questions help to show how the dataset always points back to the data setting--a term coined by Yanni Loukissas, which we introduce in Chapter Four--and to the people who labored in that setting in order to produce the data that we see.

The people who labor in office buildings, and on top of them, are the subject of Builders of the Vision, by Daniel Cardoso Llach. That project employs data collection, as well as visualization, in order to analyze the hidden social hierarchies at work in international construction projects. For Builders of the Vision, Cardoso Llach wrote a script to register the design conflicts between different versions of the plans for the Thomas Wynne Mall, which recently opened along a stretch of otherwise desolate road in Abu Dhabi. In any large building project, design conflicts are common; they emerge when different subcontracting teams--say, the architects and the mechanical engineers--employ different software to draw up their plans for the project. It then becomes the task of a project coordinator to translate those plans into a single file format, identifying and resolving any inconsistencies along the way. Cardoso Llach’s code sat on top of the project management software, recording the date and time of each conflict, the subcontracting teams involved, and the team whose design was accepted. By visualizing this conflict history, he was able to expose the evolving work patterns and shifting power dynamics within the project as it reached completion. In so doing, Cardoso Llach also exposes the hidden complexity of computational labor today.  

A series of visualizations documenting design conflicts within an architectural project’s Building Information Management (BIM) software.

Credit: Daniel Cardoso Llach

Source: Builders of the Vision (Routledge, 2015), p. 132 and 133.

Permissions: Pending
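To give a sense of what a conflict history of this kind might contain, we can sketch one in a few lines of Python. This is not Cardoso Llach's script, which ran on top of the project's own management software; the `Conflict` record, the `acceptance_by_month` helper, the team names, and the dates are all hypothetical, and the sketch assumes only the three things the log is described as recording: when each conflict occurred, which teams were involved, and whose design was accepted.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Tuple


@dataclass(frozen=True)
class Conflict:
    """One logged design conflict between two subcontracting teams."""
    when: datetime
    teams: Tuple[str, str]  # the two teams whose plans clashed
    accepted: str           # the team whose design was kept


def acceptance_by_month(conflicts):
    """Count, for each month, how often each team's design was accepted."""
    tally = defaultdict(lambda: defaultdict(int))
    for conflict in conflicts:
        month = conflict.when.strftime("%Y-%m")
        tally[month][conflict.accepted] += 1
    # Convert to plain dicts, ordered by month, for easier reading.
    return {month: dict(counts) for month, counts in sorted(tally.items())}


# An invented log; teams and timestamps are placeholders.
log = [
    Conflict(datetime(2010, 3, 2, 9, 15), ("architects", "mechanical"), "architects"),
    Conflict(datetime(2010, 3, 18, 14, 0), ("architects", "structural"), "architects"),
    Conflict(datetime(2010, 4, 5, 11, 30), ("architects", "mechanical"), "mechanical"),
    Conflict(datetime(2010, 4, 20, 16, 45), ("mechanical", "structural"), "mechanical"),
]

print(acceptance_by_month(log))
```

Even in this toy log, the monthly tally shifts from one team to another, which is the kind of changing power dynamic that a visualization of the full history would make visible.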

Of course, there is also labor that remains hidden because we are not trained to think of it as labor at all. This is what’s known as emotional labor, and it’s (yet) another form of work that feminist theory has helped to bring to light. As described by feminist sociologist Arlie Hochschild, emotional labor describes the work involved in managing one’s feelings, or someone else’s, in response to the demands of society or a particular job. Hochschild coined the term in the early 1980s to describe the labor required of service industry workers, such as flight attendants, who are required to manage their own fear while also calming passengers, during adverse flight conditions. In the decades that followed, the notion of emotional labor was supplemented by a related concept, affective labor, so that the work of projecting a feeling (the definition of emotion) could be distinguished from the work of experiencing that feeling (the definition of affect).

We can see both emotional and affective labor at work all across the technology industry today. Consider, for instance, how call center workers and other technical support specialists must exert a combination of affective and emotional labor, as well as technical expertise, in order to absorb the rage of irate customers (affective labor), reflect back their sympathy (emotional labor), and then help them with--for instance--the configuration of their wireless router (technical expertise). In corporate headquarters, we might also consider the affective labor required of women and underrepresented groups of all kinds, in all situations, who must take steps to disprove (or simply ignore) the sexist, racist, or otherist assumptions they face--about their technical ability, or about anything else. And they must do so while also performing the emotional labor that ensures that they do not threaten those who hold those assumptions, who often also hold positions of power over them. Are there ways to visualize these forms of labor, giving visual presence--and therefore acknowledgement and credit--to these outlays of work?

One example to prompt our thinking can be found in the Atlas of Caregiving, an ongoing project aimed at documenting the work involved in caring for a chronically ill family member. The project’s name plays on the concept of the anatomy atlas, a compendium of illustrations of the human body that doctors can consult for information and reference. In this case, the goal was to illustrate the sometimes physical, and sometimes emotional or affective work of care. The research team outfitted its participants with a variety of biometric sensors, including accelerometers and heart-rate monitors, as well as with body cameras programmed to take a picture every fifteen minutes. They then visualized these data alongside excerpts from personal interviews, as well as from the activity logs they asked the caregivers in the study to complete. The result is a complex picture of caregiving, one that marshals data in the interest of creating a comprehensive view of the range of labor involved in caregiving work.

 

Caption: Clockwise from top left: A 36-hour log of caregiving activities; caregiving activities separated by type; photo log during that same time.

Credit: The Atlas of Caregiving

Source: https://atlasofcaregiving.com/studies/chantals-household/chantal/24-hour/

Permissions: Pending

Of course, a measure like heart rate or skin temperature is only a proxy for human feelings, and this is a common critique of the Quantified Self movement overall. This understanding served as the genesis for “Bruises: The Data We Don’t See.” This artful visualization, created by visualization designer Giorgia Lupi, and accompanied by a musical score composed by Kaki King, attempts to make visible the emotional toll of parenting a child with chronic illness, as King herself was required to do when her own child was diagnosed with a rare autoimmune disease. Her daughter’s illness, Idiopathic Thrombocytopenic Purpura, or ITP, is described as a “very visual disease,” and presents as bruises and burst blood vessels all over the body. For this reason, King was instructed to watch her daughter’s skin and record any significant changes. She also thought to record her own feelings in terms of hope, stress, and fear, creating subjective data to complement the hard numbers she received from the blood tests her daughter was required to endure.

Detail from “Bruises: The Data We Don’t See”

Credit: Giorgia Lupi and Kaki King

Source: https://medium.com/@giorgialupi/bruises-the-data-we-dont-see-1fdec00d0036

Permissions: Pending

When Lupi, who knew King from previous collaborations, set out to design her visualization, her goal was to “evoke empathy,” and make her audience “feel a part of a story of a human’s life.” In contrast to the Atlas of Caregiving, which relies upon standard visualization techniques like radial timelines and Gantt-style charts in order to legitimate the work of care, Lupi sought alternative visualization strategies that would evoke the emotions she sought. She employed a fluid timeline to reflect the subjective nature of what feminist disability studies scholar Alison Kafer calls “crip time.” Days become white aspen-shaped leaves, segmented not by weeks or years but by hospital visits. Red dots indicate platelet counts, with color deployed mimetically in order to convey the intensity of the bruises, as well as the visuality of the “data” recorded by King. Lupi also employed color to represent King’s record of her feelings, with black corresponding to stress and fear, and yellow signifying hope. King’s fear and hope are also visualized by hand-drawn lines that reflect each on a scale of one to ten. The result is a composition, both visually and aurally affecting, of the affective labor of mothering and care.

Lupi and King, or the Atlas of Caregiving, are not the first to want to identify and make visible the work of care. Since the mid-1990s, when Nancy Folbre introduced the term, care work has been a significant topic of interest for feminist scholars. Folbre’s primary model of care work was the everyday work of caring for a child. But we might also think of the additional burden of caring for a sick child, as documented in “Bruises,” or for a family member, as in the Atlas of Caregiving. Care work isn’t necessarily performed for free. It can also include the underwaged work performed by daycare workers or home health aides, as well as the waged work of doctors, nurses, physical therapists, mental health professionals, and so on. What binds these forms of work together across economic lines is their motivation: as theorized by Folbre, care work is undertaken out of a sense of compassion for, or responsibility for, others, rather than with a goal of monetary gain. But when it comes to the market, altruism is a double-edged sword. These same professional care-workers--who are predominantly women and people of color--are often paid less than they would be in other fields. Why? Because they care.

A similar commitment to others--and, increasingly, an awareness of how care can also oppress--is what has prompted groups like The Maintainers to take up theories and practices of care in relation to data work. Through a series of workshops, conferences, and publications, The Maintainers are trying to counter the current tendency to celebrate technological innovation and discovery. The work that should be celebrated, they argue, is the work that sustains and maintains the world we live in today, and not work that passes over the problems of the present in order to look ahead. But there is yet more work to be done. Data work, after all, is part of a larger ecology of knowledge. Like the network of ships visualized on the Ship Map, or the network of source code stored on GitHub, the network of people who contribute to data projects is vast and complex. We may never be able to acknowledge all of the work that goes into these projects, at least not explicitly. But thinking through the many forms of labor involved in data science, making them visible, and giving them credit might be the most important data work of all.

Comments
48
Nikki Stevens:

The Drupal community has implemented community contribution credits (following the lead of my old group) to account for non-code community work.

Nikki Stevens:

A first step on the infrastructure is to credit users for attending meetings: https://www.drupal.org/project/drupal/issues/2976614

Nikki Stevens:

did this study include same-sex or otherwise non-heterotypical couples?

Nikki Stevens:

here again, we’re speaking only in binaries.

Nikki Stevens:

Ashe has done years (nearly a decade at this point) of OSS community work and seeing her identified as “software developer” feels reductive and disrespectful. (though she might have opted for this label if you checked with her, in which case nevermind)

Nikki Stevens:

here again, folks outside the gender binary are excluded from the data.

Nikki Stevens:

this is misleading. collaborating is equally easy for all genders. getting PRs accepted into OSS projects is statistically more likely for men.

Elizabeth Losh:

Hard to read this data, so you might need to work with the art editorial team to make it more legible.

Elizabeth Losh:

In the introduction to the Affect Theory reader the editor Melissa Gregg does a nice job differentiating affect from emotion — she actually wants to argue against centering individual subjectivity

Elizabeth Losh:

Good point about recursiveness and reflection.

Elizabeth Losh:

Good articulation of different kinds of data workers. What about those who choose one dataset among others? Or those who hide datasets from public view?

Elizabeth Losh:

This is a nice transition. Perhaps you could think about more ways to connect the sometimes disconnected examples not only in this chapter but in some of the other chapters.

Elizabeth Losh:

It might be helpful for the reader to have a map of all of the kinds of data feminism that you will be highlighting, not only in this chapter, but also in other chapters.

Elizabeth Losh:

In addition to the sources cited by others, there is also the documentary The Cleaners

Elizabeth Losh:

Lilly Irani at UCI — who you reference later in the chapter — has obviously done some interesting platform design work with Turkopticon to address the power disparities in Mechanical Turk to allow employees to review employers and not just the other way around.

Elizabeth Losh:

I agree with Bethany that work from India to address the inequities of participation in Wikipedia is helpful.

Heather Krause:

Consider including the people who provided the data as part of the data labour cycle. This is often overlooked in the world of international development - where women of color are regularly completing surveys that take several hours over the course of several days.

Lee Vinsel:

Is it worth pointing more explicitly here or elsewhere in this chapter to the extensive Marxist-feminist literature on “reproductive labor”? I find it a really helpful way to think about the kinds of work you’re emphasizing. It’s implicit in your discussion above about the collective, but might be made more explicit.

https://en.wikipedia.org/wiki/Reproductive_labor

Lee Vinsel:

Not that this is important for your argument, but these kinds of systems might be useful in lots of other areas as well: for instance, politicians and government officials often receive more attention for new systems and ribbon-cuttings than for keeping things going and for quality of service of existing systems. But perhaps this is partly because it’s harder to “see” uptime/quality of service. If we had systems to visualize such things, could we encourage leaders to take better care of public infrastructures?

Lee Vinsel:

I like to use this Wired article on coding as future blue-collar labor when I give Maintainers talks: https://www.wired.com/2017/02/programming-is-the-new-blue-collar-job/

If coding/digital work follows the trends of other technology-centered industries from the last 150 years - and there’s no reason to believe it won’t - it will significantly decrease in status over time. Thus, these low status gigs you’re pointing to may be more like the future norm.

Lee Vinsel:

I wonder how this fits into more general considerations of role hierarchies/prestige at organizations like Google. It reminds me of the plight of cafeteria workers (and likely also janitors) at Facebook and other SV firms. https://www.mercurynews.com/2017/07/24/hundreds-of-facebook-cafeteria-workers-join-union/

It’s another instance, perhaps, where the distinction between digital and non-digital labor is less important than other older status hierarchies.

Lee Vinsel:

I recently listened to a cheesy episode of the podcast EconTalk about gratitude (http://www.econtalk.org/a-j-jacobs-on-thanks-a-thousand/), which highlighted how hundreds and thousands of individuals are involved in each product that I pick up from a shelf at Target. Nearly all of that labor is anonymous and impossible to reconstruct.

In such a context, we might flip it around and emphasize that *public recognition* is the exception and ask under what circumstances it occurs. We could also think about individuals who *perform* their labor or at least around their labor to draw attention to it.

Lee Vinsel:

Was “invisible labor” the term used by the International Feminist Collective? If so, you may want to highlight this fact.

Andy Russell and I have been curious about the current focus on “invisibility” and its intellectual genealogy. In STS, folks tend to point to Susan Leigh Star.

But in your case, I think it would be helpful to make clear whether “invisible labor” is an actor’s category (used by the collective) or an analyst’s category that you are bringing to bear.

Lee Vinsel:

I often wonder how much of this is about cognitive resources and time. Most people I know, including folks who release visualizations, are really strapped for time and energy just doing their everyday routines. Studies show again and again how cognitively limited humans are. (John Levi Martin’s “Life’s a Beach but You’re an Ant” is a helpful sociological synthesis of this line of thinking.)

For instance, it’s not so much that infrastructure is truly “invisible” to me - plenty of it is in plain sight - it’s just that I and everyone else are busy paying attention to other things.

Part of what I’m getting at is what is “unfortunate” about the phenomena you’re highlighting.

Bethany Nowviskie:

A small group of scholars, archivists, and folks from the nonprofit sector have been working together under the “Maintainers” banner on a forthcoming open-access statement about the practice of “information maintenance” and its relation to the ethics of care — along with an invitation to join a community around this theme. The phrase I’ve highlighted here reminds me to say that one thing we are emphasizing is that any celebration of “sustaining and maintaining the world we live in today” is not anti-progress or anti-discovery. The Maintainers philosophy isn’t about maintaining oppressive systems or sustaining the patterns of the present uncritically. Something about our new “info-maintainers” community should be out in the first months of the new year, and so much that you have covered here will be of value to the group!

Bethany Nowviskie:

Maybe not your project to take on, but it would at least be worth checking in with Ben to see if he heard any response to his work that might explain that blip — or, if not, flagging this for folks at LC (Trevor Owens, Kate Zwaard, and others) who could perhaps check in with current colleagues and retired staff members before memory of that “furious year of digitizing” fades!

Bethany Nowviskie:

A naive question: are the similarities I see to stemmatic visualizations in bibliography and textual criticism, and to chain-of-custody concerns in both forensic science and fine art/museum artifact provenance just coincidental here?

Bethany Nowviskie:

Steven J. Jackson’s “Rethinking Repair” and subsequent work on mobile phone repair in Bangladesh is probably worth citing here.

Bethany Nowviskie:

Worth mentioning that Turkopticon itself has been volunteer-run and its creators are feeling burn-out? https://twitter.com/turkopticon/status/1053044386374668289

Bethany Nowviskie:

I highly recommend the work of Anasuya Sengupta and her colleagues Siko Bouterse & Adele Vrana at “Whose Knowledge?”, a new nonprofit formed to represent and center the knowledge of marginalized people on the Internet. They’re starting with Wikipedia and the problem of disenfranchised & disrespected editors, including women: https://whoseknowledge.org/. Sengupta is a former Wikimedia Foundation executive with experience working with Dalit and other communities as Wikipedia editors. She recently keynoted the DLF Forum: https://forum2018.diglib.org/livestream-recordings/

Bethany Nowviskie:

If you want to cite more recent issues as well, Agnes Pak (a Korean American lawyer) is presently suing GitHub for allegedly altering her performance report to justify her firing after she complained about race & gender concerns related to her lower pay and fewer stock options as compared with colleagues, and transgender software developer Coraline Ada Ehmke declined a severance package last year in order to be able to talk about her negative experiences working there. (Both of those issues seem to relate to quantifying/rewarding/acknowledging labor.)

Bethany Nowviskie:

Is seeing and commodifying the same thing?

Bethany Nowviskie:

I know from your wonderful Values and Metrics statements that the dominance of visual modes of understanding & interpretation in data science is on your minds, and that you are therefore seeking to include more non-visual examples all throughout the book — but since our ability to “see invisible labor” is the structuring metaphor for this chapter, it feels to me like you need an explicit acknowledgment of the ableism inherent in lines like this. The metaphors seem useful throughout and the notion of visible/invisible labor is appropriate and easy to grasp, but references to “wilful blindness” may be alienating to readers. A lot of what you write here also makes me wonder to what degree you even want to imply that sightedness is the default!

Bethany Nowviskie:

It would be nice, if possible, to create at least as thorough an acknowledgments-oriented caption for the Dlineage image below as you were able to do for Ship Map… (And it’s a credit to *you* that you’re making me notice where those may be lacking, not just here but in my own work!)

Bethany Nowviskie:

This is the first time I’ve wondered about gender-based differences in the timestamps of code commits, and whether, as more software development positions transition to work-from-home gigs, there are discernible patterns here that relate to the well-known issue of women continuing to take on more housework and childcare responsibilities than men, even in households where all adults are fully employed “outside” the home. Do women’s code commits tend to drop off more around the time schoolkids get off the bus, for instance? If so, do co-workers notice and companies adapt, or is this something women workers are trying to hide? In other words, the same mechanisms that you rightly highlight here as making women’s work more visible may be experienced as unwelcome hypervisibility or tools of surveillance. Maybe somebody has done work in this area and it could be something to pick up on later in the chapter.

Bethany Nowviskie:

The study you cite has prompted experiments like the following in “blind” code review from Mozilla, perhaps worth considering in light of the (sometimes vexed; will say more below!) metaphors of visibility/invisibility and “seeing” labor that shape this really excellent chapter: https://blog.mozilla.org/blog/2018/03/08/gender-bias-code-reviews/ Note even the image that Mozilla chose to illustrate the article. (I also wonder if you could find a place to think through the trade-offs involved in masking gender as a bias-mitigation strategy, in terms of obscuring women’s labor and contributions to code, and the implications of the optional “gender/identity reveal” moment built into tools like this — particularly at a moment in which trans developers are also highlighting bias in the software industry.)

Margaret Heller:

I find the concept of “cognitive capitalism” useful in discussions of crowdsourcing. So many social media platforms try to harness the attention of women specifically, as ways to fill in gaps in time/attention, which reduces the ability to do original creative work or intellectual pursuits.

Christopher Linzy:

“the largest library in the world” although this might not be true as the British Library may be larger.

James Scott-Brown:

Write in italics?

James Scott-Brown:

What is the basis for deciding that the work of home health aides, but not nurses, is underwaged?

James Scott-Brown:

The heatmap has an annotation “A horizontal line shows that 1996 was an especially furious year of digitizing older records”: it should probably say “vertical line” rather than “horizontal line”.

James Scott-Brown:

Do you have a reference for this claim?

Shannon Mattern:

Excellent chapter. In addition to crediting the wide network of contributors who make data projects possible, perhaps in “showing the work” we also have to think about documenting process — demonstrating that data are *made*, formed through human choices, sifted and cleaned for analysis and visualization, etc. Of course we could think about creative approaches to version control — but how might we also capture other dimensions of the data lifecycle? Could we translate some aspects of the Atlas of Caregiving for data science?

Shannon Mattern:

Ah, so glad to see this!

Shannon Mattern:

You might also like to acknowledge the time-space geography of feminist geographers like Marianna Pavlovskaya or Mei-Po Kwan, who render the daily pathways of careworkers.

Shannon Mattern:

New media art curators and collectives seemed to be quite prescient in rethinking the wall label and giving credit to the various members of a collaborative team. See, for instance, Jon Ippolito’s 2008 “Death by Wall Label”: http://thoughtmesh.net/publish/11.php

Shannon Mattern:

And maybe to allow for *collaborative* contributions, rather than attributing work to lone individuals?

Bethany Nowviskie:

Agreed that it would be good to emphasize this! When we were working on the latest version of the MLA Guidelines for Evaluating Work in DH and Digital Media, we tried hard to clarify that these processes are inherently collaborative, and that individual scholars should not be punished or receive only fractional credit for work done in partnership, including with nonacademic communities and broad publics: https://www.mla.org/About-Us/Governance/Committees/Committee-Listings/Professional-Issues/Committee-on-Information-Technology/Guidelines-for-Evaluating-Work-in-Digital-Humanities-and-Digital-Media I remember being especially inspired by various codes and statements on tenure & promotion put together by organizations representing public historians: https://ncph.org/history-at-work/does-it-count/ In work I was doing at the time, I also tried to emphasize how vital it is to understand contributions to DH projects *as process,* not just in terms of their final products — including in essays/talks like this: http://journalofdigitalhumanities.org/1-4/evaluating-collaborative-digital-scholarship-by-bethany-nowviskie/

Shannon Mattern:

Maybe a nod to Sarah T. Roberts’ work — and especially her forthcoming book?

Yoehan Oh:

Yes! Her Behind the Screen: Content Moderation in the Shadows of Social Media, in 2019! And Tarleton Gillespie’s Custodians of the Internet: Platforms, content moderation, and the hidden decisions that shape social media (2018) can be added here.

Sarah Yerima:

When you discuss the varied forms of exploitation upon which tech and data markets rely, you critique the proliferations of a particular type of capitalist ideology: neoliberalism. Since so many feminist scholars are deeply concerned with how and why financial capital becomes prioritized at all costs (by this I mean the very lives of those who exist on the margins), it might be worth naming neoliberalism and introducing the concept to your readers.