Chapter Six: Show Your Work
If you work in software development, chances are that you have a GitHub account. As of June 2018, the online code management platform had over 28 million users worldwide. By allowing users to create web-based repositories of source code (among other forms of content) to which project teams of any size can then contribute, GitHub makes collaborating on a single piece of software, or a website, or even a book, much easier than it’s ever been before.
Well, easier if you’re a man. A 2016 study found that female GitHub users were less likely to have their contributions accepted if they identified themselves in their user profiles as women. Critics of GitHub’s commitment to inclusivity (or lack thereof) also point to the company’s internal politics. In 2014, GitHub’s co-founder was forced to resign after allegations of sexual harassment were brought to light. But problematic gender politics do not necessarily preclude other feminist interventions. And here, GitHub makes an important one: the platform helps show the work of writing collaborative code. In addition to basic project management tools, like bug tracking and feature requests, the GitHub website also generates visualizations of each team member’s contributions to a project’s codebase. Area charts, arranged in small multiples, allow viewers to compare the quantity, frequency, and duration of any particular member’s contributions. A virtual “punch-card” reveals patterns in the time of day when those contributions took place. And a flowchart-like diagram of the relationships between various branches of the project’s code helps to acknowledge any sources for the project that might otherwise go uncredited, as well as any additional projects that might build upon the project’s initial work.
Coding is work, as anyone who’s ever programmed anything knows well. But it’s not always work that is easy to see. The same is true for collecting, analyzing, and visualizing data. We tend to marvel at the scale and complexity of an interactive visualization like the Ship Map, which we first discussed in Bring Back the Bodies as an example of the view from nowhere. That view, as it turns out, presents the path of every ship in the global merchant fleet over the course of the 2012 calendar year. By plotting every single trip, the Ship Map exposes the network of waterways that constitute our global product supply chain. But we are less often exposed to the network of processes and people that help constitute the visualization itself. From the seventy-five corporate researchers at Clarksons Research UK who assembled and validated the underlying dataset, to the academic research team at University College London’s Energy Institute that developed the data model, to the design team at Kiln that transformed the data model into the visualization that we see-- and that is to say nothing of the tens of thousands of commercial ships that served as the source of data in the first place--visualizations like the Ship Map involve the work of many hands.
Unfortunately, though, when releasing a visualization to the public, we tend not to credit the many hands who perform this work. We often cite the source of a dataset, and the names of the people who designed and implemented the visualization. But we rarely dig deeper to discover who collected our data, who processed it for use, and who else might have labored to make our visualizations possible. Admittedly, this information is sometimes hard to find. At other times, it can’t be found at all. But the difficulty we encounter when trying to acknowledge this work reflects a larger problem in our data supply chain, as Miriam Posner explains. Like the contents of the ships visualized on the Ship Map, about which we only know vague details-- the map can tell us if a shipping container was loaded onto the boat, but not what the shipping container contains-- the invisible labor involved in data work is something that, Posner argues, we willfully see with “partial sight.”
To put it more simply, it’s not a coincidence that much of the work that goes into designing a data visualization remains invisible and uncredited. In our capitalist society, we tend to value labor that we can see. When, in the early 1970s, the International Feminist Collective launched the Wages for Housework campaign, it was this phenomenon of invisible labor that they were trying to bring to light. By demanding wages for housework, the group was attempting to erase the distinction between the paid labor of traditional jobs, like office or factory work, and the unpaid labor of household tasks, like cooking or cleaning or child-rearing. Housework might be invisible, these women insisted, performed out of sight and away from the marketplace, but it’s certainly not without value. On the contrary, the invisible labor performed inside the home is precisely what enables those who work outside the home to continue to do so.
Unlike washing dishes, however, data work doesn’t require that you get your hands wet. (Unless, of course, you’re a citizen scientist associated with Public Lab and you’re actually collecting data on water). But invisible labor is what sustains the world of data science as well. When was the last time you saw an analysis of census data list the names of any Federal Census Workers, those people outfitted in orange safety vests who knock on your door to remind you to fill out your census form? Or what about the pool of typists who hand-keyed the text of the historical newspapers that you used to train your neural network? Or the metadata librarian who created the fields for the collections database that you visualized? Or the archivists (or, more likely, student employees) who entered all of the actual records into those fields? This work is not always performed for free, nor is it always performed by women. But we can still view it as invisible labor for the way that it remains invisible to the public eye, and uncredited in the end result.
When looking at the various forms of invisible labor that characterize our present moment, information studies scholars tend to focus on the forms of labor that are not only uncredited, but also unpaid. Visit WagesforFacebook.com and you’ll find a version of the Wages for Housework argument, updated for the present. “They call it sharing. We call it stealing,” is one of the lines that scrolls down the screen in large black type. The “it” refers to a form of invisible labor that most of us perform every day, in the form of our Facebook likes, Instagram posts, and Twitter tweets. We might do it because it’s fun, and we might not expect to be paid for it, but the point made by Laurel Ptak, the artist behind Wages for Facebook, is the same one made by theorists of digital labor, most notably Tiziana Terranova: the invisible unpaid labor of our likes and tweets is precisely what enables the Facebooks and Twitters of the world to profit and thrive.
The world of data science is able to profit and thrive because of unpaid invisible labor as well. How did Netflix improve their movie recommendation algorithm? They crowdsourced it. How did the Guardian, the British newspaper, determine which among two million leaked documents might contain incriminating information about government misspending? They crowdsourced it. The error correction performed on the dataset of early modern books that you downloaded for your text analysis project? That was crowdsourced, too.
“But crowdsourcing is fun,” its proponents might say. “People wouldn’t do it otherwise!” (And in the case of Netflix, they’d be quick to point out that the winning team was paid a million-dollar prize). But someone like Ashe Dryden, the software developer behind Programming Diversity, would point out that people can only help crowdsource if they have the inclination and the time. Think back to the example of GitHub. If you were a woman, and you knew your contributions to a programming project were less likely to be accepted than if you were a man, would that motivate you to contribute to the project? Or, for another example, Wikipedia. While the exact demographics of Wikipedia contributors are unknown, numerous surveys have indicated that those who contribute content to the crowdsourced encyclopedia are between 84% and 91.5% male. Why? It could be that there, too, edits are less likely to be accepted if they come from female editors. It could also go back to the housework argument. A 2011 study showed that women spend more than twice as much time on household tasks as men do, even when controlling for women who hold full-time jobs. Women simply don’t have as much time.
No one would argue with the fact that time is money, but it’s important to remember to ask whose time is being spent, and whose money is being saved. The premise behind Amazon’s Mechanical Turk, or MTurk, as the crowdsourcing platform is more commonly known, is that data scientists want to save their own time, and their own bottom line. The MTurk website touts its access to a “global marketplace” of “on-demand Workers,” who are advertised as being more “scalable and cost-effective” than the “time consuming [and] expensive” process of hiring actual employees. But the data entry and data processing tasks performed by these workers earn them less than minimum wage, even as a recent study by the Pew Research Center showed that 51% of U.S.-based Turkers, as they are known, hold college degrees; and 88% are below the age of 50, among other metrics that would otherwise rank them among the most desired demographic for salaried employees.
This underwaged work, as feminist labor theorists would call it, is also increasingly outsourced to countries with fewer (or worse) labor laws, and fewer (or worse) opportunities for economic advancement. A 2010 University of California-Irvine study measured a 20% drop in the number of U.S.-based Turkers over the eighteen months that it monitored. This trend has continued, the real-time MTurk Tracker shows, with workers from India alone now comprising roughly 20% of the total MTurk workforce. (The gender split, interestingly, has evened out over time).
But even in the United States-- and even at companies like Amazon and Google-- the work of data entry is profoundly undervalued in proportion to the knowledge it helps to create. Andrew Norman Wilson’s 2011 documentary, Workers Leaving the Googleplex, exposes how the workers tasked with scanning the books for the Google Books database are hired as a separate but unequal class of employee, with ID cards that restrict their access to most of the Google campus, and that prevent them from enjoying the company’s famed employee perks. (Evidently, working overtime to preserve the world’s cultural heritage still does not entitle you to a free lunch, let alone a free class on how to cook Pad Kee Mao.)
Wilson also observes that Google’s book-scanning workers are disproportionately women and people of color-- a fact that would not surprise the long line of women of color scholar-activists, including Angela Davis, Patricia Hill Collins, and Evelyn Nakano Glenn, who have insisted that economic oppression be recognized as a vector that cuts across the matrix of domination as a whole. (See The Power Chapter where we talk more about this idea). Information studies scholar Lilly Irani confirms that “today’s hierarchy of data labor echoes older gendered, classed, and raced technology hierarchies.” Here, Irani is referring to the underwaged contributions of the first generation of female computers like Christine Darden, who we met in this book’s introduction, who had to resort to NASA’s Equal Opportunity Office in order to receive her long overdue raise; or, for another example, the below-minimum-wage pay of the Navajo women who, in the early days of digital computing, were tapped to assemble integrated circuits for the largest electronics supplier in the country, Fairchild Semiconductor--a story that Lisa Nakamura has recently exposed.
Irani’s own research focuses on Mechanical Turk, the people it employs, and the people it exploits. As part of this work, Irani built a web tool, the Turkopticon, which enables Turkers to anonymously report unfair labor conditions, as well as any additional information that might help them decide whether to accept any future task.
But the people who perform this “cultural data work,” as Irani terms it, are not only found at Amazon; they’re increasingly the people on whom the entire information economy depends. Cultural data workers are responsible for everything from transcribing audio clips to fine-tuning search algorithms. Even Google relies upon people to confirm the quality of its search results, as official job postings for “Ads Quality Raters” confirm. Among more specific skills like a college degree and “excellent written communications skills,” the job ad specifies that “a deep understanding of the culture is required.”
Cultural data workers are also responsible for the invisible labor involved in moderating the veritable deluge of content produced online every day, ensuring that your Facebook feed is free of dick pics--and, much more disturbingly, videos of beheadings. When a recent exposé in Wired Magazine documented the emotional costs of this labor, performed by some of the least empowered of these workers--women in the Global South--it was met with an outpouring of shock and outrage. But those who study global capitalism for a living would be quick to point out that this exploitation of racialized labor, as they’d term it, has a long and sordid history, one that has its roots in the original form of human exploitation: slavery.
There is an infamous story that is often told in order to illustrate the close connection between capitalism and slavery: in 1781, the British slave ship, Zong, made a series of navigational errors while crossing the Atlantic, resulting in a shortage of drinking water for the 17 crew members and 133 captives on board. After performing a cost-benefit analysis, the crew decided to throw their enslaved human “cargo” overboard, calculating that they could collect enough insurance money on that loss of life to come out ahead. For scholars such as Ian Baucom and, more recently, Fred Moten and Stefano Harney, there is no clearer example of the all-too-easy exchange between people and profit that capitalism enabled then, and still enables today.
Not by coincidence does our present global technological infrastructure follow this same pattern of exploitation, as Miriam Posner, along with Robert Mejia and Safiya Noble, among others, have shown. The cobalt required to produce the lithium-ion batteries that power our cell phones and laptops, for example, may no longer be mined by people in physical shackles, but its extraction is still associated with significant human rights violations, including coercing labor from Congolese children as young as seven. The unregulated disposal of this and other minerals, as well as the electronics that house them, has resulted in entire cities along the west coast of Africa, as well as in China, becoming toxic “e-waste” sites. The humanitarian and ecological stakes couldn’t be higher, nor could their source be any more clear: the capitalist forces that encourage the exploitation of Black bodies so that white bodies can thrive.
The weight of these forces can seem overwhelming. But, as any activist would remind you, any global resistance must begin at home. It follows, then, that we can each start by working harder to acknowledge the range of people who have contributed to our own projects, as well as those whose labor we might inadvertently exploit.
How, more specifically, can we go about this? For one model, we might look to an existing subfield of information visualization, known as data provenance. Data provenance typically refers to the practice of visualizing the history of the changes to a dataset that take place over the course of a project. Because the practice derives, ironically, from supply chain management, most extant data provenance visualizations focus solely on the data itself. They document which fields from which databases were combined with which data from which API, and which technical processes were performed on those data. But just as the ships on the Ship Map are piloted by people--people making real-world decisions like whether to steer around pirates in Somalia; or whether to wait on line for the Suez Canal--we might improve upon existing data provenance diagrams to include human processes as well. Who first collected the data, or processed it for use? What was the workflow employed by the design team, and how could it be plotted in relation to the data provenance chart? Was there a point at which the data analysis phase shifted from exploration to confirmation; or were there other significant conceptual shifts that could be rendered visible? These are only some of the questions that might be answered in such a diagram so that everyone involved in the data analysis process could receive credit for their work, and any dependencies--both intellectual and interpersonal--could be acknowledged.
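As one way of imagining such a diagram's underlying data, we might sketch a provenance record that stores the people alongside the processes. What follows is only an illustration under our own assumptions: the `ProvenanceStep` structure, its field names, and the `credits` helper are hypothetical, and the step names are our shorthand for the Ship Map contributors described earlier in this chapter, not a record of how that project was actually documented.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ProvenanceStep:
    """One step in a dataset's history, including who performed it (hypothetical schema)."""
    name: str
    contributors: List[str] = field(default_factory=list)
    depends_on: List[str] = field(default_factory=list)  # names of earlier steps


def credits(steps: List[ProvenanceStep]) -> Dict[str, List[str]]:
    """Map each contributor to the steps they worked on.

    Steps with no recorded contributor are grouped under "(uncredited)",
    so that missing credit is itself made visible rather than dropped.
    """
    roles: Dict[str, List[str]] = {}
    for step in steps:
        for person in step.contributors or ["(uncredited)"]:
            roles.setdefault(person, []).append(step.name)
    return roles


# Step names are illustrative; the contributors are those named in this chapter.
ship_map = [
    ProvenanceStep("report ship positions"),  # the ships' crews, unnamed in the source
    ProvenanceStep("assemble and validate", ["Clarksons Research UK"],
                   depends_on=["report ship positions"]),
    ProvenanceStep("develop data model", ["UCL Energy Institute"],
                   depends_on=["assemble and validate"]),
    ProvenanceStep("design visualization", ["Kiln"],
                   depends_on=["develop data model"]),
]

for who, what in credits(ship_map).items():
    print(f"{who}: {', '.join(what)}")
```

The one design choice worth noting is the "(uncredited)" bucket: a provenance record that silently omits steps without named contributors would reproduce the very invisibility that this chapter describes.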
One level up, at the level of labor itself, we might take an additional cue from the Next System Project, a research group aimed at documenting and visualizing alternative economic systems. In one report, the group compiled information on the diversity of community economies operating in locations as far-ranging as Negros Island, in the Philippines, Quebec province, in Canada, and the state of Kerala, in India. Their report employs the visual metaphor of an iceberg, in which “wage labor” is positioned at the tip of the iceberg, floating above the water, while dozens of other forms of labor– “informal lending,” “consumer cooperatives,” and work “within families,” among others– are positioned below the water, providing essential economic ballast, but remaining out of sight.
With the idea of underwater labor in mind, we might return to the example of GitHub, which begins this chapter, in order to ask what additional forms of labor might contribute to the production of code, but that cannot be represented by the visualization scheme that GitHub currently employs. We might also think of the work of the project manager, which is not directly expressed in a particular number, or size, or frequency, of contributions, but nevertheless ensures the quality and consistency of all project code. We might wonder about the work of the designer on a project, or of the technical writer– both of whom might have helped to shape the project in its initial phases, but who have likely moved on to other tasks. We might additionally consider the contributions of the user experience specialist, or the quality assurance tester, who might enter the development process at a later phase of the project, but whose work is no less essential to the project’s ultimate success. In the case of a consumer-facing project, we might also consider the contributions of the sales or customer support teams. These forms of labor, both productive and reproductive, are of course essential to the success of the project, but are not currently rendered visible, nor could they ever be easily visualized, by a scheme that considers project contributions to consist of code alone.
When designing data products from feminist perspectives, we must aspire to show the work involved in the entire lifecycle of the project, even if it can be difficult to do. Whether it be a team of software developers working on GitHub, a team of visualization designers at Kiln, or a group of sugarcane farmers in the Philippines, a feminist approach would insist on recognizing the range of communities that produce the data. It would include the people who then collect, digitize, and transform it into a dataset; those who are subsequently enlisted to process the dataset; those who then work to analyze and/or visualize the dataset; and finally, those who interpret the images or interactions that are produced, or otherwise experience their effects. Each of these roles is essential to the process of producing knowledge, but relies upon a variety of forms of labor– some visible, some not– in order to take place.
Showing all of this work is a tall order, and as designers and data analysts ourselves, we’ll be the first to admit that it’s not always one that can be fully achieved. But showing the work, as this chapter is named, begins with a commitment to acknowledging the range of forms of work that have been performed, even if they can’t be ascribed to a specific person or credited by a single name.
In more instances than you might think, this work can be surfaced from the data themselves. For instance, Benjamin Schmidt, whose research centers on the role of government agencies in shaping public knowledge, decided to visualize the metadata associated with the digital catalog of the U.S. Library of Congress, the largest library in the world. Schmidt’s initial goal was to understand the collection and the classification system that structured the catalog. But in the process of visualizing the catalog records, he discovered something else: a record of the labor of the cataloguers themselves. When he plotted the year that each book’s record was created against the year that the book was published, he saw some unusual patterns in the image: shaded vertical lines, step-like structures, and dark vertical bands that didn’t match up with what one might otherwise assume would be a basic two-step process of 1) acquire a book; and 2) enter it in.
The shaded vertical lines, Schmidt soon realized, showed the point at which the cataloguers began to turn back to the books that had been published before the library went digital, filling in the online catalogue with older books. The step-like patterns indicated the periods of time, later in the process, when the cataloguers returned to specific subcollections of the library, entering in the data for the entire set of books in a short period of time. And the horizontal lines? Well, given that they appear only in the years 1800 and 1900, Schmidt inferred that they indicated missing publication information, as best practices for library cataloguing dictate that the first year of the century be entered when the exact publication date is unknown.
With an emphasis on showing the work, these visual artifacts should also prompt us to consider just how much physical work was involved in converting the library’s paper records to digital form. The darker areas of the chart don’t just indicate a larger number of books entered into the catalog, after all. They also indicate the people who typed in all of those records--millions and millions of them. (Schmidt estimates the total number of records at ten million and growing). Similarly, the step-like formations don’t just indicate a higher volume of data entry. They indicate strategic decisions made by library staff to return to specific parts of the collection, and reflect those staff members’ prior knowledge of the gaps that needed to be filled. In other words, their intellectual labor as well.
There is also a political dimension of this work. For instance, in 1996, we see a dark vertical line that Schmidt tells us indicates “an especially furious year of digitizing older records.” Could it possibly be that, in the lead-up to the presidential election that would result in Bill Clinton’s second term, the federal government granted additional funding to the Library of Congress that enabled them to hire additional staff? Or did it indicate the fear that the Republican candidate, Bob Dole, would win the election and reduce the amount of funding for federal agencies, leading the existing cataloguers to redouble their efforts? We can’t know the answer without additional research, but these questions help to show how the dataset always points back to the data setting--a term coined by Yanni Loukissas, which we introduce in Chapter Four--and to the people who labored in that setting in order to produce the data that we see.
The people who labor in office buildings, and on top of them, are the subject of Builders of the Vision, by Daniel Cardoso Llach. That project employs data collection, as well as visualization, in order to analyze the hidden social hierarchies at work in international construction projects. For Builders of the Vision, Cardoso Llach wrote a script to register the design conflicts between different versions of the plans for the Thomas Wynne Mall, which recently opened along a stretch of otherwise desolate road in Abu Dhabi. In any large building project, design conflicts are common; they emerge when different subcontracting teams--say, the architects and the mechanical engineers--employ different software to draw up their plans for the project. It then becomes the task of a project coordinator to translate those plans into a single file format, identifying and resolving any inconsistencies along the way. Cardoso Llach’s code sat on top of the project management software, recording the date and time of each conflict, the subcontracting teams involved, and the team whose design was accepted. By visualizing this conflict history, he was able to expose the evolving work patterns and shifting power dynamics within the project as it reached completion. In so doing, Cardoso Llach also exposes the hidden complexity of computational labor today.
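To make the idea of a conflict history more concrete, we offer a minimal sketch of what such a log might look like, and of one question that could be asked of it. This is not Cardoso Llach’s script, which we have not seen. The `Conflict` record and the `acceptance_rates` function are our own hypothetical names, built only from the three things the text says were recorded, and the sample entries are invented for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Conflict:
    """One design conflict: when it occurred, who was involved, whose design prevailed."""
    when: datetime
    teams: Tuple[str, ...]
    accepted: str  # the team whose design was accepted


def acceptance_rates(conflicts: List[Conflict]) -> Dict[str, float]:
    """For each team, the share of its conflicts in which its design was accepted."""
    involved: Counter = Counter()
    prevailed: Counter = Counter()
    for conflict in conflicts:
        involved.update(conflict.teams)
        prevailed[conflict.accepted] += 1
    return {team: prevailed[team] / involved[team] for team in involved}


# Invented sample data, for illustration only.
log = [
    Conflict(datetime(2012, 3, 1, 9, 30), ("architects", "mechanical engineers"), "architects"),
    Conflict(datetime(2012, 3, 8, 14, 0), ("architects", "mechanical engineers"), "architects"),
    Conflict(datetime(2012, 4, 2, 11, 15), ("architects", "mechanical engineers"), "mechanical engineers"),
]

for team, rate in acceptance_rates(log).items():
    print(f"{team}: {rate:.0%} of conflicts resolved in their favor")
```

Computing the same rates over successive time windows, rather than over the whole log, would be one way to approach the shifting power dynamics that the visualization exposes.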
Of course, there is also labor that remains hidden because we are not trained to think of it as labor at all. This is what’s known as emotional labor, and it’s (yet) another form of work that feminist theory has helped to bring to light. As described by feminist sociologist Arlie Hochschild, emotional labor describes the work involved in managing one’s feelings, or someone else’s, in response to the demands of society or a particular job. Hochschild coined the term in the early 1980s to describe the labor required of service industry workers, such as flight attendants, who are required to manage their own fear while also calming passengers, during adverse flight conditions. In the decades that followed, the notion of emotional labor was supplemented by a related concept, affective labor, so that the work of projecting a feeling (the definition of emotion) could be distinguished from the work of experiencing that feeling (the definition of affect).
We can see both emotional and affective labor at work all across the technology industry today. Consider, for instance, how call center workers and other technical support specialists must exert a combination of affective and emotional labor, as well as technical expertise, in order to absorb the rage of irate customers (affective labor), reflect back their sympathy (emotional labor), and then help them with--for instance--the configuration of their wireless router (technical expertise). In corporate headquarters, we might also consider the affective labor required by women and underrepresented groups of all kinds, in all situations, who must take steps to disprove (or simply ignore) the sexist, racist, or otherist assumptions they face--about their technical ability, or about anything else. And they must do so while also performing the emotional labor that ensures that they do not threaten those who hold those assumptions, who often also hold positions of power over them. Are there ways to visualize these forms of labor, giving visual presence--and therefore acknowledgement and credit--to these outlays of work?
One example to prompt our thinking can be found in the Atlas of Caregiving, an ongoing project aimed at documenting the work involved in caring for a chronically ill family member. The project’s name plays on the concept of the anatomy atlas, a compendium of illustrations of the human body that doctors can consult for information and reference. In this case, the goal was to illustrate the sometimes physical, and sometimes emotional or affective work of care. The research team outfitted its participants with a variety of biometric sensors, including accelerometers and heart-rate monitors, as well as with body cameras programmed to take a picture every fifteen minutes. They then visualized these data alongside excerpts from personal interviews, as well as from the activity logs they asked the caregivers in the study to complete. The result is a complex picture of caregiving, one that marshals data in the interest of creating a comprehensive view of the range of labor involved in caregiving work.
Of course, a measure like heart rate or skin temperature is only a proxy for human feelings, and this is a common critique of the Quantified Self movement overall. This understanding served as the genesis for “Bruises: The Data We Don’t See.” This artful visualization, created by visualization designer Giorgia Lupi, and accompanied by a musical score composed by Kaki King, attempts to make visible the emotional toll of parenting a child with chronic illness, as King herself was required to do when her own child was diagnosed with a rare autoimmune disease. Her daughter’s illness, Idiopathic Thrombocytopenic Purpura, or ITP, is described as a “very visual disease,” and presents as bruises and burst blood vessels all over the body. For this reason, King was instructed to watch her daughter’s skin and record any significant changes. She also thought to record her own feelings in terms of hope, stress, and fear, creating subjective data to complement the hard numbers she received from the blood tests her daughter was required to endure.
When Lupi, who knew King from previous collaborations, set out to design her visualization, her goal was to “evoke empathy,” and make her audience “feel a part of a story of a human’s life.” In contrast to the Atlas of Caregiving, which relies upon standard visualization techniques like radial timelines and Gantt-style charts in order to legitimate the work of care, Lupi sought alternative visualization strategies that would evoke the emotions she sought. She employed a fluid timeline to reflect the subjective nature of what feminist disability studies scholar Alison Kafer calls “crip time.” Days become white aspen-shaped leaves, segmented not by weeks or years but by hospital visits. Red dots indicate platelet counts, with color deployed mimetically in order to convey the intensity of the bruises, as well as the visuality of the “data” recorded by King. Lupi also employed color to represent King’s record of her feelings, with black corresponding to stress and fear, and yellow to hope. King’s fear and hope are also visualized by hand-drawn lines that reflect each on a scale of one to ten. The result is a composition, both visually and aurally affecting, of the affective labor of mothering and care.
Lupi and King, or the Atlas of Caregiving, are not the first to want to identify and make visible the work of care. Since the mid-1990s, when Nancy Folbre introduced the term, care work has been a significant topic of interest for feminist scholars. Folbre’s primary model of care work was the everyday work of caring for a child. But we might also think of the additional burden of caring for a sick child, as documented in “Bruises,” or for a family member, as in the Atlas of Caregiving. Care work isn’t necessarily performed for free. It can also include the underwaged work performed by daycare workers or home health aides, as well as the waged work of doctors, nurses, physical therapists, mental health professionals, and so on. What binds these forms of work together across economic lines is their motivation: as theorized by Folbre, care work is undertaken out of a sense of compassion for, or responsibility for, others, rather than with a goal of monetary gain. But when it comes to the market, altruism is a double-edged sword. These same professional care-workers--who are predominantly women and people of color--are often paid less than they would be in other fields. Why? Because they care.
A similar commitment to others--and, increasingly, an awareness of how care can also oppress--is what has prompted groups like The Maintainers to take up theories and practices of care in relation to data work. Through a series of workshops, conferences, and publications, The Maintainers are trying to counter the current tendency to celebrate technological innovation and discovery. The work that should be celebrated, they argue, is the work that sustains and maintains the world we live in today; and not work that passes over the problems of the present in order to look ahead. But there is yet more work to be done. Data work, after all, is part of a larger ecology of knowledge. Like the network of ships visualized on the Ship Map, or the network of source code stored on GitHub, the network of people who contribute to data projects is vast and complex. We may never be able to acknowledge all of the work that goes into these projects, at least not explicitly. But thinking through the many forms of labor involved in data science, making them visible, and giving them credit might be the most important data work of all.