Chapter Four: Unicorns, Janitors, Ninjas, Wizards, and Rock Stars
In Spring 2017, Bloomberg News ran an article with the provocative title "America's Rich Get Richer and the Poor Get Replaced By Robots". Using census data, the authors report that income inequality is widening across the nation. San Francisco is leading the pack, with an income gap of almost half a million dollars between the richest and the poorest twenty percent of residents. It has the lowest proportion of children for any major US city and a growing rate of evictions since 2003.
While the San Francisco Rent Board collects data on these evictions, it does not track where people go, how many end up homeless or which landlords and developers are systematically evicting major blocks of the city. This is where the Anti-Eviction Mapping Project (AEMP) stepped in, starting in 2013. Led by two women—Erin McElroy and Terra Graziani—the project is a self-described collective of "housing justice activists, researchers, data nerds, artists, and oral historians." They are mapping eviction, but they are doing so through a collaborative, multimodal, and yes, quite messy, process.
If you visit antievictionmap.com, there isn't one single eviction map. There are a total of 78 distinct maps linked from the homepage. Maps of displaced residents, of evictions, of tech buses, of property owners, of the Filipino diaspora, of the declining numbers of Black residents in the City, and more. AEMP has a distinctly fluid, collaborative, and community-based way of working. Some projects originate from within the group. For example, the group is working on producing an atlas of the Bay Area called Counterpoints: Bay Area Data and Stories for Resisting Displacement which has chapters on such topics as Migration/Relocation, Gentrification and the Prison Pipeline, Indigenous and Colonial Histories, and Speculation. The majority of the projects happen in collaboration with nonprofits, students, and community-based organizations. For example, the Eviction Defense Collaborative (EDC) is a nonprofit that represents people who have been evicted in housing court. While the City does not collect data on the race or income of evictees, EDC does collect those demographics, and they work with 90% of evicted tenants in the city. In 2014, EDC approached AEMP to help produce their annual report1 and in return offered to share their demographic data with the organization. Since then, the two groups have been working together on data sharing, annual reports and spatial analysis of evictions based on race. And the AEMP has gone on to produce reports with tenants rights organizations, timelines of gentrification with indigenous students, oral histories with grants from Anthropology Departments, and murals with arts organizations. They all have the singular goal of documenting displacement and creative resistance, from the standpoint of the residents and community members. Once you dive into the seventy-eight maps, the charts and stories and voices multiply further. It's not a simple story.
The Anti-Eviction Mapping Project's process and products would seem to be antithetical to the received wisdom in data science and visualization circles. Business writers tout the ability of data visualization to reduce complexity and create new insight, quickly and clearly. "Nothing going on in the field of business intelligence today can bring us closer to fulfilling its promise of intelligence in the workplace than data visualization," wrote Stephen Few in an early 2007 white paper on the promise of data visualization. The story – told by prominent figures in the field such as Few, Nadieh Bremer, David McCandless, and Ben Shneiderman – goes something like this: We are living in the age of Big Data in which humans cannot process and make sense of the vast stores of information they are collecting. While our capacity for processing text and numbers is limited, our eyes are uniquely suited to detect patterns from a sea of visual information. As leading researcher Colin Ware has written, "Visualization provides an ability to comprehend huge amounts of data. The important information from more than a million measurements is immediately available." Thus, visualizing large datastores in sensible, user-friendly ways is our ticket to making sense, making decisions, and making money.
This is not an untrue story. It really is measurably easier and faster to see patterns in a table of numbers if they are presented in graphic form. And some of AEMP's seventy-eight maps perform this widely acknowledged function of data visualization by making patterns in the data visually apparent, at a glance.
For example, the Tech Bus Stop Eviction Map produced by the collective in 2014 plots the location of three years of Ellis Act evictions. This is a form of "no-fault" eviction in which landlords claim that they are going out of the rental business, in many cases to convert the building to a condominium and sell units at significant profit. San Francisco has seen almost five thousand uses of the Ellis Act to evict residents since 1994. In this case, AEMP plotted these evictions in relation to the location of technology company bus stops. Starting in the 2000s, tech companies with campuses in Silicon Valley began offering private luxury buses as a perk to attract employees who wanted to live in downtown San Francisco but didn't want the hassle of commuting. Colloquially known as "the Google buses", these vehicles used public bus stops – illegally at first – to shuttle their riders in comfort. The location of the bus stops also meant that there was a new, wealthy clientele for condos in the area and so property values around the bus stops soared. Here the AEMP makes the case that so, too, did evictions of long-standing residents. Their analysis, shown in the image above, demonstrates that 69% of no-fault evictions between 2011 and 2013 occurred within four blocks of a tech bus stop.
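The kind of proximity figure reported above can be sketched in a few lines. This is only an illustration of the general technique, not AEMP's actual method: the coordinates are made up, the function names are hypothetical, and the 400-meter radius is an assumed stand-in for "four blocks," since block lengths vary across the city.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(a, b):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def share_near_stops(evictions, stops, radius_m):
    """Fraction of eviction points that lie within radius_m of at least one stop."""
    if not evictions:
        return 0.0
    near = sum(
        1 for e in evictions
        if any(haversine_m(e, s) <= radius_m for s in stops)
    )
    return near / len(evictions)


# Made-up coordinates for illustration only; these are not AEMP data.
stops = [(37.7600, -122.4200)]
evictions = [
    (37.7601, -122.4201),  # very close to the stop
    (37.7610, -122.4210),  # a short walk from the stop
    (37.7800, -122.4000),  # a few kilometers away
]

FOUR_BLOCKS_M = 400  # assumption: a rough stand-in for "four blocks"
share = share_near_stops(evictions, stops, FOUR_BLOCKS_M)
print(f"{share:.0%} of sample evictions fall within {FOUR_BLOCKS_M} m of a stop")
```

A share computed this way depends heavily on the chosen radius and on how densely stops cover the city, which is one reason the methods behind such a number are worth disclosing alongside it.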
But other AEMP maps, like Narratives of Displacement and Resistance, pictured above, do not have an efficient analytical function. All 5000 evictions are represented as sized red bubbles, so the basemap of San Francisco is barely visible. On top of the sea of red bubbles, sky blue bubbles dot the locations where the AEMP conducted audio and video interviews with displaced residents, activists, mediamakers, and local historians. Clicking on a sky blue bubble sets the story in motion: "I was born and raised in San Francisco proudly," begins Phyllis Bowie, a resident facing eviction in her Midtown apartment. She goes on to tell the story of returning from serving in the Air Force and working like crazy for two years to build up her income record at her small business to be eligible for a one bedroom lease-to-own apartment in Midtown, a historically Black neighborhood where she had grown up. In 2015, the city broke the master lease and took away the rent control on their building. Now, the tenants, who moved there on the promise of a future of property ownership, are facing skyrocketing rents that none of them can afford. Bowie is leading rent strikes and organizing tenants but their future is unclear.
The point of this map is not for the eyes to efficiently detect a correlation between space and evictions. There are very few patterns to detect when the entire city is covered in big red eviction bubbles and abundant blue story dots. Rather, the visual point is simple and exhortative: "There are too many evictions". Behind each eviction is a person, with a unique voice and a story like Bowie's.
The Narratives map would appear to be messy. It does not efficiently reveal how evictions data may be correlated with BART stops, income, Google bus stops or any other potential dimensions of the data. Moreover, even finding the Tech Bus Stop Eviction Map or Narratives map is complex given the sheer number of maps and visualizations on the AEMP website. There is no "master map" that integrates all of the information that AEMP has collected into a single interface. So AEMP's efforts would seem to fail at what proponents indicate is the basic value of data visualization: taming information overload, integrating large amounts of information and detecting visual patterns efficiently.
But perhaps cleanliness, efficiency and control are not the only criteria by which to judge data visualizations.
A related story that falls into the Data Science Bin of Received Ideas That We Might Want To Think About More is that data always needs to be tamed: it is messy and in need of cleaning. "It is often said that 80% of data analysis is spent on the process of cleaning and preparing the data," writes Hadley Wickham, in the first sentence of the abstract for his widely read and cited 2014 paper called "Tidy Data." Wickham is the author of tidyr, a package for the R statistical computing platform that logs around 177,000 downloads per month. Articles in the popular and business press corroborate this need for tidiness. The Harvard Business Review calls the Data Scientist "the sexiest job of the 21st century" and talks about this new special form of human: "At ease in the digital realm, they are able to bring structure to large quantities of formless data and make analysis possible." Here, the intrepid analyst wrangles an orderly table from unstructured chaos. For the New York Times in 2014 ("For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights"), the analyst's work is less sexy and equated to the low-wage, maintenance labor of janitors.
Whether or not you think data scientists are sexy (they are) or whether you think janitors should be offended by this classist reference (we all should be), there are some interesting assumptions and anxieties surfacing in both of these sets of received ideas. In one story, humans are able to tame the chaos of information overload visually—visual organization helps us go from data to intelligence. In the second story, data are dirty and it is actions of cleaning and tidying that put them back in their proper places.
But what might be lost in the process? Or, more specifically, whose perspectives are lost in the process of dominating and disciplining data and whose perspectives are imposed on the results? Both sets of received ideas make normative assumptions—namely, that we all value cleanliness, efficiency and control over messiness, inefficiency and complexity.
Scholars Katie Rawson and Trevor Muñoz have advanced the idea that "the cleaning paradigm assumes an underlying, 'correct' order," and warn that cleaning can be a "diversity-hiding trick." In the perceived "messiness" of data there is actually rich information about the circumstances under which it was collected. Yanni Loukissas concurs. Rather than talking about data sets, he advocates that we talk about data settings – both the technical and the human processes that affect what information is captured and how it is structured. Loukissas likes to quote the anthropologist Mary Douglas, who said, famously, "What is dirt but matter out of place?" He employs the example of data in actual dirt: the soil of the Arnold Arboretum, in Boston, MA. In order to determine the origin of a single cherry tree, he explains, you need to know to look at multiple database fields, including one that indicates who incorporated the plant into the collection, another that records the native country of the species, and a third that documents the way it came to the collection. The "messiness" of storing related data in three different fields is actually a signal – i.e. meaningful information – that points back to the complex history of recordkeeping practices at the institution.
Loukissas' assertion is that all data are "local," by which he means they are connected, sometimes inextricably, to the human and technical conditions under which they are collected and maintained. For example, he tells the story of exploring data from the Clemson University library in South Carolina while he was at a hackathon in Cambridge, MA. He stumbled across a puzzling record where the librarian had noted the place it described as "upstate". Such a classification is relational to the place of collection: "upstate" is a term immediately understandable to South Carolinians and refers to the westernmost region of the state, where Clemson is located. But it has no relevance to a person sitting at a hackathon in New England, who might prefer a more generalized way of denoting the ten or so counties that count as upstate. Note, however, that even though the outsider may be frustrated with the fact that the record doesn't use latitude and longitude, there is meaningful and precise geographic information contained in the "upstate" reference. Not only that, but there is meaningful metadata provided by this cultural insider reference: Only somebody collecting the data in South Carolina would have referred to that region as "upstate," so we can reason that the data was collected there. It is because of records like this that taming and cleaning data is such a chore. It is like chopping off the roots that connect a tree to the ground from which it grew. It is painstaking and cumbersome. Plus, uprooting a tree might not always make sense.
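One way to make a local record legible to strangers without uprooting it is to annotate rather than overwrite. The sketch below is a minimal illustration under stated assumptions: the field names, the `LOCAL_TERMS` lookup, and the `annotate` function are all hypothetical and do not describe how the Clemson library or Loukissas actually handle such records.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Hypothetical lookup from (place of collection, local term) to a gloss for outsiders.
LOCAL_TERMS = {
    ("South Carolina", "upstate"): "westernmost region of South Carolina",
}


@dataclass(frozen=True)
class PlaceRecord:
    place_raw: str                       # the value exactly as the collector wrote it
    collected_in: str                    # the data setting: where the record was made
    place_general: Optional[str] = None  # optional gloss added for outsiders


def annotate(record: PlaceRecord) -> PlaceRecord:
    """Return a copy with a gloss added when the local term is known.

    The raw value and the place of collection are never modified.
    """
    key = (record.collected_in, record.place_raw.strip().lower())
    return replace(record, place_general=LOCAL_TERMS.get(key))


original = PlaceRecord(place_raw="upstate", collected_in="South Carolina")
annotated = annotate(original)
print(annotated)
```

Keeping `place_raw` and `collected_in` next to the added gloss means the insider reference, and what it reveals about where the record was made, survives the tidying.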
We might relate the growth of tools to tidy, tame and discipline data to the proliferation of street names and signs in the landscape. Geographer Reuben Rose-Redwood describes how, for example, prior to the Revolutionary War, very few streets in Manhattan had signs posted at intersections.2 Street names, such as they existed, were vernacular and related to the particularity of a spot, e.g. "Take a right at the red house." With the increased mobility of people and things—think of the postal system, the railroads, the telegraph—street names became systematized in the 19th century in the United States and institutionalized by the early 20th century. Rose-Redwood calls this the production of "legible urban spaces." There is high economic value to legible spaces, particularly for large, international corporations to deliver boxes of anything and everything directly to our front door.
The point here is that one does not need street names for navigation until one has strangers3 in the landscape. Likewise, data does not need cleaning until there are strangers in the dataset.
Who are those strangers in the dataset? People who work with data are alternately called unicorns (because they are rare and have special skills), wizards (because they can do magic), ninjas (because they execute complicated, expert moves), rock stars (because they outperform others) and janitors (because they clean messy data).
These operators are "strangers" in data sets because they often sit at one, two or many levels removed from the collection and maintenance process of the data that they work with. This is a negative externality of open data, APIs and the vast stores of training data sets available online: the data appear available and ready to mobilize, but what they represent is not always well-documented or easily understood by outsiders. This problem—that data do a very poor job of speaking for themselves, especially when the listener is a stranger—is something we will return to at length in the next chapter.
Unicorns, wizards, ninjas, rock stars and janitors all have something in common. Apart from the unicorn, a mythical creature that is not usually depicted with a readily apparent gender, they are mostly stereotyped as male undertakings.4 And unicorns, wizards, ninjas, rock stars and janitors work alone. Their work is solitary and singular. When applied to data science, the focus is on an individual's extraordinary technical expertise and ability to determine meaning where others cannot. Solo superheroes and informational geniuses that weave meaning from chaos.
There is a "genius" in the world of eviction data – it is Matthew Desmond, officially designated as such by the MacArthur Foundation for his work on poverty and eviction in the US. He is a professor and director of the Eviction Lab at Princeton University. In an article for Shelterforce, organizers from the AEMP, the Workers’ Center of Central New York, the Community Alliance of Tenants in Oregon, and the Housing Justice League in Atlanta detail how Desmond and the Eviction Lab were pursuing a big data agenda for acquiring national evictions data at the expense of understanding local context and providing adequate protections for the communities represented by the data. Initially, the Eviction Lab approached community organizations like AEMP to request their data. The AEMP wanted to know more – about privacy protections and how the Eviction Lab would keep the data from falling into landlord hands. Instead of continuing the conversation, the Eviction Lab turned to a real estate data broker and purchased data of lower quality. The authors write, "AEMP and Tenants Together have found three-times the amount of evictions in California as Desmond’s Eviction Lab show." Unfortunately, because Desmond is a "genius," his social capital (combined with Princeton's) means that the numbers that many policymakers see and use to make decisions are inaccurate. In this case, the priority was on speed – at the expense of establishing trusted relationships with actors on the ground – and national coverage – at the expense of accuracy. Note that speed and perceived comprehensiveness help to maintain and secure the status of the white male genius and his institution, while strategically downplaying the work of coalitions, communities and movements that are led primarily by women and people of color. This is a classic case of Big Dick Data, a phenomenon you can read more about in the next chapter of this book.
"We're unpacking America's eviction crisis," proclaims the Eviction Lab home page, "Find out how many evictions happen in your community."
But what might be gained if we understood data work not as a genius-like wizardly undertaking, but rather as work that embraces multiple voices and values different types of expertise at all stages of the process?
While the Anti-Eviction Mapping Project could have handed off the data it collects to a single mapmaking rock star-unicorn-ninja-wizard-janitor, it made an intentional decision to include many designers in the process, including many non-experts who experienced the power of making maps for the first time. The resultant proliferation of maps, oral histories, events, murals and reports reflects the diverse voices of many collaborators who are working together to document the scope and the scale of San Francisco's housing crisis. And this has the (wholly intentional) consequence of building the technical capacity of residents as well as relationships between community members – slowly and surely, map by map, collaboration by collaboration. In fact one of the explicit goals of AEMP is to "build solidarity and collectivity among the project’s participants who could help one another in fighting evictions and collectively combat the alienation that eviction produces."5 In addition to translating evictions into insights, the AEMP wants to use the process of making maps to produce new human relationships.
A key contribution of feminist thinking has been to recognize how a multiplicity of voices, rather than one single loud or magical or technical voice, often results in a more complete picture of the issue at hand. Feminist philosophers like Donna Haraway are part of a postmodern wave of thinkers who maintain that all knowledge is partial, meaning no one person or group has the privilege of a distant, objective view of The Truth, even if they self-identify as a unicorn, janitor, ninja, wizard or rock star. But embracing pluralism – as this concept is sometimes described – does not mean that everything is relative, nor that all truth claims have equal weight, nor that feminists don't believe in science. It simply means that when people make knowledge, they do so from a particular standpoint – a situated, embodied location in the world.
This is called standpoint theory in feminist thinking. And the easiest way to start to understand standpoint theory is to think of it as the perspectives that come from the identities you are born into as well as from your experiences. For example, we – the co-authors of this book speaking to you at this moment – are two white, cisgender women, not Latino trans men. We live in Boston and Atlanta, not Bangalore or São Paulo. We’ve been trained as designers, software developers, and scholars, and not as bank tellers or biomedical engineers. These perspectives matter. They will shape the questions that we ask of the world, the data we collect, the results that we see, and the meaning that we make. The idea behind standpoint theory is that pooling our standpoints makes for a richer and more robust objectivity.
But there are forces beyond the individual operating in standpoint theory. Standpoints, writes sociologist Patricia Hill Collins, are group-based experiences: "Groups have a degree of permanence over time such that group realities transcend individual experiences." She gives the example of being African American, a stigmatized racial group in the US. "While my individual experiences with institutionalized racism will be unique, the types of opportunities and constraints that I encounter on a daily basis will resemble those confronting African Americans as a group." Collins calls on us to use standpoint theory to acknowledge (and address) social inequality based on existing unequal power relations between social groups. Note how this is different from a call for simple diversity in individual perspectives – what people in the tech industry characterize as "thought diversity."6 This means explicitly acknowledging and taking steps to address the unjust structural forces at play in our work, including racism, sexism and more.
Indeed, beyond simply "embracing different perspectives", feminist standpoint theory asserts that the best way to strengthen objectivity and address injustice in the system is to begin with the lives, experiences and interpretations of the people most marginalized in a particular context. Applying this to computational systems design, Shaowen Bardzell calls for starting first and foremost with the perspective of the "marginal user." From a gender perspective, that would mean beginning with female and non-binary perspectives. On a project that involves international development data, that might mean beginning not with institutional goals but with indigenous standpoints. For the AEMP, that means centering the voices and experiences of those who have been evicted. Privileging marginal perspectives helps to expose aspects of the world that appear to be neutral and objective, but are actually distorted and one-sided accounts of the world. And centering marginalized standpoints can generate new and critical questions that would otherwise go unasked because the system is set up to suppress those voices. As Kim TallBear says, "If we promiscuously account for standpoints, objectivity will be strengthened." So, how do we begin to embrace this kind of plurality of voices and perspectives in data science?
The first step in activating the value of multiple perspectives is to acknowledge the partiality of your own. But how? It can be particularly hard to see just how partial your own perspective is if you are a member of a dominant group whose way of operating in the world stands in for the "default" or "normal" way. Think back to the example of Marya McQuirter’s catalog search, described in Bring Back the Bodies, in which white people were not labeled as such, because being white in the United States is simply so normal that it goes without saying or labeling. Feminist sociologist Michael Kimmel illustrates this concept in the following way: while sitting in a small study group in graduate school, his African American colleague said, "When I look in the mirror I see a Black woman. When a white woman looks in the mirror she sees a woman," to which Kimmel rejoins, half-joking and half-serious, "And when I look in the mirror, I see a human being."
Kimmel says that he sees "a human being" rather than a white man because throughout his life his race and gender have granted him privileges. One of the most notable of these privileges is, paradoxically, to not need to see his race and gender as markers of difference. So, as Kimmel articulates it: "privilege is invisible to those who have it." You know you are a member of the dominant group when your gender or race or religion or sexuality is invisible to you – when you do not have to consider and negotiate it every day. As whiteness scholar Robin DiAngelo says, "a significant aspect of white identity is to see oneself as an individual outside of or innocent of race, 'just human'."
This basic insight is one of the main reasons standpoint theory is so important: it helps us understand the power dynamics at play in knowledge production and unmask partial perspectives (created by dominant social groups) masquerading as universal truths. If the world's data science is created by mostly white men, then can we consider it objective? No, it represents the standpoint of the dominant group. If machine vision programs are trained on majority pale faces then can we consider them accurate? No, they represent the standpoint of the dominant group. If 3% of the people that work on criminal justice algorithms in the US are African American7 then can we consider that software fair and unbiased? No, criminal justice algorithms represent the standpoint of the dominant group. These are localized standpoints that, because of the accumulated power and privilege of the dominant group, "pass" as objective truths.
These are large-scale, structural inequities, which we address further in The Power Chapter. But we can take individual and institutional steps towards addressing them right here and right now by incorporating more and different standpoints into data-oriented work. How?
For one, you can disclose your own project's methods – rather than sweeping them under the proverbial rug. This is called self-disclosure. You may have even heard the phrase coined by David Weinberger, "transparency is the new objectivity." So rather than attempting to create visualizations and data science products that purport to be objective, you might build a space for transparency and self-disclosure into your design. People in journalism and science have been doing this for some time, at least as it relates to the technical methods employed in their analysis process. For example, Bloomberg's interactive visualization What's Really Warming the World? walks the reader through common climate denier arguments that try to explain away the warming planet with reasons that don't have to do with human activity. Is it volcanos? Is it deforestation? Is it ozone pollution? The piece systematically demonstrates that these factors have little to do with global warming whereas greenhouse gas emissions from human industries are the clear factor in rising global temperatures.
One of the most interesting things about the piece is that it devotes almost a third of its real estate to describing the data it draws from and the methods the authors used for analysis. Providing access to the data as well as describing the methods used to analyze it are conventions in data journalism, aligned with the growing trend towards openness and reproducibility in scientific research. While these methodological accounts are presently focused on technical details – where is the data from, what program was used to analyze it, what statistical models were developed – there is a seed of possibility for revealing other details about the human process of making decisions about data storytelling. Who was on the team? Which hypotheses were pursued but ultimately proved false? What were points of tension and disagreement? When did data need some ground-truthing by talking to data owners or domain experts? There is a story for how every evidence-based case came into being and it is a story that involves money, institutions, humans and tools. Revealing this story through reflection and self-disclosure can be a feminist act.
Data doesn't lend itself naturally to self-disclosure. Dietmar Offenhuber, head of the Information Visualization program at Northeastern University, has advanced the idea that data appears so neutral because it is unclear who the author is. In written text (even the driest and most academic tome), there is always an author, a tone, and a connection back to a human speaker through language and attribution. But, as we described in On Rational, Scientific, Objective Viewpoints from Mythical, Imaginary, Impossible Standpoints, data and its visualizations carry tremendous rhetorical power, particularly for newcomers. Databases and charts are often remarkably effective at obscuring the perspectives of their human speakers. How can we connect spreadsheets back to speakers, and visuals back to voices?
Self-disclosure could be as simple as being explicit, transparent and possibly even visualizing who is doing the counting and mapping behind the scenes. Take the example of the aerial mapping image in the photo below. The Public Laboratory for Open Technology and Science (PLOTS) is a citizen science group that got its start during the BP oil spill in 2010. They make high-resolution aerial maps by flying balloons and kites, which dangle cheap digital cameras, over the environmental sites they seek to study. While the technique is low-cost, the imagery produced is often higher resolution than existing satellite imagery because of the proximity to the ground.
As you can see in the image, the mappers themselves are often visible in the final product, in the form of little bodies, gathered in boats or standing in clumps on a shoreline, looking up at the camera above them. The balloon string leads the eye back to their forms. Here, the bodies are not missing but represented in the final product. Literally.
Self-disclosure illustrates the feminist method of reflexivity: rigorous interrogation and transparency about one's own position in the world. Not just one's technical methods, but one's social position, one's institutional position, one's racial position. Reflexivity is a meticulous tactic for addressing that aforementioned conundrum that "privilege is invisible to those who have it."
Embracing the value of multiple perspectives shouldn’t stop with transparency and self-disclosure. It also means actively and deliberately inviting other standpoints – specifically, those most marginalized by current power relations – into the analysis and storytelling process. As we have seen, the Anti-Eviction Mapping Project does this through producing maps with, not for, low-income residents and community organizations like the Eviction Defense Collaborative and Tenants Together. And there are other projects that embrace pluralism and privilege marginal standpoints that we can look to as models. They come from diverse sectors such as university research labs, private consulting groups and government.
Since 2012, Rahul and Emily Bhargava have worked with community organizations from Belo Horizonte to Boston to create "data murals". These are large-scale infographics, designed to be displayed in public spaces, that tell data-driven stories about the people who occupy those spaces. For example, Groundwork Somerville, an urban agriculture nonprofit, approached the Bhargavas in 2013. Emily recalls that the organization was in the process of establishing its first urban farm: "The site was disorderly – it was behind a used car parts building and hidden between other semi-industrial lots. They had built raised beds and planted for one growing season but passersby were stealing the vegetables." The organization was also running a high school employment program called "The Green Team" but struggling to fully involve the youth in their mission to create healthier communities.
The Bhargavas and key Groundwork Somerville staff collected demographic data from the city, GIS data on unused lots, and internal data such as growing records, food donations, and attendance logs at community events. They worked with the youth over several after-school sessions to review and discuss the data, as well as engage in storyfinding (a.k.a. data analysis). By the end of these sessions, the youth had sketched the overall outline and iconography of the resulting mural. Read left to right, the mural frames the problem: A man grasps for a basket of veggies but it says “healthy food is hard to get”. They back that claim up with numbers depicting the cost of healthy food and the number of people with prediabetes. It then transitions into showing opportunity: the number of unused lots in the city and how much land has been reclaimed for urban farming. Finally, the mural shows how the Groundwork Somerville truck brings many pounds of affordable produce to low-income neighborhoods and employs over 400 youth residents. It ends with a vision of a unified and healthy community.
Youth, staff and volunteers worked together over the course of several weeks to paint the twenty-meter-long mural on the corrugated steel wall of the garden. On July 30th, 2013, the Mayor of Somerville and other community leaders attended the ribbon-cutting to officially launch the renovated garden. Emily describes the visit: "The youth, having just spent weeks looking at the data, painting the mural together, and building relationships with staff and volunteers, were able to talk about the story in great detail to their elected officials."
Data murals like the one in Somerville are becoming a more common practice. Communities from Detroit8 to Dar es Salaam have undertaken data murals to tell a public story about an important issue. In Dar es Salaam, the Data Zetu project ("our data" in Swahili) ran a listening campaign in four low-income districts. They compiled the residents' concerns, as well as statistical data, into a data mural about teenage pregnancy and sexual health. In the image, a young woman is pregnant and wants to grow up and be a doctor. Her peers tell her that she can still dream big, but that she should seek counseling at the youth health clinic to do the best by her new family.
And murals are just one kind of output from a pluralistic, community-centered data process. There are many examples of participatory mapping that combine data collection and community storytelling. For example, in the project Map Kibera, the GroundTruth Initiative worked with residents to map the largest and most well-known slum in Nairobi. Kibera was not unmapped prior to 2009 when the project began,9 but the maps that existed were used by the government, NGOs and researchers to drive policy and were not made available to residents.
Do you remember the example of the teenager who Target identified as pregnant before her parents did? The issue in that case, as is true of many instances of corporate and government-sponsored data collection efforts, is that the people who collect, store, and derive insight from the data often wield outsized power over those about whom the data are collected. Embracing pluralism is a way to rectify this power imbalance. Map Kibera seeks to redress that asymmetry by teaching residents to collect their own data, make their own maps and tell their own stories through community radio and video journalism. Likewise, the Digital Democracy project works with indigenous groups around the world to defend their rights through collecting data and making maps. In the process, they have developed SMS services with domestic violence groups in Haiti and helped the Wapichana people in Guyana make a data-driven case for land rights to the government.
Neither Abella Bateyunga from the Data Zetu project, nor Erica Hagen and Mikel Maron from the Map Kibera project, nor Emily Jacobi from Digital Democracy, nor the Bhargavas envision themselves as unicorns, or janitors, or ninjas, or wizards, or rock stars. "At Digital Democracy, we try to fight the superhero narrative," says Jacobi. "We are sidekicks rather than superheroes." Through a series of workshops and trainings, these groups enhance the capacity of an entire community to engage in data analysis and storytelling. Emily Bhargava reflects, "Painting a mural is great for building community relationships. But the time when people actually become empowered is during the storyfinding process when they learn to translate the data and own its meaning. And doing that collectively helps to even the power differential." The facilitators in these cases act more like cheerleaders, or guides, or advocates, or even therapists. Perhaps appropriately, the Bhargavas' website is located at DataTherapy.org.
Facilitating data-informed community conversations, mapping forests in Guyana and painting data murals may seem foreign to those who are accustomed to being indoors with their data, but the ideas about participatory meaning-making are transferable to more conventional contexts like municipal government. In 2015, the City of Boston was in the process of developing its first master plan in fifty years. A master plan is a document that guides future growth and development across a range of city systems like transportation, the built environment and social services and settings. The transportation wing of this effort was called Go Boston 2030, and with the help of a grant from the Barr Foundation, they assembled a team that attempted to do something highly inventive in community engagement and participatory data analysis.
The way that transportation master plans typically work is that the city planners set up a framework as well as metrics for success. They then hire external consultants to help undertake a visioning process, involve various stakeholders and members of the public, collect data, analyze and synthesize that data, and produce a report. The City of Boston did all of these steps, which resulted in the Go Boston 2030 Vision and Action Plan in 2017.
But they also did something different along the way. In addition to holding community input meetings, the City dispatched colorful food trucks and decked-out “Idea Bikes” with trailers to all of the neighborhoods in the city. These mobile units served hot chocolate, and staff offered post-it notes and friendly conversation about commuting in Boston now and in the future. Their goal was to collect data in the form of residents' questions and ideas about the future of transportation in Boston. And collect they did. Over a period of eight months, the City collaborated with the nonprofit Interaction Institute for Social Change to collect 8,700 questions and ideas, public engagement data at an unprecedented scale in relation to prior efforts.
So here is the critical juncture in the story – what did they do with the data? At this stage in the process, the typical thing to do would be to fork over the citizen ideas to a data wizard-ninja employed by the transportation consultants and await "the answers." Instead, Go Boston 2030 decided to use the data analysis process as an opportunity for participatory meaning-making and consensus building amongst a multitude of stakeholders. They organized a large meeting of policy makers, public servants, and community leaders (including Catherine), in which the participants, organized into thematic groups, reviewed each of the ideas that had been collected, making note of the ideas that warranted further discussion. One idea in Catherine’s group provoked controversy: "More housing density around transit." A state transportation official kicked off the conversation by making an impassioned case for changing the city’s zoning codes. Then, a city planner offered a short history lesson on the dangers of prioritizing high-density (but also high-cost) housing. The nonprofit representatives followed up by advocating for a requirement for affordable housing should any zoning codes be changed. While not everyone agreed on the details, the group did agree to add that idea to a list of priority recommendations for the City to pursue.
The event was instructive in that the process of data analysis was not imagined as a purely technical problem – something to be handled by a natural language processing expert or a statistician. The designers of this process – the consulting firm Interaction Institute for Social Change – understood that the choice to prioritize one idea over another would carry real weight and material consequences for the people of Boston, consequences that a natural language processing expert or a statistician could not understand simply by looking at word frequencies in the data. Determining which ideas to prioritize among the thousands that were collected could only be achieved by bringing many forms of expertise to an actual table.
The Go Boston 2030 process, the Bhargavas' Data Murals, and the Anti-Eviction Mapping Project are examples of the feminist values in action. Note how they represent different institutional starting points. The first was a government endeavor, the second is a small private consulting group, and the third is a fluid grassroots collective. All embrace pluralism, value knowledge from distinct standpoints, disclose their own standpoint, and center the perspectives of marginalized groups. While we do not necessarily need a seventy-five person community meeting to compute average daily temperatures from instrument readings, we may want to consider that meeting the second we try to make meaning from those readings – defining questions to ask the data, interpreting the data or thinking about how to allocate resources across a vast geography based on the data. Which is to say that as soon as data start to become information that can be operationalized for decision-making, they leave the technical domain and cannot be black-boxed into the server closets of even the most crackerjack unicorn-rock-stars. A data scientist is not going to save democracy, but a well-designed, data-driven, participatory process that centers the standpoints of those most marginalized, empowers participants and builds new relationships across lines of social difference? Well, that might just have a chance.