About the Email Archive

One of the most important primary sources of information about Flint’s water crisis has been emails between government officials. Our first and largest collection for the public archive is a searchable database [hyperlink to email search tool] of thousands of government emails derived from a larger public dataset. The public gained access to these emails in early 2016, when the state of Michigan released them in a series of batches that eventually totaled 445,298 pages. Government officials, reporters, and nonprofits had requested most of them in the preceding months through the Freedom of Information Act.

This data source has been essential to understanding how and why the crisis happened but has been very difficult to use until now. This email database significantly expands our collective ability to understand the crisis. Our database [hyperlink to email search tool] lets users search by sender or receiver, date, subject, or word or phrase, and see trends in the results. Users can browse the data by day, conduct in-depth research, and understand email patterns and frequencies.

Users browsing this archive will quickly see this story is complex–and important. The collection constitutes an email communication network between many different senders and receivers in the subset of pages in our database. These emails and related documents played some part in water permitting, quality, regulation, and/or crisis management during the Flint Water Crisis. The dataset has already revealed key decisions and events in how the crisis happened. This dataset has been a crucial resource for journalists, providing most of the major revelations around the water disaster. Journalists working with these materials in 2016 were able to show how state-appointed officials approved using the Flint River for Flint’s drinking water without proper treatment, sending corrosive water down a century-old lead and copper pipe system, and then covering up the resulting harms. Journalists also used this data to correct or challenge some false official narratives, as the crisis broke, about who and what was to blame for the water problems. 

Until now, these emails have been very difficult to search or organize. The full relevance and utility of this data have been undermined by challenges from the dataset’s presentation and size. Because of the way the PDFs were released, the documents were not easy to comb through. Most of the files were not organized chronologically. Many of the documents were “flat” and unsearchable, meaning that readers could not search for words or phrases within the document. These characteristics made it difficult to fully utilize the dataset. The lack of organization in the dataset has limited researchers’ ability to understand the corpus as a whole and its broader insights. 

The political nature of the dataset’s contents has also endangered its preservation. In the spring of 2021, the state of Michigan quietly removed the original page containing these PDFs from its website and moved the documents to the online state archive without redirecting website users. Many news pieces had linked to the original URL, so their links to the dataset are now broken. This public record of emails may be the primary or only large collection of state-level communications relevant to Flint’s water crisis.

Although Michigan law exempts the governor himself from FOIA, Governor Rick Snyder opened up these records, primarily other state workers’ emails and attachments, as a bid at transparency. In early 2016, Snyder issued a statement with the document release, saying in part, “By making the information easily accessible, everyone can review it and take what they need, and then we can all focus and work together on solutions, healing and moving Flint forward.” The initial 274-page release of Snyder’s email (available here), however, somewhat undermined this statement of transparency: its first three pages were entirely blacked out by redactions. (See original article, “Snyder e-mails: Aides figured Flint was others’ problem”, John Wisely, Paul Egan, and Jennifer Dixon, Detroit Free Press, Jan 20, 2016.)

  

Subsequent releases of data were much more extensive and much less redacted. Altogether, this archive consists of 97 PDFs, many thousands of pages long. 

This archive is not a complete record of all relevant email communications between state actors for this period. To be clear, we are unlikely ever to have a complete record. This archive may be the closest we will get. The state of Michigan’s protocol for preserving official records allows state agencies and local units of government to dispose of official documents, including correspondence, budgets, and grant information, according to their record retention schedules. Moreover, the state’s records retention law does not recommend or require a specific format for retention of digital records, and each department does its own preservation.

Many of the actors involved have likely deleted or lost relevant information. The fallout of the criminal cases against state officials also complicates the situation. In 2019, the state attorney general’s office announced the discovery of many relevant documents in this case, but many of these documents are still tied up in legal limbo or have become the subject of legal contestation themselves.

As with all archives, this corpus is selective in its inclusions and exclusions, and should be read in its context as an official record created by the same people involved in (and often implicated in) this story.  Readers will also quickly see how often members of this communication network refer to other modes of communication besides email–“I’ll call you,” or agreements to “talk at lunch” or arrangements for in-person meetings. That means we cannot simply read the record and immediately know “what happened.” The truth does not lie within a single email, and often needs to be triangulated across email threads and with outside information. But historians have long demonstrated that official archives are still invaluable for piecing a story together, even through omissions and evasions, by documenting misrepresentations or contradictions. Moreover, as described below, the email database allows us to observe patterns in email frequency and content that would have been impossible without this processing, letting us evaluate the dataset more critically.  

What’s Included Here

We chose to focus on 35 of the 97 PDF files, selected by the originating department and time frame of the emails in each PDF. Together, these files total 87,877 pages and contain 33,975 emails and many thousands of attachments. (To learn more about the logistics of database construction, please go here [hyperlink to database construction].) Our email database primarily includes material from the following state agencies: 

  • Governor Snyder’s office (Executive Office) includes some emails to or from Governor Snyder, as well as his assistant, chief of staff, legal and political advisors, and communications team.
  • Michigan’s Department of Environmental Quality (MDEQ or DEQ). Governor Whitmer renamed this agency the Department of Environment, Great Lakes, and Energy (EGLE) in 2019. The DEQ’s Office of Drinking Water and Municipal Assistance (ODWMA, also since renamed) is the primary subdivision represented in this archive, and is charged with overseeing localities’ water quality and water source plans, permitting for construction, approving and allocating federal loans for water systems, and enforcing water quality regulations. The MDEQ has “primacy” over Michigan’s water quality enforcement, granted to this agency by the EPA.
  • The Department of Treasury (Treasury), the entity responsible for implementing emergency management and supervising Flint’s emergency managers, as well as general oversight and approval for various funding flows to cities.
  • The Michigan Department of Health and Human Services (MDHHS), the department responsible for public health and healthcare services programs. MDHHS was created in April 2015 when Executive Order 2015-4 merged the Department of Community Health (DCH) and the Department of Human Services (DHS). For this reason, some emails and sender/receiver information in the dataset refer to DCH and MDHHS interchangeably. 

Although most of the emails originate with or were sent to one of these departments, the overall communication network includes many other departments, consultants, officials, and other entities, for a total of 933 unique senders and receivers in this sample. Our email database does not include PDFs from the following departments: 

  • Department of Licensing and Regulatory Affairs (LARA), 5 files
  • Department of Natural Resources (DNR), 3 files
  • Michigan State Police (MSP), 12 files
  • Michigan Department of Agriculture and Rural Development (MDARD), 1 file
  • Department of Corrections (DOC), 2 files
  • Department of Technology, Management and Budget (DTMB), 35 files
  • Kevyn Orr and DWSD, 1 file
  • Talent and Economic Development (TED), 1 file
  • Talent Investment Agency (TIA), 1 file
  • Michigan Civil Service Commission (MCSC), 1 file

We did not include these PDFs in the first searchable database for a variety of reasons. Most of these departments became involved in the water crisis in or after September/October 2015, and were responsible for various elements of crisis response, especially bottled water and filter distribution, as opposed to earlier decisionmaking in the creation and coverup of the water disaster. Many of these PDFs are also repetitive (e.g., daily or weekly newsletters with only minimal relevance to the crisis, sent to hundreds of individuals). We have processed these documents through Optical Character Recognition (OCR) and split them at the “bookmark” level, our proxy for email threads, and will make them available for review and use in the future. 

A Digital Humanities Approach to Building the Email Archive

In our discussion of the mechanics of database construction [hyperlink], we compared this project to other computer science projects involving an email corpus. Most of these archives are categorically different from our project. Because the purpose within computer science is to develop and practice methods for email analysis, scientists can select email datasets in the simplest possible presentation. In contrast, our goal of understanding and better explaining the decisionmaking behind the Flint Water Crisis requires working with a specific email dataset in the presentation we could access, not the “best possible” version. The extra data-cleaning steps we had to undertake to create this database make this project more suited to the digital humanities.

The digital humanities approach highlights the slow processing work of data cleaning, which is often taken for granted or rendered invisible, as is, of course, true of cleaning outside the data context as well. (That approach also governs our decision to make our code available for others to use and to be transparent about our process.) Digital humanities methodology emphasizes that data cleaning and analysis can and should take place simultaneously, rather than being segregated into separate steps. While members of our team have tended to focus either on back-stage technical aspects or on content analysis, we integrate these stages and continually share findings across groups. Ultimately, the slow pace of data cleaning has become a strength of the project, allowing us to build up a wide knowledge base of the underlying issues and of the underlying data. The email database evolved alongside this research, facilitating more thorough searches as research questions and accompanying search terms became more directed. If the whole database had been immediately accessible for search, that would have sped up analysis in some ways but limited our depth of understanding in others, since we would not have known what to search for.

Many of the insights we have developed into the archive’s contents, both structure and substance, have come from fine-grained approaches and the slow work of data extraction and organization. From some of our first forays into the data, it became clear that this communication network, and probably many such networks, had multiple components that would not become clear through tools like topic modeling or even sentiment analysis. 

This process has allowed us to distinguish different potential “patterns” of missingness in this dataset. A quantitative count of email frequency provides one way to look at missingness. For instance, the diagram below shows a sharp drop-off in emails around May-July 2015, a key period for state activity around the crisis. This pattern suggests either that relevant emails were not returned through FOIA or that government officials were using some other means of communication during this period.

 

Patterns in daily email frequency in 2015 from four government entities (Department of Environmental Quality, Department of Health and Human Services, Treasury, and the governor’s Executive Office). The mid-2015 decrease in email frequency is especially pronounced for emails coming from Treasury (shown in purple) and the Executive Office (blue). 
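For readers who want to try a similar frequency count on their own data, the idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used to build the diagram above: the function names, the monthly grouping, and the "half of the median" threshold are all hypothetical choices made for the example.

```python
from collections import Counter
from statistics import median


def monthly_counts(sent_dates):
    """Count emails per (year, month) from an iterable of datetime.date values."""
    return Counter((d.year, d.month) for d in sent_dates)


def flag_low_months(counts, months, threshold=0.5):
    """Return the months whose email count is below threshold * median count.

    `months` is the full list of (year, month) pairs to examine, so that months
    with zero emails are included instead of being silently skipped.
    The 0.5 default is an arbitrary, illustrative cutoff (an assumption).
    """
    values = [counts.get(m, 0) for m in months]
    baseline = median(values)
    return [m for m in months if counts.get(m, 0) < threshold * baseline]
```

Passing in the explicit list of months matters here: a month with no emails at all is exactly the kind of absence the count is meant to surface. A flagged month is only a prompt for closer reading, since a low count alone cannot say why emails are missing.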

Our qualitative analysis allows attention to other aspects of the withholding of information: the patterns of “taking it offline” described above, political “non-decisionmaking” (a dynamic where decisions take place with little or no official discussion), and related forms of evasion. Close reading of sections of this content, and of related documents not in the archive, also helps us identify information (from specific official documents to key decisions) that, to the best of our knowledge, does not appear in this archive. Identifying these patterns of absence has been generative, pointing to the potential significance of the “missing” information. We are still working on ways to represent this complexity and differentiate the different types of absence. This serves as one example of how our multimodal engagement with the data allows us to keep all the above in mind as we work. (Users of the archive are encouraged to keep this understanding in mind as well.)

A Note on Content

It is also important to note that this archive may elicit strong emotions. This archive documents the buildup and fallout of a massive trauma. Readers can see the background conversations of officials and others responsible for Flint’s safety. In many instances, these conversations show a profound lack of consideration for residents’ concerns and a tendency to minimize them. There are also moments when people do discuss the implications of Flint’s water problems, and then decide to stay the course regardless. The record is often troubling and painful to read. It also represents important testimony about how these events happened. Our ongoing work to explain the archive’s contents seeks to put these communications in conversation with community experiences and to demonstrate the broader significance of these discussions.