By Eric Vandenbroeck and co-workers
How To Deal With Classified Documents
In August 2016, the United States suffered one of history's most cataclysmic leaks of classified information. An anonymous entity calling itself “the Shadow Brokers” exposed an arsenal of cyberweapons that the National Security Agency had developed—in great secrecy. The intelligence community sprang into damage-control mode. Because the NSA’s hackers rely on plausible deniability, disclosing such clandestine tools and their connection to the U.S. government meant that the agency would be forced to devise new ones. But there was also a more pressing danger: with the source code for these powerful weapons now published on the Internet, any unscrupulous actor could deploy them. It was the digital equivalent of “loose nukes.”
Practically overnight, cybercriminals repurposed the NSA’s proprietary exploits to launch audacious ransomware attacks, ultimately shutting down millions of computers worldwide and paralyzing thousands of private businesses, from an auto plant in France to a chocolate factory in Australia. Foreign governments took advantage of the tools, as well. North Korea used the NSA’s malicious code to attack the British healthcare system, forcing hospitals to turn away patients. Iran used it to target airlines in the Middle East. Russia used it against Ukraine.
Even as these cyber-assaults proliferated, officials in Washington had no idea who was responsible for the breach. They did not know whether it was a foreign intelligence service that had compromised the NSA’s vaunted digital defenses or some disillusioned agency coder gone rogue. As if to compound the government’s humiliation and alarm, the Shadow Brokers taunted the agency in a series of online posts, mocking the investigation in playfully broken English: “Is NSA chasing shadows?”
In 2017, The New York Times reported that after 15 months of investigation, authorities were no closer to an answer. If they have since managed to identify the perpetrator, then that, too, remains classified. But the whole debacle highlights the subtle Achilles’ heel of government classification. The NSA is famously secretive; as the old joke has it, its initials stand for “no such agency.” Yet here was a massive leak in which some of the nation’s most closely guarded secrets were spilled out for the world to see. Nor was this the only recent jumbo leak of highly classified material: there was the 2017 leak of CIA hacking tools by an agency software engineer, Joshua Schulte; the 2013 leak of surveillance programs by an NSA contractor, Edward Snowden; and the 2010 leak of cables and videos by an army private, Chelsea Manning.
This, as Matthew Connelly lays bare in his new book, The Declassification Engine, is the paradox of contemporary government secrecy. For decades, blue-ribbon panels and incoming presidents have observed with surprising unanimity that overclassification has grown out of control—and vowed to fix it. Yet every year, more new documents are marked “top secret,” and more realms of official activity are placed beyond the scrutiny of citizens, journalists, and even Congress. In 2017, the federal government spent over $18 billion maintaining this classification system, almost double what it spent five years earlier. But precisely because so much government work now transpires behind a veil of secrecy, it is necessary to grant clearances to an ever-larger cadre of federal employees. Some 1.3 million Americans now hold top-secret clearances, roughly double the population of the District of Columbia.
The math becomes simple. Combine the vast dimensions of the classified world with the enormous numbers of people who need access to it to do their jobs and factor in the increasing ease of copying and transferring enormous volumes of digital information. It seems almost certain that wholesale leaks of classified data will continue. Decades of bad habits practiced by government agencies hooked on classification undermine transparency and democratic accountability, and this impulse to classify indiscriminately is often justified by invoking national security. But as Connelly points out, when everything is secret, nothing is secret: the “very size of this dark state . . . has become its own security risk.”
If the dangers of excessive government secrecy are so widely acknowledged, why has nothing been done about it? Connelly suggests that the authority to classify has become a cherished prerogative of government power—a tool used by presidents, generals, and various chieftains of lesser fiefs to enshroud their decisions in mystery and ward off scrutiny or accountability. Reform efforts founder in the face of bureaucratic recalcitrance. But another challenge is the sheer volume of restricted documents: because the government classifies more quickly than it declassifies, the amount keeps growing yearly. How do you begin to declassify all this information, and if you cannot, what becomes of the historical record? In his book, Connelly proposes what might be an inspired solution—but only if the government takes him up on it.
Open And Shut
Connelly is a historian at Columbia University, where he runs the History Lab, a group that focuses on applying data science tools to the problem of overclassification. When one considers the full sweep of American history, he argues, widespread classification is not just a betrayal of the United States’ founding principles but also a relatively recent anomaly. The first century and a half of the republic were characterized by “radical transparency,” Connelly contends: when the nation was at war, it engaged in espionage and secrecy, but during peacetime, these practices receded. The United States had no permanent intelligence agency until the Office of Naval Intelligence was created in 1882. As late as 1912, Woodrow Wilson could remark, while campaigning for president, “There ought to be no place where anything can be done that everybody does not know about.”
Connelly demonstrates the degree to which this ideal of accountability was explicitly linked to a tradition of record-keeping and publicly accessible archives. In 1853, long before President Donald Trump took to flushing official papers down a White House toilet, it was declared a felony to destroy any federal records. A century and a half before WikiLeaks published purloined State Department cables, the department began publishing such records, voluntarily disclosing volumes of letters recently received through embassies abroad. In one poignant anecdote, Connelly recounts that when construction began on the Pentagon in 1941, President Franklin Roosevelt anticipated that the postwar military establishment would be too small to fill it—and would vacate the building when the fighting stopped so that it could be repurposed as an annex to the National Archives.
It did not pan out that way. Indeed, the rise of the permanent defense bureaucracy and the military-industrial complex in the immediate aftermath of World War II gave birth to the juggernaut of official classification. Rather than roll back the culture and institutions of secrecy that had prevailed during wartime, the Truman administration institutionalized them with the advent of the Cold War. The creation of the CIA and other intelligence agencies and the secrecy surrounding the United States’ growing nuclear arsenal accelerated the professionalization of the classified state. “Our present security system is a phenomenon of only the past decade,” Senator Hubert Humphrey remarked in 1955. “We have enacted espionage laws and tightened existing laws; we have required investigation and clearance of millions of our citizens; we have classified information and locked it in safes. . . . We have not paused in our necessary, though frantic, quest for security to ask ourselves: What are we trying to protect, and against what?”
In theory, the passage of time should enable Americans to look back at the ostensible rationale offered for classifying various government activities and determine, in retrospect, whether all that secrecy was justified. Connelly and his fellow scholars at Columbia are engaged in this sort of enterprise. But such a project is frustrated in practice by the slow pace of declassification. Reams of important historical documents remain classified more than half a century after the events they describe. Even as the government spends more money classifying more documents each year, funding for declassification efforts has steadily eroded. The federal government now budgets only about $100 million annually for declassification. As Connelly dryly notes, “The Pentagon spends four times that just on military bands.”
But Connelly and his colleagues have developed an innovative solution, studying the records the government has unsealed to see what they reveal about the dynamics of official secrecy. Over the last decade, his researchers have assembled the world’s largest database of declassified documents. Drawing on big data and machine learning tools, they have developed a series of techniques to analyze this archive for patterns and anomalies. When Connelly suggests that in some corners of the federal bureaucracy, the devotion to secrecy has evolved from a culture into “a cult,” it might seem hyperbolic. But consider that when he undertook this academic project—scanning the redactions in declassified documents in search of lessons about the pathologies of overclassification—the project was perceived to be sufficiently threatening that former government lawyers advised him and his team that they could be accused of violating the Espionage Act.
It should be no surprise that the gatekeepers of the classified world might feel defensive about such an inquiry. Even the staunchest critics of overclassification generally acknowledge that the government must maintain some secrets. Reasonable people can disagree about whether the NSA should be developing an arsenal of cyberweapons. Still, most observers would concede that such an arsenal, if it exists, should not be freely accessible to the public. The same goes for sensitive details associated with nuclear weapons or the names of people spying for the United States. (In the case of covert assets’ identities, there are compelling grounds for maintaining such secrets even decades after the conduct in question since prospective spies abroad will be less likely to betray their countries if they believe that the details of their betrayals may be automatically declassified a mere 20 years later.)
If official classification had been carefully confined to these sorts of tailored categories, it would never have blossomed into such a rampant problem. But the basis for most classifications is less coherent. At some point early in that postwar expansion of government secrecy, the authority to mark something classified gave rise to a bureaucratic reflex. For any government officer making a quick decision during a busy workday, the penalties for underclassifying are quite salient, whereas penalties for overclassifying do not exist. One way of accounting for how the nation got to this juncture is to look at the incentive structure for that officer deciding whether to classify a single document and extrapolate outward to all the other functionaries invested with the power to deem something “secret” in all the other agencies every day of every year over the last eight decades. The problem has assumed proportions that can be difficult to comprehend. In a single year, 2012, U.S. officials classified information more than 95 million times, or roughly three times per second.
But that version of the story—in which genuine national security imperatives merged with bureaucratic path dependence and risk aversion and snowballed—is the benign interpretation. For Connelly, who has scrutinized actual classification decisions made over those eight decades, the real explanation points to something more pernicious. Classification is an exertion of power, he argues. As such, it has often been motivated not by the dictates of national security but by considerations of raw political or bureaucratic leverage.
“It turns out that, from the very beginning, what’s secret has been whatever serves the interests of the president and all those around him who are invested in executive power,” he writes. In any bureaucracy, the ability to render something secret becomes an irresistible trump card—a way to evade oversight, tout parochial priorities, and obscure shortcomings. “After conjuring the power of secrecy and setting it loose, presidents found that it had a power all its own,” Connelly continues. “Thousands more people, many career civil servants, began creating their secrets and jealously protecting them, making it harder to identify and protect what mattered to the president personally. At the same time, they could leak whatever they liked, undermining the president’s ability to manage the news cycle.” Connelly is particularly scathing about the role of military leaders, such as Douglas MacArthur and Curtis LeMay, who “employed leaks and spin no less than secrecy to protect their perquisites and push their agendas,” lobbying to expand military spending and outright defying civilian authority. In 1978, he notes, the Joint Chiefs of Staff stopped preserving notes from their meetings, “as if America’s most senior military leadership were running a numbers racket, committing nothing to paper.”
In a system where so much information ends up classified, selective leaking is a safety valve for when certain matters of national importance need to get out. The legal scholar David Pozen has argued that the “leakiness” of the executive branch is not a sign of institutional failure but, on the contrary, a strategic adaptation to prevailing realities, one that enables an administration to send “messages about its activities to various domestic and international audiences without incurring the full diplomatic, legal, or political risks that official acknowledgment may entail.” As William Daley, President Barack Obama’s chief of staff, once admitted, “I’m all for leaking when it’s organized.”
Every White House has regularly leaked sensitive and often classified information to the press. Whereas penalties for rank-and-file employees who make unauthorized disclosures are often severe, consequences for deliberate leaks by highly placed officials are practically unheard of. Consider the contrast between Reality Winner, the NSA contractor who leaked an intelligence report about Russia’s interference in the 2016 election, and David Petraeus, the CIA director and four-star general who shared several notebooks full of highly classified information with his biographer (who was also his mistress) and then lied to federal investigators about it. Winner was sentenced to five years and three months in prison; Petraeus received two years’ probation and a fine. Connelly invokes a quip by Sir Humphrey Appleby of the BBC sitcom Yes Minister: “The Official Secrets Act is not to protect secrets. It is to protect officials.”
Locked In The Archives
What is maddening about the lack of progress on overclassification is that anybody who has given the issue serious consideration would likely agree with the broad contours of Connelly’s arguments. Nearly two decades have elapsed since the 9/11 Commission concluded that too much classification could jeopardize national security. “Secrecy, while necessary, can also harm oversight,” the report argued, adding that the “best oversight mechanism” in a democracy is “public disclosure.” But it is one thing to acknowledge the problem and quite another to do something meaningful about it. Obama came into office vowing to create “the most open and transparent administration in history.” Yet, in the end, as Connelly points out, “he presided over exponential growth in classified information.” (He also initiated more criminal prosecutions of leakers than all his predecessors combined.) When outside groups have tried to pressure the federal government into greater transparency, they have aroused staunch resistance and occasionally retaliation. Connelly relates one galling story: in the 1980s, after the National Security Archive, a nonprofit group affiliated with George Washington University, filed Freedom of Information Act requests and initiated lawsuits to uncover abuses of government power by the Reagan administration and the FBI, the FBI responded by placing the National Security Archive itself under surveillance.
Meanwhile, the daunting tonnage of classified documents has compounded every year. Even those who earnestly want to do something about the problem fear that it may simply have become unmanageable. By one estimate, it will take 250 years at the government’s current processing rate to respond to the backlog of Freedom of Information Act requests at the George W. Bush Library alone. No effective system exists to automate declassification, and the relevant federal agencies lack the personnel and resources to review and redact billions of classified documents manually. “If instead these records were withheld indefinitely or destroyed, it would be impossible to reconstruct what officials did under the cloak of secrecy,” Connelly points out. Thus, a problem that on its face might seem like a dry technocratic riddle—with billions of new classified documents generated every year and no scalable method for safe and reliable declassification, what happens to the historical record?—assumes an existential urgency. If the U.S. government is “not even accountable in the court of history,” Connelly writes, “it truly is accountable to no one.”
As it happens, Connelly has a solution. Because the aggregate volume of still classified information is so overwhelming, the only way to tackle it will be to employ the wizardry of big data. By scanning hundreds of thousands of declassified documents (some still redacted, others not), Connelly and his colleagues could search for specific words, themes, and connections to identify areas of particular sensitivity. Comparing redacted and unredacted versions of the same declassified documents from a given period, they compiled a jokey “America’s Most Redacted” list of names most frequently blacked out (including Congolese Prime Minister Patrice Lumumba and Iranian Prime Minister Mohammad Mosaddeq, both targets of CIA operations). They devised a series of technological methods to rapidly sort through extensive archives and select documents that met certain criteria. If such techniques were harnessed for the declassification effort, they realized, it might be possible “to train algorithms to look for sensitive records requiring the closest scrutiny and accelerate the release of everything else.” This is the “declassification engine” of the book’s title: an ingenious technical solution to an impossible bureaucratic problem.
For the moment, the machine is still in its infancy, with a beta version concocted by the History Lab at Columbia as proof of concept. To date, it has only worked with material that has already been declassified. But Connelly and his team wanted to improve its capability and accuracy by pilot testing it on historical classified information, and for that, they needed government buy-in. This should not have been difficult to obtain. After all, the federal government has paid much lip service to the idea that overclassification has reached crisis proportions. Here was a way of solving it that would be cost-effective, especially compared to engaging human reviewers to manually process old classified material before releasing it to the public.
So Connelly and his band of data scientists and mathematicians went to Washington to plead their case. They met with the State Department, the National Declassification Center, the CIA, the Public Interest Declassification Board, and the Office of the Director of National Intelligence. There was certainly interest. At the State Department, which produces more than two billion emails yearly, one official informed them that the need for the technology they were offering was “frighteningly clear.” But the department had no money to authorize a pilot program or fund their research. Someone suggested Columbia students could be enlisted to work on the initiative and paid in course credit. “I was struck by the notion that declassification could be treated as a kind of school project,” Connelly writes.
His group ended up in a meeting at the Intelligence Advanced Research Projects Activity, which has been tasked with working with the National Archives to explore technological solutions to the problem of overclassification. After listening to their pitch, an IARPA official told the visitors that she had been trying for years to build a similar engine, not to declassify but to classify. She found their ideas intriguing but explained that making technology to help review and release classified documents would represent an “insufficient return on investment.”
It is a dispiriting coda to Connelly’s fascinating and urgent book. One hopes that he and his colleagues will ultimately find other, more hospitable points of entry in the federal government that would allow them to test and improve their declassification algorithms with actual classified raw data. If you believe in the founding principles of the American form of government, then the stakes could scarcely be higher. Connelly recalls thinking after being shown the door at IARPA, “We cannot assign a dollar value to democratic accountability.”