
Data and the View From Nowhere

Photo Credit: Soham Banerjee

The View from Nowhere describes a kind of false objectivity or feigned neutrality in reporting. Journalists have sometimes used the View from Nowhere, a term popularized by NYU journalism professor Jay Rosen, as a hedge against accusations of bias.

Waves of technological and social change have worn down the idea that news can be reported with pure objectivity, without a perspective. It’s not just postmodern thought and political polarization but also mass amateurization. With the ability to shoot photos and record videos anywhere, anyone with a smartphone can commit acts of journalism from their own point of view.

More and more professional journalists seem to be taking up Rosen’s call for transparency as one way to assert the authority of their work: publish your sources, explain your methods, acknowledge your mistakes.

Meanwhile, Excel is catching up with the reporter’s notebook as a defining tool of the trade. Journalists have always relied on data to some degree, but never before has it been so readily available and so easy to process and analyze (thanks especially to software like R, much extended by Hadley Wickham’s packages, and Mike Bostock’s d3). Journalism schools now offer data journalism degrees, and data-driven stories have become de rigueur at news organizations small and large.

Data-driven reporting can give the impression that journalists aren’t simply standing in the middle of the road between two opposing viewpoints. The numbers can’t lie…or can they? Overconfidence and blind trust in the power of data can undermine the trust readers put in our work.

There are two likely points of failure to watch out for: how the data is initially gathered, and errors made on the way to visualization. The former is an error in judgment or misplaced trust, and the latter is an error in translation.

Even when journalists make an effort to vet their sources of data, they can be misled, just as with any other source. Nathan Yau at FlowingData put three examples of this fallibility on display in his recent post “When data is not quite what it seems.” He shows that where the underlying data is incomplete or built on faulty assumptions, the conclusions and graphics drawn from it will be flawed too.

One of the examples Yau cites comes from FiveThirtyEight, who have been early champions of data-driven reporting. Reporters there had made assumptions about a data set on broadband coverage that purported to be more comprehensive and authoritative than it turned out to be. “Just because a data set comes from reputable institutions doesn’t necessarily mean it’s reliable,” Clare Malone and Mai Nguyen wrote in their mea culpa.

If you work with data regularly, you know that this happens quite a lot: primary data sets can be full of holes or structured inconveniently, and to make them usable you often have to transpose them, patch the holes with whatever other data you can find (or buy), or extrapolate in unsatisfying ways.
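Before patching anything, it helps to know how big the holes are. Here is a minimal sketch in Python of that kind of audit. The function name, the column names, the list of "missing" markers, and the three sample rows are all invented for illustration and do not come from any real data set.

```python
from collections import Counter


def count_missing(rows, missing_markers=("", "NA", "N/A", None)):
    """Count missing values per column in a list of dict rows.

    What counts as "missing" is itself a judgment call; the default
    markers here are an assumption, not a standard.
    """
    missing = Counter()
    for row in rows:
        for column, value in row.items():
            if isinstance(value, str):
                value = value.strip()  # treat whitespace-only cells as empty
            if value in missing_markers:
                missing[column] += 1
    return dict(missing)


# Invented sample rows, for illustration only.
rows = [
    {"county": "A", "broadband_pct": "71.2"},
    {"county": "B", "broadband_pct": ""},
    {"county": "C", "broadband_pct": "NA"},
]

print(count_missing(rows))
```

Even this tiny check involves a decision about which values count as holes, and that decision deserves to be written down alongside the story's methods.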

Whether it’s a handwritten list of shipping container numbers, the heart-rate monitor on a patient’s finger, or a swab of DNA from a crime scene, data is a result of human beings making a series of judgment calls. The first is someone deciding to collect the information at all — or to prevent its collection, as in the case of gun violence statistics.

Then people decide how that signal gets captured, quantified, and interpreted. With huge data sets or streams, programmers write algorithms that turn vague requests from product managers into code, filtering out noise and processing the data with the intent of making the signal selectively clearer and brighter. (Facebook, for example, recently admitted that its metric for how long people watch video there overestimated average viewing times, misleading publishers and advertisers.) The hard part is checking to see if the final product accurately reflects the original data, after all the transformations it has gone through.
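To see how a reasonable-sounding filter can quietly inflate a metric, consider a small worked example in Python. It is only a sketch under stated assumptions: the function, the three-second cutoff, and the watch times are all made up, and it is not a description of how any real platform computes its numbers.

```python
def average_watch_time(durations, min_seconds=0.0):
    """Mean watch time, counting only views at least min_seconds long."""
    counted = [d for d in durations if d >= min_seconds]
    if not counted:
        raise ValueError("no views meet the threshold")
    return sum(counted) / len(counted)


# Invented watch times in seconds: four quick scroll-pasts, two real views.
views = [1, 1, 2, 2, 30, 60]

all_views = average_watch_time(views)                # every view counts
filtered = average_watch_time(views, min_seconds=3)  # hypothetical cutoff

print(all_views)  # 16.0
print(filtered)   # 45.0
```

Dropping the shortest views as "noise" nearly triples the average here. Neither number is wrong on its own terms, but each answers a different question, which is why the filter belongs in the explanation of the method.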

All data, in other words, comes from somewhere. It does not descend magically from the Cloud, pristine and untouched by human hands, into our computers. Data is often dirty, questionable, and stitched together from different places and world views. When presented, however, numbers and data visualizations can have the effect of neutralizing any possible representation of reality other than the one embedded in the data.

The onus is on journalists (and editors, developers, and designers) to understand this fundamental nature of the material they rely on — that data can have an encoded point of view — so that it does not enable a new View from Nowhere.