Tomorrow at 11:00 am I'm participating in the open data panel at the DC Week conference, along with open data specialists from the Sunlight Foundation, USAID, Maryland's state government, and Phase2 Technology. While I'm looking forward to an interesting discussion about the merits and challenges of the free exchange of data for governments, NGOs, and citizens, what's most interesting to me is the answer to this question:
What's needed for organizations to be open by default?
Two years ago Vivek Kundra, then CIO of the United States, called for an Open by Default policy urging the disclosure of all U.S. government data, excepting only data that is sensitive for privacy and security reasons. This thinking is steadily gaining ground and is already policy at the World Bank. More organizations are realizing the benefits of opening up data: to enable communities of problem solvers, to alert the public to important issues, to foster transparency, to streamline communication within organizations, and, possibly most interesting in the current economy, to cut the costs involved in collecting, analyzing, and providing data.
Open data creates so much opportunity because of the low transaction cost in providing it: the internet makes it cheap to host information online. Publishing data in structured formats, free of charge, and under open licenses removes expensive approval processes for data reuse. This enables an incredibly liberal disclosure policy (exactly the one that Kundra called for), allowing data providers to just put out their raw goods and still justify the cost of a long tail even if only a few data sets among many generate any real benefits.
But still, as anybody in the business of publishing data will testify, the remaining costs are significant. Being deeply involved in helping clients set up open data workflows, I have experienced this pain many times myself. The everyday tools we use for managing data (Microsoft Excel, Adobe Acrobat, shared hard drives, email, and the like) are simply not up to the job. They are too beholden to a world in which paper was the medium, copies were expensive, and requests were handled by humans. For data to be open, the common minimum requirement is to use a structured, non-proprietary format, cleanly referenceable by URL. Alas, this is usually not how data is managed internally. Doing business in two different ways introduces a conversion and maintenance process for publishing that might not be prohibitively expensive, but is significant and, to a large extent, simply unnecessary.
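To make the "structured, non-proprietary format" requirement a bit more concrete, here is a minimal sketch in Python of the kind of conversion step I have in mind. It assumes the data is already available as a list of records; the field names and values are purely hypothetical placeholders, and the helper names `to_csv` and `to_json` are my own. It uses only the standard library and is an illustration of the idea, not a description of any particular organization's workflow.

```python
import csv
import io
import json

# Hypothetical records standing in for rows that would otherwise
# live in a spreadsheet on a shared drive.
records = [
    {"region": "North", "year": 2010, "projects": 12},
    {"region": "South", "year": 2010, "projects": 7},
]


def to_csv(rows):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(rows[0].keys()),
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(rows):
    """Serialize the same records to JSON text."""
    return json.dumps(rows, indent=2)


csv_text = to_csv(records)
json_text = to_json(records)
```

The resulting text could then be written to files and served at stable URLs; that publishing step is assumed here and not shown. The point is that if data were kept in a form like this internally, the separate conversion step would largely disappear.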
Not only should the policies for data be open by default, but our tools should open data by default. The software we use should not hamper openness and data sharing; it should facilitate them.
To open data in an efficient way we need to further reduce transaction costs by converging how we share data within an organization and with the outside world. Open by default has profound consequences for the data management processes and tools we set up in organizations. For future investments in information technology, two questions should be asked: "how can internal infrastructure facilitate being open?" and "how can external data services be better leveraged internally?"
I'm looking forward to talking about this tomorrow at the Open Source and Open Data panel at DC Week, and more specifically about open standards and how open source tools can facilitate openness. Hope to see you there.