Post: 22 October 2015
Yesterday Matt Hancock MP, Minister for the Cabinet Office, told Parliament that over the past five years his department has “opened up 20,000 Government data sets to the public”.
Let’s factcheck that a little …
Cabinet Office’s usual point of reference for the number of available public datasets is the data catalogue on Data.gov.uk. For example:
* 40,000 in 2012? Yeah, I dunno. Maude was fun, wasn’t he?
As of yesterday Data.gov.uk listed 20,769 “published” datasets, of which 14,992 were “open”.
Five years ago (on 21 October 2010) there were 4,247 published datasets listed on Data.gov.uk, all of which were open.
So over the past five years Data.gov.uk has catalogued a net increase of 16,522 published datasets, or a net increase of 10,745 open datasets.
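If you'd like to check that arithmetic, here is a minimal sketch in Python. The variable names are purely illustrative, and the counts are the ones quoted in this post rather than values read from the catalogue itself.

```python
# Dataset counts quoted in this post (from the Data.gov.uk catalogue).
published_2015 = 20_769  # "published" datasets listed as of 21 October 2015
open_2015 = 14_992       # of which "open"
published_2010 = 4_247   # published datasets listed on 21 October 2010
open_2010 = 4_247        # all published datasets were open in 2010

# Net change over the five-year period.
net_published = published_2015 - published_2010
net_open = open_2015 - open_2010

print(f"Net increase in published datasets: {net_published:,}")
print(f"Net increase in open datasets: {net_open:,}")
```

Neither net figure reaches the 20,000 claimed, which is why the rest of this post tries a more generous reading.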
But let’s give Mr Hancock the benefit of the doubt. We will include all the published datasets catalogued on Data.gov.uk since its public launch (under Labour) in early 2010.
And perhaps by “opened up” Mr Hancock doesn’t actually mean released to the public as open data. We’ll make the broadest possible interpretation and say that “opened up” simply means Cabinet Office has highlighted the existence of 20,000 or so public datasets that are available under some licence or other.
Over 6,000 of the published datasets on Data.gov.uk belong to local authorities. Do those count as big G “Government” datasets? Some were published to comply with DCLG’s Transparency Code, but most council data programmes are the product of local initiatives rather than central government policy.
Nearly 4,000 of the published datasets on Data.gov.uk are UKHO bathymetric surveys. Valuable data, but isn’t this padding the numbers? Depending on your definition of “dataset”, we could just as easily count these as a geographically segmented time series of a single dataset.
More crucially, there’s no record on Data.gov.uk of when datasets were first published. The majority are listed after publication, sometimes years after. For example, Data.gov.uk lists 1,000 published datasets that belong to the Office for National Statistics. Many of those will have been available via ONS’s website long before they were listed on Data.gov.uk. Has the Cabinet Office “opened up” the 2001 Census, or the 2004 IMD? That would be stretching the point too far.
Here’s what the Public Administration Select Committee (PASC) had to say on this subject in a 2014 report on statistics and open data:
It is often pointed out that more than 13,000 datasets can now be found on data.gov.uk, but it is unclear how many of these represent simple republishing of data already published on other government sites. Some data sets are small and others large. And it is possible for departments to get more data out by publishing it in smaller bundles or updating it more frequently, in such a way that there is little or no extra public benefit. In these circumstances, measuring progress on this important agenda is difficult if not impossible.
PASC invited the Government to publish a clear list of open data, indicating when each data series became open in each case. Cabinet Office brushed off that suggestion.
Absent that additional information, Mr Hancock’s statement to Parliament yesterday is highly dubious.
You can verify most of the figures in this post yourself: https://data.gov.uk/data/dumps/.
Image credit: Hack Day photo of Matt Hancock by Cabinet Office (CC BY-NC 2.0)