Post: 12 January 2017
Earlier this week Computer Weekly published an article by Jonathan Stoneman in which he questions the extent to which UK open data is being used.
Stoneman is generally supportive of the open data agenda. However, in his article he presents figures indicating that 31% of datasets listed on Data.gov.uk (DGU), the UK Government’s flagship open data website, have never been downloaded.
He also seems to question the success of last year’s Open Defra initiative, setting Defra’s claim to have opened up 13,000 datasets against DGU site analytics that appear to show only 669 Defra datasets have actually been downloaded.
If we accept for the sake of argument that DGU statistics are a relevant metric for re-use of UK open data, the key figures in Stoneman’s article paint a pretty dismal picture.
Fortunately, Stoneman seems to have misunderstood the DGU statistics. Most of the figures from DGU used in the article have been misread or presented without enough context.
I’ve downloaded DGU’s usage statistics myself, and analysed them alongside bulk data that provides more detail about individual dataset records in the DGU catalogue. What follows is commentary on specific statements made in Stoneman’s article.
Number of publishers
Some 1,401 government departments, including local government and agencies, are listed as ‘publishers’.
This is technically correct. However, DGU’s publisher list is a taxonomy. Only 1,100 or so of those publishers actually list any data on the site. The top five publishers account for more than half the open datasets in the catalogue. The median number of datasets per publisher is two, which rather undermines the argument that DGU represents a surfeit of open data.
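For anyone who wants to run this kind of check against the catalogue themselves, here is a minimal Python sketch of how the publisher figures might be calculated. It assumes the bulk catalogue data has already been reduced to (publisher, dataset id) pairs; the function name `publisher_summary` and the input shape are illustrative assumptions, not part of any DGU export format.

```python
from collections import Counter
from statistics import median


def publisher_summary(records):
    """Summarise how datasets are spread across publishers.

    `records` is an iterable of (publisher, dataset_id) pairs, one per
    dataset listed in the catalogue. This input shape is an assumption
    made for illustration.
    """
    # Count datasets per publisher; publishers with no records never appear.
    counts = Counter(publisher for publisher, _ in records)
    if not counts:
        return {
            "publishers_with_data": 0,
            "median_per_publisher": None,
            "top_five_share": 0.0,
        }

    total = sum(counts.values())
    # Datasets held by the five publishers with the most records.
    top_five = sum(n for _, n in counts.most_common(5))
    return {
        "publishers_with_data": len(counts),
        "median_per_publisher": median(counts.values()),
        "top_five_share": top_five / total,
    }
```

Publishers that appear in the taxonomy but list no data simply never show up in the counts, which is the distinction between the 1,401 listed publishers and the 1,100 or so that actually publish something.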
Total downloads
Two million datasets were downloaded in 2016.
This is incorrect. DGU analytics show more than two million “downloads” since early December 2012. The equivalent figure for 2016 was slightly over 500,000.
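The difference between a cumulative total and a single year’s total is easy to check once the usage statistics are grouped by year. The sketch below assumes the statistics can be read as (period, count) rows with the period written as `YYYY-MM`; that layout and the function name `downloads_by_year` are assumptions for illustration only.

```python
from collections import defaultdict


def downloads_by_year(rows):
    """Total "download" counts per calendar year.

    `rows` is an iterable of (period, count) pairs, where `period` is a
    string beginning with a four-digit year, e.g. "2016-07". This layout
    is an assumption, not the actual DGU export format.
    """
    totals = defaultdict(int)
    for period, count in rows:
        year = period[:4]  # keep only the year part of the period
        totals[year] += count
    return dict(totals)
```

Summing every year in the result gives the all-time figure, while looking up `"2016"` alone gives the figure that should have been quoted for 2016.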
It’s worth noting that a “download” in DGU usage statistics just means a user has clicked a resource link on a catalogue page. That link may or may not resolve to a direct download of data. Many resource links are to landing pages on publishers’ own sites.
There is no way of establishing what proportion of users actually get their open data via DGU. A small number of significant open datasets are only discoverable on DGU. However, most government departments with serious data sharing programmes maintain their own repositories. Many datasets are also available on GOV.UK.
To put the DGU statistics in some kind of context, Ordnance Survey claims it had a million downloads of open data from its own site between 2010 and 2015.
Datasets with nil downloads
11,481 – 31% of the whole collection – were not [downloaded], not even once.
This is highly misleading. The DGU statistics do list more than 11,000 datasets with nil downloads. However, that total includes thousands of records for “unpublished” datasets, datasets that are not available under an open licence, and catalogue records that are defunct.
Users can hardly download open data that isn’t there.
DGU statistics actually indicate that about 3,600 open datasets had nil downloads from 2012 to the present, and that about 4,600 had nil downloads in 2016.
Most of those datasets have only been listed on Data.gov.uk in the past year, and a significant number are niche land classification or marine survey datasets that cover small geographic areas.
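The adjustment described above amounts to filtering the catalogue before counting. Here is a rough Python sketch of that step, assuming each catalogue record is a dictionary with the hypothetical keys `id`, `unpublished`, `open_licence` and `defunct`, and that download counts are held in a separate mapping keyed by dataset id. None of these field names come from DGU’s own data; they stand in for whatever the bulk data actually provides.

```python
def count_nil_downloads(records, downloads):
    """Count datasets with no recorded downloads, before and after filtering.

    `records` is a list of dicts with hypothetical keys:
      "id", "unpublished", "open_licence", "defunct".
    `downloads` maps dataset id to its download count; ids missing from
    the mapping are treated as having zero downloads.

    Returns (nil_all, nil_open): every record with nil downloads, and
    only those that are published, openly licensed and not defunct.
    """
    nil_all = 0
    nil_open = 0
    for rec in records:
        if downloads.get(rec["id"], 0) > 0:
            continue  # downloaded at least once
        nil_all += 1
        # Only count records a user could actually have downloaded as open data.
        if rec["open_licence"] and not rec["unpublished"] and not rec["defunct"]:
            nil_open += 1
    return nil_all, nil_open
```

The first number corresponds to the headline count of records with nil downloads; the second is the count that is meaningful for judging re-use of open data.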
Defra datasets
With respect to the Open Defra initiative, Stoneman notes the department’s target to open 8,000 datasets by June 2016, and Defra permanent secretary Clare Moriarty’s statement that the department had opened 13,000 datasets, “some of them very large”.
Stoneman then says:
Site analytics suggest that just 669 Defra datasets have ever been downloaded, with the most frequently downloaded being those covering staff pay and organograms, and financial transactions over £25,000. Together, these two datasets account for one in six of all downloads of Defra’s published data.
This is a false comparison. The Open Defra initiative covered release of datasets across the Defra network, a group of member organisations that includes the Environment Agency (EA), Natural England, the Rural Payments Agency, APHA and Fera.
The department itself does not produce much significant open data. However, many of the Defra network members are data-rich. DGU statistics show more than 13,000 “downloads” for EA’s LiDAR datasets, and thousands more for flood data and EA’s popular ESOS dataset.
Image credit: open data (scrabble) by Justin Grimes (CC BY-SA 2.0)