UK floods and the case for opening up Environment Agency flood data

Post: 18 February 2014

Back in early 2011 I was on a Defra-led working group that looked at the potential for better sharing of flood data, in the context of wider negotiations between the UK government and insurance industry.

One of the proposals was for open data release of NaFRA, the Environment Agency’s main national flood risk dataset. I’ve pulled out a page from Defra’s final report that captures the pros and cons of the debate.

Three years later the open data agenda has a higher profile and the tenor of that early debate now seems over-familiar. Yet on flood data the government’s position is broadly unchanged.

The Environment Agency’s core datasets are shared inside the public sector and available to business on commercial terms. However there has been no move to maximise their reuse by releasing them to the public as open data.

Although the EA do make some allowance for non-commercial use of their datasets free of charge, the licensing terms are restrictive and in practice preclude most types of use on the open web even for non-commercial purposes.

The first barrier to open release of flood data: money

The common Catch-22 of the open data movement is that the datasets we would most like to unlock almost always have an asset value as a source of revenue for the data producer.

(Conversely, data producers are most willing to release niche datasets they cannot monetise themselves. This is why Data.gov.uk is full to the gunwales with administrative effluvia nobody knows what to do with.)

The Environment Agency’s data strategy is probably the most commercially minded in the UK public sector, outside the Public Data Group trading funds. Income from commercial information licensing is about £4m a year. The majority of that is generated from reuse of the EA’s core flood datasets by the insurance sector and in the environmental search market.

How should the EA be compensated if they give up that income? There are central funds available to support transitional costs of open data release. However according to the Cabinet Office those funds cannot be used to make up for loss of licensing revenue.

The pure economic argument goes like this. The EA are not entitled to compensation at all; businesses are not generally compensated when a change in government policy puts them at a commercial disadvantage. The datasets themselves have already been funded by the taxpayer. The EA have only been able to generate commercial revenue from the data by leveraging their monopoly control and creating an artificial scarcity in the market. It’s invidious for a public authority to shape the market to that extent, and government should have put a stop to this years ago.

But in the real world the Environment Agency provide a vital public service on a shoestring budget. It is difficult to make a high-minded formal argument for open data release if that compromises operations on the ground. Open data is important, but Dave Throup has a job to do.

Fortunately I think there’s a practical solution. £4m sounds like a lot of money. However it’s only about 1% of the EA’s total annual income from licensing activities such as environmental permitting and inland fishing.

As chief executive Paul Leinster noted in a recent interview, the EA have not raised charge levels for regulated industry for some years now. Provided ministers can agree to bend the normal rules against cross-subsidy of different EA responsibilities, the 1% shortfall could be recouped by applying a small average percentage increase to non-data licensing charges.
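As a rough check on that arithmetic, here is a minimal sketch in Python. It uses only the two figures quoted above (about £4m of data licensing income, which is about 1% of total licensing income) and assumes the shortfall would be spread evenly across all non-data charges. The implied total of roughly £400m is derived from those two figures rather than taken from EA accounts, and the function name is purely illustrative.

```python
# Figures from the post: data licensing income is about £4m a year,
# described as about 1% of the EA's total income from licensing activities.
DATA_LICENSING_INCOME_M = 4.0   # £ million, approximate
DATA_SHARE_OF_LICENSING = 0.01  # "about 1%", approximate


def required_increase(data_income_m: float, data_share: float) -> float:
    """Fractional rise in non-data charges needed to replace data income.

    Assumes (for illustration only) that the shortfall is spread evenly
    across every non-data licensing charge.
    """
    total_income_m = data_income_m / data_share          # implied total licensing income
    non_data_income_m = total_income_m - data_income_m   # charges that would absorb the shortfall
    return data_income_m / non_data_income_m


increase = required_increase(DATA_LICENSING_INCOME_M, DATA_SHARE_OF_LICENSING)
print(f"Average increase needed on non-data charges: {increase:.2%}")
```

On those assumptions the average increase comes out at just over 1%, which is why the shortfall looks manageable if ministers allow the cross-subsidy.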

There is also potential for the EA to develop commercial services, including apps, on top of the data it produces and outside the scope of its public task. I know that some others in the open data movement will take exception to a public authority competing in that manner, but in my view it’s fair play as long as the playing field is level on access to the data itself.

Disruption of information markets

Open release of core flood data will affect the Environment Agency’s existing customer base for the data. Licensees will benefit from a reduction in input costs and the removal of administrative burdens that go along with commercial data contracts.

However some licensees, particularly value-added resellers, will find themselves in a more competitive environment. The elimination of fees and the more flexible nature of open data licences means fewer barriers to entry for start-ups and SMEs.

There are two or three firms that produce commercial flood models aimed at the UK insurance market, and they may take exception to open release of EA flood data. However it is unlikely such a move would actually be anti-competitive. The commercial flood models are specialised products and there is already a substantial price differential between them and the EA’s datasets.

The virtue of having existing markets for EA flood data is that it is easier to identify use cases for the data and anticipate the potential benefits of open data release.

For example in the environmental search market we can anticipate that lower input costs and increased competition will drive down conveyancing costs for new homeowners; a small saving for individuals, but significant across the housing market.

In the insurance market we are likely to see the development of more specialised flood-related data products and services. However the main benefits are likely to come from a step-change in the ability of individual underwriters, claims handlers, high-street brokers and customers to access and share flood information without going through a laborious procurement process.

More speculatively I would anticipate much wider availability of flood-related information on the web now that technology has caught up with the demands of presenting large spatial datasets without loss of complexity. For example it should be possible to build hyperlocal websites around catchment-sized sections of flood mapping, which could be annotated collaboratively as the focus for community interest in flood risk management.

Political will

Under the current government the most successful open data initiatives have been delivered in support of other policy objectives. This means that, despite the Cabinet Office’s attempts to roll out transparency principles across government, some departments have embraced the open data agenda more readily than others.

The government is keen to reduce the cost of delivering public services and root out waste in local government, so we have plenty of open data on spending. Transport, education and health are also important hubs of activity, so we are well supplied with open data on those themes too.

The environment is, safe to say, less of a priority. Defra has a transparency panel with responsibility for open data, but it is barely a year old and has been slow to build momentum. There is a new Open Data Strategy; well-written, but long on process and short on specific commitments. In the normal course of events there is not much going on at Defra or the Environment Agency to which the government would wish to invite particular scrutiny in the form of open data.

Except, recently we’ve had a spot of bad weather …

The current crisis: is this finally the moment for open flood data?

At the end of last week (shortly after the levelling of Somerset and the detachment of the South West from the rest of England, and with the Home Counties mostly underwater) some folks from the tech community got together at Downing St and decided to put on an emergency hackathon.

It was called #floodhack and it was wildly popular.

The Environment Agency released some flood data under the Open Government Licence. For a limited time only, and not the core risk datasets, but still notable progress from an open data perspective.

I’m basically an analyst, not a coder. I think most of the economic and social benefit from releasing open data will emerge from the capacity to derive insights and make better decisions, whether in business or public policy, and from reducing information asymmetries in market relationships. The government’s preoccupation with start-ups and the app economy as the focus of open data policy has always seemed to me to be rather myopic.

But the key characteristic of open data is that, once open, it is open for everybody. Ultimately it doesn’t matter why the government releases data, as long as it gets released. If the puppy-like enthusiasm of a couple of hundred developers is what it takes then that’s fine.

(It does concern me that this is the most effective model we have for unlocking data, though: all in a rush, based on an intervention from Downing St. That was basically how we got OS OpenData in 2010. Four years on, is that still the way to get things done?)

Let’s follow the impulse. Now that there is a base of developers interested in flood data (whether from #floodhack or earlier events like env[:hack]), and more importantly engaged with the real-world problems associated with flood risk, we need to convince the government to feed more open data into the process.

What’s the government’s incentive? Besides the prospect of some shiny new apps, some positive publicity, and the longer-term potential to grow the established markets for reuse of flood data, it should be clear from the recent weather crisis that flood risk is not well understood either by the media or the public.

Pushing flood data out into the world will promote discussion and improve understanding, which is surely in line with the Environment Agency’s priorities. There will be misunderstandings and bad presentations of the data of course, but every one of those is an opportunity for expertise to assert itself in the interests of debate and education.

What do we need, specifically?

Here is a more-or-less comprehensive list of Environment Agency flood-related datasets that have been approved for commercial re-use, with links to metadata.

It’s important to make a distinction between the core flood risk datasets (NaFRA, the Flood Map, etc.) and the “live” datasets. The former group describes the usual modelled risk of flooding in different geographic locations, and the latter group describes individual flood events as they develop. The potential applications for those two groups of datasets, and the barriers to open data release, are somewhat different.

#Floodhack was orientated around emergency response and focused almost entirely on live flood event data. However the management of flood risk is an ongoing problem that cuts across society and the economy.

One of the weaknesses of flood management policy, from all the main political parties, has been the tendency to lose interest in flood risk once it is out of the headlines.

We need to sustain interest in the reuse of open data to help the public understand and participate in the management of flood risk in our communities, beyond the current crisis. That means unlocking the EA’s core flood risk datasets, not just river levels and other live feeds.

It’s not all about the Environment Agency

Flood data, and responsibilities for the production of that data, is a complex subject regardless of the licensing. Some of the EA datasets include third-party data (though for the core risk datasets at least this should not present an insurmountable barrier to open data release).

More significantly, many of the datasets cover both England and Wales. In April 2013 the EA’s responsibility for Wales was devolved to a new body, Natural Resources Wales. Although commercial licensing of datasets that cover both countries is still handled by the EA, a change in the licensing status for those datasets would have to be agreed by both bodies.

Flood data for Scotland and Northern Ireland is a separate issue. The main data publishers are the Scottish Environment Protection Agency (SEPA) and the Rivers Agency (NI) respectively.

For some types of flood risk, mapping and the maintenance of data is the responsibility of lead local flood authorities (usually councils) or water companies.

The Centre for Ecology & Hydrology also holds related data, including the National River Flow Archive and data from monitoring of aquifer levels. The British Geological Survey maintains an important dataset on susceptibility to groundwater flooding.

Mapping of the river network, which will be essential to many uses of flood data, is primarily the responsibility of Ordnance Survey. The OS MasterMap Networks - Water Layer product is currently in beta. That would be an ambitious target for open data release. However at minimum any credible package of open flood data would need to include a generalised spatial dataset with attributed centrelines for the river network.
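To make "attributed centrelines" a little more concrete, here is a minimal sketch of one river reach expressed as a GeoJSON feature and built in Python. Everything in it is an assumption for illustration: the attribute names, the river name and the coordinates are hypothetical and are not taken from any Ordnance Survey or Environment Agency product.

```python
import json


def centreline_feature(river_name, reach_id, coordinates):
    """Build a GeoJSON Feature for a single river reach centreline.

    The property names are hypothetical; a real product would define
    its own attribute schema.
    """
    if len(coordinates) < 2:
        raise ValueError("a centreline needs at least two points")
    return {
        "type": "Feature",
        "geometry": {
            "type": "LineString",
            # GeoJSON positions are [longitude, latitude]
            "coordinates": [list(point) for point in coordinates],
        },
        "properties": {
            "river_name": river_name,  # hypothetical attribute
            "reach_id": reach_id,      # hypothetical attribute
        },
    }


# Made-up coordinates for a made-up watercourse.
feature = centreline_feature(
    "Example Brook",
    "reach-001",
    [(-2.50, 52.00), (-2.49, 52.01), (-2.48, 52.01)],
)
collection = {"type": "FeatureCollection", "features": [feature]}
print(json.dumps(collection, indent=2))
```

The point of the attributes is that a reuser can join other flood data (a river level reading, a flood warning area) to a named stretch of river, rather than just drawing blue lines on a map.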

Photo credit: Floods in Hull June 2007 by Maggie Hannan, CC BY 2.0