A conceptual architecture for contractual data sharing in a decentralised environment
Machine Learning systems rely on data for training, input and ongoing feedback and validation. Data in the field can come from varied sources, often anonymous or unknown to the ultimate users of the data. Whenever data is sourced and used, its consumers need assurance that the data accuracy is as described, that the data has been obtained legitimately, and they need to understand the terms under which the data is made available so that they can honour them. Similarly, suppliers of data require assurances that their data is being used legitimately by authorised parties, in accordance with their terms, and that usage is appropriately recompensed. Furthermore, both parties may want to agree on a specific set of quality of service (QoS) metrics, which can be used to negotiate service quality based on cost, and then receive affirmation that data is being supplied within those agreed QoS levels. Here we present a conceptual architecture which enables data sharing agreements to be encoded and computationally enforced, remuneration to be made when required, and a trusted audit trail to be produced for later analysis or reproduction of the environment. Our architecture uses blockchainbased distributed ledger technology, which can facilitate transactions in situations where parties do not have an established trust relationship or centralised command and control structures. We explore techniques to promote faith in the accuracy of the supplied data, and to let data users determine trade-offs between data quality and cost. Our system is exemplified through consideration of a case study using multiple data sources from different parties to monitor traffic levels in urban locations.