Linux Foundation debuts Community Data License Agreement

New open data licenses make it easier to access, share and use data to power artificial intelligence and machine learning applications across critical fields.

Tuesday, 24th October 2017, by Phil Alsop

The Linux Foundation has introduced the Community Data License Agreement (CDLA). The CDLA licenses are a collaborative effort to address the rights and ambiguities around sharing “open” data. Licenses that let organizations share data as easily as they share open source software code can help people take full advantage of the vast amounts of data, now measured in petabytes, to power new applications that promise to enhance safety and services.

Demand for shared data has grown due to machine learning, artificial intelligence (AI), blockchain and geolocation technologies. The CDLA licenses can help organizations open up and share data, with the goal of creating communities that curate and share data openly.

For instance, if automakers, suppliers and civil infrastructure services can share data, they may be able to improve safety, decrease energy consumption and improve predictive maintenance. Self-driving cars are heavily dependent on AI systems for navigation, and need massive volumes of data to function properly. Once on the road, they can generate nearly a gigabyte of data every second. For the average car, that means two petabytes of sensor, audio, video and other data each year.
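As a rough sanity check on those two figures, the sketch below works out how much driving time two petabytes a year would imply at one gigabyte per second. It assumes decimal units (1 PB = 1,000,000 GB) and a flat rate of exactly 1 GB/s, neither of which is stated in the article; the function and constant names are illustrative only.

```python
# Back-of-the-envelope check: how many hours of driving per year
# would produce 2 PB of data at roughly 1 GB per second?
# Assumptions (not from the article): decimal units, constant data rate.

GB_PER_PB = 1_000_000      # decimal petabyte, assumed
GB_PER_SECOND = 1.0        # "nearly a gigabyte ... every second", rounded up
ANNUAL_PB = 2.0            # "two petabytes ... each year"


def implied_driving_hours(annual_pb: float, gb_per_second: float) -> float:
    """Return the hours of operation needed to generate annual_pb of data."""
    seconds = annual_pb * GB_PER_PB / gb_per_second
    return seconds / 3600


hours_per_year = implied_driving_hours(ANNUAL_PB, GB_PER_SECOND)
hours_per_day = hours_per_year / 365

print(f"Implied driving time: {hours_per_year:.0f} hours per year")
print(f"Roughly {hours_per_day:.1f} hours per day")
```

Under these assumptions the two numbers are consistent with a car that is driven about an hour and a half a day.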

Similarly, climate modeling can integrate measurements captured by government agencies with simulation data from other organizations and then use machine learning systems to look for patterns in the information. It’s estimated that a single model can yield a petabyte of data, a volume that challenges standard computer algorithms, but is useful for machine learning systems.

“An open data license is essential for the frictionless sharing of the data that powers critical technologies,” said Jim Zemlin, Executive Director of The Linux Foundation. “The success of open source software provides a powerful example of what can be accomplished when people come together around a resource and advance it for the common good. The CDLA is a step in that direction and will encourage the continued growth of applications and infrastructure.”

CDLA Licenses Promote Sharing While Reducing Risk

The Linux Foundation, in collaboration with a broad set of participating organizations, drafted the CDLA licenses with the needs of companies, organizations and communities that have valuable data assets such as these to share in mind. The intention is for contributors and consumers of open datasets to use and support the contribution of data in a uniform fashion, with clearer terms and reduced risk.

There are two CDLA licenses: a Sharing license that encourages contributions of data back to the data community, and a Permissive license that puts no additional sharing requirements on recipients or contributors of open data. A few commercial and community implications of the licenses include:

• Data producers can share data with greater clarity about what recipients may do with it. Data producers can also choose between the Sharing and Permissive licenses and select the model that best aligns with their interests. In either case, data producers should enjoy the clarity of recognized terms and disclaimers of liabilities and warranties.

• Data communities can standardize on a license or set of licenses that provide the ability to share data on known, equal terms that balance the needs of data producers and data users. Data communities have a high degree of flexibility to add their own governance and requirements for curating data as a community, particularly around areas such as personally identifiable information.

• Data users who are looking for datasets to help kick off training an AI system, or for any other use, will be able to find data shared under a known license model with terms that clarify their rights and responsibilities.

The CDLA is data privacy agnostic and relies on the publisher and curators of the data to create their own governance structure around what data they curate and how. Each producer or curator of data will have to work through various jurisdictional requirements and legal issues.
