The FAIR Principles of Data Management

In 2016, a group from academia, industry, funding agencies, and scholarly publishers produced a paper on how the infrastructure supporting the reuse of scholarly data could be improved, with the underlying objective of extracting the maximum benefit from research investment.

The paper outlined the FAIR guiding principles for scientific data management and stewardship: findability, accessibility, interoperability and reusability. The principles aim to bring clarity to the goals of good data management and stewardship, and to define simple guideposts to inform those who publish or preserve scholarly data.

One of the key benefits of increasing the availability of data is that it helps improve the data's accuracy. For instance, data from different studies may be combined to create large data sets for analysis by the scientific community. More widely available data is also subject to greater levels of peer review, allowing any assumptions used in creating the data, or conclusions drawn from it, to be challenged on an informed basis.

While an increase in the accuracy of data is undoubtedly a good thing for the scientific community, there is a question of whether improvements in the availability of data will inevitably lead to innovators losing control of their intellectual property.

What Are the FAIR Principles?
The original 2016 paper elaborates on what is meant by findability, accessibility, interoperability and reusability. However, one of the clearest interpretations has been set out by the Association of European Research Libraries (LIBER), an early endorser of the principles. According to LIBER:

findability requires that data and supplementary materials have sufficiently rich metadata and a unique and persistent identifier;
accessibility requires that data and metadata are understandable to humans and machines, and that data is deposited in a trusted repository;
interoperability requires that metadata use a formal, accessible, shared, and broadly applicable language for knowledge representation; and
reusability requires that data and collections have clear usage licences and provide accurate information on provenance.
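
As a rough illustration only, the four requirements above can be read as a checklist applied to the record describing a deposited data set. The sketch below is written in Python; the DatasetRecord class, its field names, the fair_gaps function and the example values are all hypothetical and are not taken from LIBER, the 2016 paper, or any official FAIR assessment tool. It only tests whether each field has been filled in, which is far cruder than a real assessment of metadata quality.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical description of a deposited data set (illustrative field names)."""
    identifier: str = ""           # unique, persistent identifier, e.g. a DOI
    metadata: dict = field(default_factory=dict)  # descriptive metadata
    repository: str = ""           # trusted repository holding the data
    metadata_vocabulary: str = ""  # shared, formal language used for the metadata
    licence: str = ""              # usage licence
    provenance: str = ""           # where the data came from and how it was made


def fair_gaps(record: DatasetRecord) -> list:
    """Return the principles for which the record is missing basic information."""
    gaps = []
    if not (record.identifier and record.metadata):
        gaps.append("findability")
    if not record.repository:
        gaps.append("accessibility")
    if not record.metadata_vocabulary:
        gaps.append("interoperability")
    if not (record.licence and record.provenance):
        gaps.append("reusability")
    return gaps


# Placeholder values, assumed purely for illustration.
example = DatasetRecord(
    identifier="doi:10.0000/example",
    metadata={"title": "Example assay results"},
    repository="Example Repository",
    licence="CC-BY-4.0",
)

print(fair_gaps(example))  # ['interoperability', 'reusability']
```

In this example the record has an identifier, metadata, a repository and a licence, but no stated metadata vocabulary and no provenance, so it falls short on interoperability and reusability.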

What is absolutely clear from the original 2016 paper is that the FAIR principles are intended to apply not only to ‘data’ in the conventional sense, but also to the algorithms, tools, and workflows that led to that data.

And What Is Their Impact?
The FAIR principles reflect the movement in recent times towards open science values. Many of the FAIR principles had already been adopted in the scientific community.

In the original 2016 paper, the following initiatives were highlighted as examples of systems in which at least some of the FAIR principles were already being implemented: Dataverse (an open-source data repository software), FAIRDOM (integrating the SEEK and openBIS platforms in a FAIR data and model management facility for systems biology), ISA (a metadata tracking framework to facilitate collection, curation, management and reuse of life science datasets), Open PHACTS (a data integration platform for information pertaining to drug discovery), wwPDB (an intensively curated data archive about experimentally determined 3D structures of proteins and nucleic acids) and UniProt (a comprehensive resource for protein sequence and annotation data).

Since the publication of the original 2016 paper, the term ‘FAIR’ has been gaining traction.

The European Commission is establishing the European Open Science Cloud (EOSC), an initiative to provide a digital infrastructure that brings computing and data storage capacity to scientists across the European Union. As part of this initiative, the Commission in 2018 published a report entitled Turning FAIR into reality. One of the report's suggestions for funding a FAIR data system is the introduction of a requirement that a certain percentage of funding, e.g. 5%, be allocated towards managing and stewarding data.

If initiatives such as the EOSC are implemented, then it seems likely that the FAIR data principles will be adopted in the scientific community on a large scale. This will enable ‘old’ data to be innovatively reused, thereby maximising the value of the original research investment and paving the way for developments in fields in which large, accurate data sets are of paramount importance, such as personalised medicine and diagnostics.

Compatibility
Patenting is compatible with FAIR principles. Many open science resources are open in the sense that they are non-discriminatory, i.e. anyone can access and use the information that they contain, but nonetheless require the user to take a licence. Patents and other intellectual property rights can be hugely useful in defining the framework of the licence.

For instance, the Biological Innovation for Open Society (BiOS) initiative makes technology covered by certain patents – e.g. the TransBacter™ biological gene transfer system for eukaryotic cells – available for non-exclusive use by any entity that agrees and conforms to the terms of a BiOS licence. The licence allows both research and commercial use but encourages sharing of improvements among all licensees.

Rather than frustrating the open science movement, intellectual property rights may, therefore, be seen as a key instrument for promoting and propagating the FAIR principles of data management and stewardship.

This article was first published in the March 2019 edition of Intellectual Property Magazine.
