Introduction

Overview

Teaching: 10 min
Exercises: 5 min
Questions
  • 1 Does FAIR data mean open data?

  • 2 What are Digital Objects and Persistent Identifiers?

  • 3 Different types of PIDs

Objectives
  • The participant will understand that the FAIR principles are fundamental for Sustainable Science.

  • The participant will learn what human and machine-friendly digital objects are.

1. Does FAIR data mean open data?

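No.
FAIR means human- and machine-friendly data sources that aim for transparency in science and future reuse. Data can be FAIR without being openly accessible, provided the conditions for access are clearly described.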

FAIR and Open Science

What does it mean to be machine-readable vs human-readable?

Human Readable “Data in a format that can be conveniently read by a human. Some human-readable formats, such as PDF, are not machine-readable as they are not structured data, i.e. the representation of the data on disk does not represent the actual relationships present in the data.”
Machine Readable “Data in a data format that can be automatically read and processed by a computer, such as CSV, JSON, XML, etc. Machine-readable data must be structured data. Compare human-readable. Non-digital material (for example, printed or hand-written documents) is not machine-readable by its non-digital nature. But even digital material need not be machine-readable. For example, consider a PDF document containing tables of data. These are definitely digital but are not machine-readable because a computer would struggle to access the tabular information - even though they are very human readable. The equivalent tables in a format such as a spreadsheet would be machine-readable. As another example, scans (photographs) of text are not machine-readable (but are human readable!) but the equivalent text in a format such as a simple ASCII text file can be machine readable and processable.”
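
To make the distinction concrete, the following minimal sketch stores the same two measurements once as CSV and once as a free-text sentence. The sample names, values and column names are invented for illustration only.

```python
import csv
import io
import re

# The same (invented) measurements in a structured and an unstructured form.
csv_text = "sample,temperature_c\nA,21.5\nB,19.0\n"
prose_text = "Sample A was measured at 21.5 degrees, while B came in a bit cooler at 19."

# Structured data: a standard parser recovers rows and named columns directly.
rows = list(csv.DictReader(io.StringIO(csv_text)))
temperatures = {row["sample"]: float(row["temperature_c"]) for row in rows}
print(temperatures)

# Unstructured text: a pattern match can pull out the numbers, but nothing in
# the format says which number belongs to which sample or what the unit is.
numbers = re.findall(r"\d+(?:\.\d+)?", prose_text)
print(numbers)
```

The CSV version carries the relationships (which value belongs to which sample, and what the column means) in its structure, whereas the sentence leaves them for a human reader to infer.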


Machine Friendly DO

Machine friendly = Machine-readable + Machine-actionable + Machine-interoperable

Throughout this sourcebook, we use “Machine-readable” and “Machine friendly” interchangeably. We prefer the term “friendly” because it also covers “machine-actionability” and “machine-interoperability.”

Exercise - Level Medium 🌶🌶

Understanding the jargon of data “things”

Data collection varies depending on the field of research, and so does the digital object being described.

  • Task: With your group, look up the following data “things” and discuss what each one means and how they are similar or different:
  • Dataset
  • Database
  • Data collection
  • Data type
  • Data model

2. What are Digital Objects and Persistent Identifiers?

A Digital Object is a bit sequence located in a digital memory or storage that has, on its own, informational value. For example:


Digital Object Anatomy
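
One way to picture a digital object in code is as a bit sequence that travels together with an identifier and some descriptive metadata. The sketch below is an illustration under assumptions, not a definition from the FAIR Digital Object literature: the class name, field names and the identifier value are all hypothetical.

```python
from dataclasses import dataclass, field
import hashlib


@dataclass
class DigitalObject:
    """Hypothetical container: an identifier, a bit sequence and metadata."""
    pid: str                                       # placeholder, not a registered PID
    bits: bytes                                    # the bit sequence itself
    metadata: dict = field(default_factory=dict)   # human- and machine-readable description

    def checksum(self) -> str:
        # A fixity value lets anyone verify that the bit sequence is unchanged.
        return hashlib.sha256(self.bits).hexdigest()


obj = DigitalObject(
    pid="example-pid/0001",  # invented value for illustration
    bits=b"sample,temperature_c\nA,21.5\n",
    metadata={"title": "Example measurements", "format": "text/csv"},
)
print(obj.pid, obj.metadata["format"], obj.checksum()[:12])
```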

To learn more about FAIR Digital Objects:

FAIR Digital Objects: Which Services Are Required? (Schwardmann, Ulrich 2020)
FAIR Digital Object Framework Documentation (BdSS, Luiz Olavo 2020)


A Persistent Identifier (PID) is a long-lasting reference to a (digital or physical) resource:


Video: The FREYA project explains the significance of PIDs: LINK TO SOURCE

Different types of PIDs

PIDs have community support, organisational commitment and technical infrastructure to ensure the persistence of identifiers. They are often created to respond to a community’s needs. For instance, the International Standard Book Number or ISBN was created to assign unique numbers to books, is used by book publishers, and is managed by the International ISBN Agency. Another type of PID, the Open Researcher and Contributor ID or ORCID (iD), was created to help with author disambiguation by providing unique identifiers for authors. Additional PIDs are listed by the ODIN Project and on Wikipedia’s page on PIDs.
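
Part of the technical infrastructure behind many PIDs is a check digit that catches typing errors. The sketch below applies the ISBN-13 rule (alternating weights 1 and 3, total divisible by 10) and the ORCID iD rule (ISO 7064 MOD 11-2). The function names are hypothetical; the sample ISBN is a commonly used example number, and the sample ORCID iD belongs to ORCID's fictional test researcher, Josiah Carberry.

```python
def isbn13_is_valid(isbn: str) -> bool:
    """Check an ISBN-13 check digit; hyphens and spaces are ignored."""
    digits = [int(c) for c in isbn if c in "0123456789"]
    if len(digits) != 13:
        return False
    # Weights alternate 1, 3, 1, 3, ... across all 13 digits.
    total = sum(d * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return total % 10 == 0


def orcid_is_valid(orcid: str) -> bool:
    """Check an ORCID iD check character (ISO 7064 MOD 11-2)."""
    chars = orcid.replace("-", "").upper()
    if len(chars) != 16 or not all(c in "0123456789" for c in chars[:15]):
        return False
    total = 0
    for c in chars[:15]:
        total = (total + int(c)) * 2
    check = (12 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return chars[15] == expected


print(isbn13_is_valid("978-3-16-148410-0"))
print(orcid_is_valid("0000-0002-1825-0097"))
```

A passing check only shows that an identifier is well-formed. It does not show that the identifier has been registered or that it points to the intended resource.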

In Episode 6 (Data Archiving), you will explore one type of PID, the DOI (Digital Object Identifier), which is usually the standard PID for datasets and publications.
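
A DOI starts with the prefix "10." and becomes a clickable link when appended to the https://doi.org/ resolver. The sketch below normalises the common ways a DOI is written into such a link without making any network request. The function name and the simplified pattern are assumptions for illustration, not an official validation rule; the sample value 10.1000/182 is the DOI of the DOI Handbook.

```python
import re
from urllib.parse import quote

# Simplified shape check (an assumption): "10.", a registrant code, "/", a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Common ways a DOI is written in papers and reference lists.
KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_to_url(doi: str) -> str:
    """Return a resolver link for a DOI given with or without a prefix."""
    doi = doi.strip()
    for prefix in KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"does not look like a DOI: {doi!r}")
    # Percent-encode unusual characters but keep the "/" separator.
    return "https://doi.org/" + quote(doi, safe="/")


print(doi_to_url("doi:10.1000/182"))
```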

Exercise - Level Easy 🌶

arXiv is a preprint repository for physics, math, computer science and related disciplines. It allows researchers to share and access work before it is formally published.

  1. Visit the arXiv new papers page for Machine Learning.
  2. Choose any paper by clicking on the ‘pdf’ link. Now use Control + F (or Command + F on a Mac) and search for ‘HTTP’. Did the authors use DOIs for their data?

Solution

Authors will often link to platforms such as GitHub where they have shared their software, and/or to a website hosting the data used in the paper. The danger is that platforms like GitHub and personal websites do not guarantee long-term persistence. Instead, authors can deposit their data and software in a repository that preserves them and mints a DOI. Links to software sharing platforms or personal websites might move or break, but DOIs are designed to keep resolving to information about the software and/or data, even if the content itself is later removed. See DataCite’s Best Practices for a Tombstone Page.
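
The manual search from the exercise can also be sketched in code: find every link in a piece of text and separate DOI links from everything else. The function name is hypothetical, and both links in the sample text are invented placeholders rather than real resources.

```python
import re
from urllib.parse import urlparse

URL_PATTERN = re.compile(r"https?://[^\s)\]>]+")
DOI_HOSTS = {"doi.org", "dx.doi.org"}


def classify_links(text: str) -> dict:
    """Group the links found in text into DOI links and all other links."""
    result = {"doi": [], "other": []}
    for url in URL_PATTERN.findall(text):
        url = url.rstrip(".,;")  # drop sentence punctuation glued to the link
        host = urlparse(url).netloc.lower()
        key = "doi" if host in DOI_HOSTS else "other"
        result[key].append(url)
    return result


# Invented sample text; neither link points to a real resource.
sample = (
    "Code is available at https://github.com/example-lab/example-code. "
    "The dataset is archived at https://doi.org/10.1234/example.5678."
)
links = classify_links(sample)
print(links)
```

Links that land in the "other" group are the ones most at risk of moving or disappearing over time.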

DOIs are everywhere. Examples:

Key Points

  • FAIR means human- and machine-friendly data sources that aim for transparency in science and future reuse.

  • DOI (Digital Object Identifier) is a type of PID (Persistent Identifier).