With the digital revolution, the way research is approached has fundamentally changed. Research processes suddenly produced digital research data that needed to be stored. Initially, no standards existed for this, so practices diverged widely, leaving data that could not be found without a management system. In response, movements emerged that aimed to standardize these processes. One resulting recommendation is the FAIR Guiding Principles, which state that research data should be findable, accessible, interoperable, and reusable. While these principles set goals, they provide no implementation guideline, which has led research data management (RDM) teams around the globe to create numerous implementations. Some of these are platforms such as Coscine, which can manage FAIR research data. However, such platforms face the issue that researchers want to store their research data with an openly accessible storage provider. As a result, research data often does not pass through these platforms but goes directly to the storage providers. This circumstance runs counter to the aim of following the FAIR principles, and the platforms miss critical provenance information.
This thesis aims to close that gap by providing a method to calculate the missing provenance information after changes occur. This so-called asynchronous data provenance is produced by comparing representations of research data: if the representations have changed, a new version or variant of the research data has likely been created. Representations can range from a generated hash to interoperable metadata about the research data. This interoperable metadata is created by running a pipeline that receives research data and extracts valuable information about its content, which is then annotated as interoperable metadata following existing application profiles and ontologies. Interoperable metadata can be used to compute the similarity of research data with a method called FSS Jaccard.
The methods developed are integrated into a standards-based RDM system (RDMS), defined in this thesis, to show their applicability, with Coscine serving as the use case. The thesis thus presents a method that can provide additional information about research data and close the described gap for any standards-based RDMS. Using this method, RDM teams can come closer to supporting the implementation of the FAIR principles and to improving processes for researchers.
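As a rough illustration of the comparison idea described in the abstract, the following minimal sketch compares two representations of a file: a content hash and a set of metadata statements. It is not the method from the thesis. The similarity here is the plain Jaccard index, used as a stand-in because the abstract does not specify how FSS Jaccard is computed. The names `content_hash`, `jaccard`, `compare`, and the `Triple` type are assumptions made for this example, as are the sample metadata values.

```python
import hashlib

# Hypothetical type: one metadata statement as (subject, predicate, object).
Triple = tuple[str, str, str]


def content_hash(data: bytes) -> str:
    """Return a SHA-256 hex digest, the simplest representation of a file."""
    return hashlib.sha256(data).hexdigest()


def jaccard(a: set[Triple], b: set[Triple]) -> float:
    """Plain Jaccard index |A ∩ B| / |A ∪ B| (assumption: not FSS Jaccard)."""
    union = a | b
    if not union:
        # Two empty metadata sets are treated as identical.
        return 1.0
    return len(a & b) / len(union)


def compare(
    old_data: bytes,
    new_data: bytes,
    old_meta: set[Triple],
    new_meta: set[Triple],
) -> tuple[bool, float]:
    """Report whether the hash changed and how similar the metadata sets are."""
    changed = content_hash(old_data) != content_hash(new_data)
    return changed, jaccard(old_meta, new_meta)


# Illustrative, made-up metadata for two states of the same file.
old_meta = {
    ("file", "dc:title", "Run 1"),
    ("file", "dc:creator", "A"),
    ("file", "dc:format", "csv"),
}
new_meta = {
    ("file", "dc:title", "Run 1"),
    ("file", "dc:creator", "A"),
    ("file", "dc:format", "tsv"),
}

changed, similarity = compare(b"a,b\n", b"a\tb\n", old_meta, new_meta)
print(changed, similarity)
```

A changed hash only signals that something is different. The metadata similarity gives a graded indication of how closely the two states are related, which is the kind of evidence one could use to decide whether a change looks like a new version or a variant.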
Author Heinrichs, Benedikt
Weight 0.313 kg
Publication date 6 May 2024



Asynchronous Tracking and Description of Research Data Changes in Distributed Systems with Interoperable Metadata

ISBN: 978-3-98555-214-6
Delivery time: 2-3 days
49.00 €
incl. 7% VAT


With the ever-increasing amount of research data, the question arises where this data comes from and what it is about. The aim of this thesis is to provide an overview of the topic and address issues caused by the rapid explosion of data by examining existing standards and attempting to develop new methods for recreating provenance information from data. These methods are applied to different use cases and the specifics of data similarity and metadata extraction are explored.

In stock