Thomas Vandenberghe, Susana D Tagarro, Dick Schaap, Guillaume Clodic, Juan L Ruiz, Hong M Le, Yvan Stojanov, Christian Autermann, and Simon Jirka (ed.) (2021)
Data management in Eurofleets+: the whole picture
Istituto Nazionale di Oceanografia e di Geofisica Sperimentale, Bollettino di Geofisica Teorica ed Applicata, vol. 62 (supplement 1), International Conference on Marine Data and Information Systems.
Eurofleets+ is a consortium of 42 research vessel operators that aims to provide access to ship time for high-quality marine campaigns, including equipment and remote sampling access. From the start, the project has given data management a central place. This approach acknowledges the important drivers of efficient data management: a) broad acquisition by means of a data management plan, b) adequate transformation by software agents, and c) integration with the exchange technology used by data repositories such as SeaDataCloud, all three designed to work together.
Eurofleets+ (EF+) is a four-year H2020-funded project, currently in its second year. At this moment, no cruises have yet departed. For the cruise and dataset metadata produced under Eurofleets 2 (2013-2017), the funding context was not always apparent, and a centralized view of the generated datasets was not possible. The Eurofleets+ proposal fills these gaps. For better synergy with other aspects of the project, the corresponding tasks have been spread over multiple work packages. Compared to Eurofleets 2, the description of work now includes a) the requirement of a data management plan (DMP) as a mandatory evaluation criterion, to assure data provision, and b) the assignment of dedicated data management organisations that assist principal investigators and vessel operators, to ensure the dissemination of data from EF+ cruises.
An additional reason to enforce DMPs is that they are a requirement of any H2020 project. The project has therefore developed a DMP as a deliverable but, more relevant in this context, each cruise has to provide one as well: the project DMP is composed of all cruise DMPs, which derive from a common template. The DMP template takes the form of a fork of the DMPRoadmap web application (created by the UK Digital Curation Centre and the University of California Curation Center) and contains a number of questions adapted for EF+ from the H2020 Open Research Data Pilot.
The DMP website (http://dmp.ef-ears.eu) also provides the data management guidelines. These guidelines describe the data workflow, from acquisition to dissemination. A distinction is made between en-route data and manual data. 'Manual' (sample-derived) data will be posted by the Principal Investigator on the EMODnet Data Ingestion Platform, and the data will be managed by three reference data centres, i.e. HCMR, OGS and BMDC. These centres will take care of the actual dissemination and promotion of both en-route and manual data by publishing the corresponding metadata in global directories (SeaDataNet and thence to EurOBIS, EMODnet, GEOSS, IOC-IODE portal) and also in a dedicated EF+ dataset catalogue, which provides persistent links (DOIs) to the actual data and is accessible through the project website and the “European Virtual Infrastructure in Ocean Research” portal (EVIOR). Specific attention is paid to 1) meteorological data, 2) “Essential Ocean Variables” (e.g. sea temperature, salinity, currents, oxygen, nutrients, carbon, plankton biomass, ...), 3) 3.5 kHz or Chirp light seismic data, and 4) multi-beam bathymetry, as these are underrepresented and have a high potential.
The main software agent is the Eurofleets Automated Reporting System (EARS), which provides software and services for en-route data acquisition, for recording cruise and event metadata, and for transforming these into the necessary European and global marine data standards. In 2020, an optimized EARS v2.5 will be distributed to vessel operators for use during cruises. Version 3.0 is under development and will be released at the beginning of 2021.
The EARS server distribution is based on Docker and is available on GitHub together with installation guidelines. It is to be installed on the vessel, relies on TechSAS for data acquisition, and stores the data in the local EARS database. The end goal is to let each RV operator have a 52°North Sensor Observation Service (SOS) installed on-shore as a central interoperability hub for acquisition data and event metadata. This means the SOS will also contain information on non-sensor devices such as sampling bottles. For the RV operator's convenience, this SOS will be packaged as a Docker image together with an on-shore EARS server, but non-virtualised solutions should remain possible. The EVIOR data portal uses SOS GetObservation requests to display the cruise tracks and the primary en-route data (navigation, meteorology and thermosalinometry) in near-real time, or at least as fast as the data is made available. The EVIOR portal and the operator's SOS will be the main interfaces through which the data managers of the three reference data centres retrieve the en-route data. Currently, the work package partners are gaining experience with a central SOS set up at CSIC.
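As an illustration of the kind of GetObservation request a portal could send to such an SOS, the following minimal sketch builds a key-value-pair request URL as defined by the OGC SOS 2.0 standard. The endpoint, offering and observed-property identifiers are hypothetical placeholders; the abstract does not specify what EVIOR or an operator's SOS actually uses.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_get_observation_url(endpoint, offering, observed_property, start, end):
    """Build an SOS 2.0 GetObservation request using the KVP binding.

    `start` and `end` are ISO 8601 timestamps delimiting the time window.
    """
    params = {
        "service": "SOS",
        "version": "2.0.0",
        "request": "GetObservation",
        "offering": offering,
        "observedProperty": observed_property,
        # Restrict results to observations whose phenomenon time is in the window.
        "temporalFilter": f"om:phenomenonTime,{start}/{end}",
        # Declares the 'om' prefix used in the temporal filter.
        "namespaces": "xmlns(om,http://www.opengis.net/om/2.0)",
    }
    return f"{endpoint}?{urlencode(params)}"


# All identifiers below are assumptions made for this example only.
url = build_get_observation_url(
    endpoint="https://sos.example.org/service",
    offering="example-cruise-thermosalinometry",
    observed_property="sea_surface_temperature",
    start="2021-03-01T00:00:00Z",
    end="2021-03-02T00:00:00Z",
)
print(url)

# Round-trip check: the query string decodes back to the intended parameters.
decoded = parse_qs(urlsplit(url).query)
print(decoded["temporalFilter"][0])
```

A client would issue an HTTP GET on the resulting URL and parse the returned observations; polling with a sliding time window is one simple way to approximate near-real-time display.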
Creating the events on board improves the completeness of the metadata, because they are recorded accurately at the origin of the measurement. The EARS manual event data structures follow the event semantic model developed in Eurofleets 2 and are exposed via a RESTful web service. The data structures have been redesigned to contain all elements of a Cruise Summary Report, including references to P02, C77, the GML track and a summary of measurements. In order to be able to express the device events in SensorML, new W06 BODC vocabulary entries are being requested.
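To make the idea of a manual event record concrete, the sketch below models one event and serialises it to JSON, as a client of a RESTful service might do before submitting it. The field names (tool, process, action, actor) and all values are assumptions chosen for illustration; they are not the actual EARS data structures, which the abstract does not detail.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ManualEvent:
    """Hypothetical manual event record; not the real EARS schema."""

    timestamp: str  # ISO 8601 time at which the event occurred
    cruise: str     # identifier of the cruise the event belongs to
    tool: str       # device involved, e.g. a sampling bottle
    process: str    # activity the device is used for
    action: str     # what happened, e.g. "Start" or "End"
    actor: str      # person who recorded the event


def to_json(event: ManualEvent) -> str:
    """Serialise the event with stable key order for easy comparison."""
    return json.dumps(asdict(event), sort_keys=True)


# Example values are invented for this sketch.
event = ManualEvent(
    timestamp=datetime(2021, 3, 1, 12, 30, tzinfo=timezone.utc).isoformat(),
    cruise="EXAMPLE-CRUISE-01",
    tool="Niskin bottle",
    process="Water sampling",
    action="Start",
    actor="Jane Doe",
)
payload = to_json(event)
print(payload)
```

Recording the timestamp at creation time, on board, is what ties the event to the origin of the measurement; controlled vocabulary terms would replace the free-text values in a real record.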