Research data management SBA research support
Of the staggering amount of data generated every second by people and machines, research data takes on special value, because information can be extracted from it and it can be reused.
Research data is information in any format (digital and/or paper, numerical, descriptive, audio or video) collected and used during a research activity and needed to validate the results obtained. Examples include (but are not limited to): numbers, files, results of experiments (positive or negative), observations, published and unpublished sources, bibliographic references, software and code, texts, videos, sounds, interviews.
Depending on the degree of processing, four types of data can be distinguished:
- raw or primary data: notes, images, videos, surveys, interviews, computer files
- processed data: reports, documents, tables
- shared data
- published data
Data management has become an indispensable activity for every researcher.
Research Data Management (RDM)
Research Data Management means organising the collection and storage of data so that it is properly preserved, traceable and understandable over time, including by people who did not take part in the research. In this way, knowledge can circulate and foster innovation.
Research Data Management is an operational activity that must be supported by local and national governance, through policies that define the roles and tasks of the institution and of the researcher in line with the European Commission guidelines.
Increasingly, research funding programmes are calling for research data to be made available to enable validation of scientific publications.
The European Commission also encourages making research data open and available according to the principle “as open as possible, as closed as necessary” (see Data management).
Although openness is encouraged, in some cases data must be kept closed, even temporarily:
- data protection for security reasons
- protection of privacy (sensitive data)
- possible industrial or commercial exploitation (patents)
- other legitimate, justified reasons.
Data is made open by:
- archiving it in open and trusted repositories
- archiving the documentation (README files) that describes the tools and software used to generate and process the data, so that the data can be understood and interpreted over time
- cross-linking the data to the relevant scientific publications (by including the PID of the dataset in the publication metadata).
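The documentation and cross-linking steps above can be illustrated with a small README generator. This is a minimal sketch under stated assumptions: the field layout, the function name `build_readme` and the DOIs (`10.1234/...`) are hypothetical placeholders, not a required format of any repository or funder.

```python
from datetime import date

def build_readme(title: str, dataset_doi: str, publication_doi: str,
                 software: list, licence: str, created: date) -> str:
    """Return the text of a README file for a dataset (hypothetical layout)."""
    lines = [
        f"Title: {title}",
        # PID of the dataset itself, resolvable through doi.org
        f"Dataset DOI: https://doi.org/{dataset_doi}",
        # Cross-link to the publication that the dataset supports
        f"Related publication DOI: https://doi.org/{publication_doi}",
        f"Created: {created.isoformat()}",
        f"Licence: {licence}",
        "Software used to generate and process the data:",
    ]
    # One line per (name, version) pair, so the data can still be decoded later
    lines += [f"  - {name} {version}" for name, version in software]
    return "\n".join(lines) + "\n"

readme = build_readme(
    "Example survey dataset",          # placeholder title
    "10.1234/example.dataset",         # hypothetical dataset DOI
    "10.1234/example.article",         # hypothetical publication DOI
    [("Python", "3.12"), ("pandas", "2.2")],
    "CC-BY-4.0",
    date(2023, 11, 16),
)
print(readme)
```

The resulting text would be saved as a README file next to the data in the repository, while the dataset DOI is also entered in the metadata of the related publication.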
Open access to scientific research data:
- promotes the advancement of knowledge
- increases the reproducibility of research
- reduces duplication
- increases transparency.
The data itself is not a work of authorship and is not subject to copyright. If there is no justified reason not to disseminate it, the data should be made public, so that it can be reused and redistributed without restriction under public domain or attribution licences (CC0 or CC BY, or equivalent).
For more information, see the section “How do I license my research data?” on the OpenAIRE portal.
The removal of legal and technological barriers makes it possible to acquire, store, modify and share a large amount of data with a positive impact on knowledge, economy and society.
To achieve this, research data must be managed according to the FAIR principles (Findable, Accessible, Interoperable, Reusable).
To be FAIR the data must be:
- Findable: thanks to persistent identifiers such as Digital Object Identifiers (DOIs) and to metadata built according to international standards (Dublin Core, the DCC guide to metadata standards, etc.)
- Accessible: data and metadata must be accessible to humans and machines, through deposit in archives or repositories and the use of standard protocols. At a minimum, metadata must remain available even when the data is not open access. Accessible does not mean “open”: authentication and authorisation systems may be in place
- Interoperable: data should be saved in non-proprietary, uncompressed, unencrypted formats based on documented standards, so that it can be processed by different systems, and described with formal, shared languages and vocabularies that themselves follow the FAIR principles
- Reusable: data must be accompanied by a licence of use (CC BY or CC0) and by documentation describing how it was created.
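The four requirements above can be turned into a simple self-check on a dataset's metadata. The sketch below is illustrative only: the record fields (`doi`, `repository`, `readme`, ...) and the accepted licence and format lists are hypothetical choices, not an official FAIR assessment. Licences are written as SPDX identifiers (`CC0-1.0`, `CC-BY-4.0`) and formats as media types.

```python
from __future__ import annotations

# Hypothetical acceptance lists: adapt them to your repository and funder
OPEN_LICENCES = {"CC0-1.0", "CC-BY-4.0"}  # SPDX licence identifiers
OPEN_FORMATS = {"text/csv", "text/plain", "application/json", "application/xml"}

def fair_gaps(record: dict) -> list[str]:
    """List the FAIR requirements that a (hypothetical) metadata record misses."""
    gaps = []
    if not record.get("doi"):
        gaps.append("Findable: no DOI")
    if not record.get("title") or not record.get("keywords"):
        gaps.append("Findable: title or keywords missing")
    # Accessible does not mean open: only deposit in a repository is checked
    if not record.get("repository"):
        gaps.append("Accessible: not deposited in a repository")
    if record.get("format") not in OPEN_FORMATS:
        gaps.append("Interoperable: format not in the open-format list")
    if record.get("licence") not in OPEN_LICENCES:
        gaps.append("Reusable: no CC0 or CC BY licence")
    if not record.get("readme"):
        gaps.append("Reusable: no documentation on how the data was created")
    return gaps

record = {
    "doi": "10.1234/example.dataset",   # hypothetical DOI
    "title": "Example survey dataset",
    "keywords": ["survey", "open data"],
    "repository": "institutional repository",
    "format": "text/csv",
    "licence": "CC-BY-4.0",
    "readme": "README.txt",
}
print(fair_gaps(record))                        # → []
print(fair_gaps({**record, "licence": None}))   # → ['Reusable: no CC0 or CC BY licence']
```

An empty list only means the record passes these few checks; it does not replace the assessment done in the Data Management Plan.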
Compliance of the data produced by the research with the FAIR principles is ensured by drafting the Data Management Plan (DMP) correctly.
Data Management Plan
Research projects funded by public or private organisations that produce data (open or closed) require a Data Management Plan (DMP): an operational tool describing how the data will be managed, enhanced and preserved during and after the research, how it will be reused and disseminated, and any ethical implications of the project.
For funded projects, the Data Management Plan:
- is required by funding bodies, including the European Commission (e.g. in the Horizon Europe programme), which requires it to be delivered by the sixth month of funding
- must be drafted at the research design stage
- is a living document that must be amended or supplemented whenever there are changes in the nature of the data or in the way it is collected and managed
- must be shared with all researchers involved in the research
- must be concise and precise.
The DMP is therefore a tool to plan and communicate, from the beginning of the activity, the collection, storage, re-use and dissemination of data, together with the associated metadata. The richer the metadata, the greater the discoverability of the data.
The DMP is drafted by the principal investigator using templates, such as those offered by the online tools DCC DMPonline, Data Stewardship Wizard, easyDMP and OpenAIRE Argos. It covers the entire data lifecycle, ensuring traceability, availability, authenticity, citability, appropriate preservation, adherence to clear legal parameters and adequate security measures, which safeguard and regulate subsequent uses of the data.
To collect data, it is advisable to use only university-approved tools (such as LimeSurvey or Google Forms) and not platforms such as Qualtrics XM, which is not fully GDPR-compliant because the personal data it collects is transferred to the United States.
DMP and ethical implications
The ethics committee should be consulted when a research project involves collecting personal data, because of either its quantity (the number of items of personal information collected) or its nature (personal data that may reveal an individual's racial or ethnic origin, sexual orientation, political opinions, religious or philosophical beliefs or trade union membership, as well as genetic, biometric or health data).
The ethics committee's opinion protects both researchers and research participants.
For more information, please consult the Ca' Foscari Ethics Committee page, Data Management Plan (DMP) section.
Data and metadata
Data acquires additional value and meaning when associated with metadata.
Indeed, linking data and metadata creates unexpected connections and opportunities (as in the Internet of Things).
However, to be “machine readable”, metadata must follow standardised schemes and predefined syntaxes (such as Dublin Core).
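As an example of a standardised scheme, the sketch below writes a Dublin Core record as XML with Python's standard library, using the `dc` namespace of the Dublin Core Metadata Element Set 1.1 and rejecting any element outside its 15 terms. The `<metadata>` wrapper element, the function name and the sample values are assumptions for illustration; repositories usually wrap Dublin Core in their own container formats.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
# The 15 elements of the Dublin Core Metadata Element Set
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description", "format",
    "identifier", "language", "publisher", "relation", "rights", "source",
    "subject", "title", "type",
}
ET.register_namespace("dc", DC_NS)

def dublin_core_xml(fields: dict) -> str:
    """Serialise {element: value or [values]} as Dublin Core XML."""
    # The <metadata> wrapper is an arbitrary choice, not part of Dublin Core
    root = ET.Element("metadata")
    for element, values in fields.items():
        if element not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {element}")
        if isinstance(values, str):
            values = [values]  # elements are repeatable: accept a value or a list
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")

print(dublin_core_xml({
    "title": "Example survey dataset",
    "creator": ["Rossi, Maria", "Bianchi, Luca"],       # placeholder names
    "identifier": "https://doi.org/10.1234/example.dataset",  # hypothetical DOI
    "rights": "CC BY 4.0",
    "format": "text/csv",
}))
```

Because the element names come from a fixed, published vocabulary, any software that knows Dublin Core can read the title, creators and licence without further explanation.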
The use of standardised, “rich” metadata makes it possible to:
- track publications and datasets through persistent identifiers (DOI, Handle, ISSN, ISBN, ORCID)
- better describe the data and facilitate its discovery: metadata contains information on title, creator, abstract, keywords
- certify the integrity, provenance, preservation of data: metadata provides information on publisher, funder, format, file size, preservation platform, storage mode
- clarify rights: metadata provides information on the licences with which the data are associated and the conditions for re-use.
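Persistent identifiers only help if they are recorded correctly. The sketch below checks two of those mentioned above. For DOIs it uses a simplified pattern ("10.", a 4 to 9 digit registrant code, "/", a suffix); this is an assumption that covers most modern DOIs but not every legal DOI string. For ORCID iDs it verifies the final check character, which ORCID computes with the ISO 7064 MOD 11-2 algorithm. Function names are hypothetical.

```python
import re

# Simplified DOI syntax: "10." + 4-9 digit registrant code + "/" + suffix.
# Most modern DOIs match, but some older, legal DOIs use other characters.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")
ORCID_PATTERN = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]")

def is_plausible_doi(doi: str) -> bool:
    """True if the bare DOI (no https://doi.org/ prefix) matches the pattern."""
    return DOI_PATTERN.fullmatch(doi) is not None

def orcid_check_digit(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 ORCID digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)

def is_valid_orcid(orcid: str) -> bool:
    """Check the layout 0000-0000-0000-000X and the final check character."""
    if ORCID_PATTERN.fullmatch(orcid) is None:
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]

print(is_plausible_doi("10.1234/example.dataset"))  # → True
print(is_valid_orcid("0000-0002-1825-0097"))        # → True
print(is_valid_orcid("0000-0002-1825-0098"))        # → False
```

The ORCID iD 0000-0002-1825-0097 belongs to Josiah Carberry, the fictitious researcher ORCID uses in its documentation; changing its last digit makes the check fail, which is how a mistyped identifier in a metadata record can be caught.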
Last update: 16/11/2023