1. Introduction to research data management
Research data are an essential part of research, and they can even be the most important output of a research project. Although “data” usually refers to digital material, the term “research data” also covers physical material (e.g., samples). In the context of research data management, the terms “research material” and “research data” are often used as synonyms.
Research data are the original, digital or non-digital sources or material that are created or collated to conduct a research project. The answer to the research question is based on the analysis of this data.
The types of research data can be grouped according to whether they are
- collected data (e.g., samples, questionnaire surveys, measurements, imaging)
- produced data (e.g., material produced from data analysis, software) or
- reused data (e.g., text corpora, picture banks, bio banks).
Research data management promotes the transparency and replicability of research and research results. It also makes it possible to verify research data, for example during the peer-review process. In addition, according to the UEF Open Science and Research Policy, research data should be properly managed and, as a rule, opened whenever possible, in accordance with research ethics and the legal framework, including copyright legislation and agreements.
Research data management is part of good scientific practice. The Finnish National Board on Research Integrity TENK promotes good scientific practice in Finland. According to TENK, research integrity requires that researchers comply with the standards set for scientific knowledge when recording the data obtained during the research. The Finnish Code of Conduct for Research Integrity and Procedures for Handling Alleged Violations of Research Integrity in Finland was published in spring 2023.
For research ethics and the legal framework, see Chapter 2 of this learning material.
Another essential part of research data management is data description. The description should include all the information required to understand and reuse the data, regardless of whether the data is made publicly available or not. In practice, this means explaining, for example, what kind of material the data includes, where and by whom it was collected, and where it is stored. The terms metadata and documentation are used in connection with data description; they are discussed in more detail in Chapter 3 of this learning material.
In practice, research data management is planned by writing a data management plan (DMP). A DMP is a living document that is updated as the research proceeds, and it can be compiled using existing templates and web tools such as DMPTuuli. Information on research data management is compiled and kept up to date on the UEF Data Support website. Writing a DMP is discussed in Chapter 6 of this learning material.
The aim of research data management is to make the research process and the use of research data responsible and efficient, while fulfilling the expectations and requirements of the university, funding agencies and the legal framework (legislation and contracts).
The research data lifecycle describes the measures taken with the data at different stages of the research process. The lifecycle is often presented as a circle, which emphasizes the continuity of the data: the data does not simply come to a stop at a clear end point; for example, when the data is reused, its lifecycle continues.
The stages of the lifecycle typically include planning, collecting, analysing, storing, publishing, disposing, discovering and reusing, as shown in the figure below. Keep in mind, however, that in practice data management measures do not always proceed neatly from one stage to the next. For example, data can be collected or published at many different stages of research. Storing, sharing and publishing are discussed in Chapters 4 and 5.
At the beginning of the research process, you must create a research plan and an associated data management plan. Check the relevant legislation and ethical aspects carefully, ensure data protection when handling personal data, inform research participants and, when necessary, ask for consent or research permits. Plan the information you give to research participants so that it allows the data to be managed and shared in the way you intend. For more information on the processing of personal data, see Chapter 2.
Agree on the details of data management with your research partners, and update your data management plan whenever necessary.
Data should be documented, described and stored so that they remain usable and protected both during the research and after it. After collecting the data, you analyse them and report your findings, e.g., in a journal article.
Finally, if possible, share your data so that others can reuse them, or archive them according to the guidelines of your research organization or collaborators. The lifecycle of the data continues: if the data are openly available, they can be reused for new purposes, such as research, teaching, studying or commercial use. Storing and opening data are discussed in Chapters 4 and 5.
At all stages of research, systematic research data management creates the necessary basis for good research and facilitates the concrete implementation of the research project.
To consider
- What kind of research data do you produce or use?
- What kind of challenges relate to the management of your research data?
Adhering to the FAIR principles is an integral part of research data management. The FAIR principles were published in 2016 (FORCE11, The Future of Research Communications and e-Scholarship). They guide how to make data and their metadata (the descriptive information about the data)
- Findable
- Accessible
- Interoperable
- Reusable.
Implementing FAIR principles in practice means the following:
- Findable: The essential information (metadata) about the data is described in sufficient detail. The data has a descriptive page, and open data has a unique persistent identifier (such as a DOI or URN). If the research data is deposited in a repository or archive for reuse, this descriptive information is deposited with the data, and the repository or archive assigns the persistent identifier to the data. If the data cannot be opened for reuse, the descriptive information, i.e., the metadata, can usually still be published openly.
- Accessible: The data and/or its metadata can be found on the Internet. The data is opened for reuse if possible; otherwise access is restricted and a justification is given for why the data cannot be opened. The data is versioned if necessary, and it is documented throughout its lifecycle. If the data has been deleted, the metadata remains accessible. Data repositories and data archives provide guidance in case data deposited in them has to be removed or replaced with another version. A persistent identifier (e.g., a DOI) should then lead to a so-called tombstone page stating that the data is no longer available.
- Interoperable: The data and/or metadata use widely used, documented and as open as possible file formats, vocabularies and shared codebooks. Where relevant, the data or metadata is linked to other research outputs.
- Reusable: The research data is well documented and understandable. The use rights or licenses are clearly presented with the data.
You are probably already following the FAIR principles in part without noticing it. For example, you may already use open-source software (interoperability) and document your data carefully during the research process (reusability).
Adhering strictly to the FAIR principles is not always feasible. Consider what restrictions and possibilities apply to your research data. Note also that following the FAIR principles does not necessarily mean sharing the data openly.
Several data management services, such as CSC’s fairdata.fi, have been developed to help you implement the FAIR principles in your work. In addition to choosing a service that has implemented the FAIR principles, you should consider what other actions you as a researcher must take to make your data as FAIR as possible.
Watch the video
The FAIR principles support good data management, CSC (1:44)
To consider
- How can you follow FAIR principles with your data management?
- How could another researcher, or someone else interested in the topic, find your research data or information about it?
- How could your data or its metadata be accessed?
The Fairdata services are provided by the Ministry of Education and Culture and produced by CSC – IT Center for Science. The services are offered free of charge to Finnish universities and state research institutes. They include:
- IDA: research data storage
- Qvain: research dataset description tool
- Etsin: research data finder
- Digital Preservation Service for Research Data (in Finnish: Fairdata PAS)
Using the Fairdata services requires a CSC user account, which can be created, for example, with the HAKA credentials of your home university.
The research data storage service IDA enables storing and sharing both new and published research data. However, it is not intended for long-term preservation.
Datasets stored in IDA can be described and published using Qvain, a tool that provides workflows for creating standardized metadata for research datasets. With Qvain, users can describe datasets whose files are stored in IDA as well as datasets stored elsewhere, outside the Fairdata services.
Dataset descriptions published with Qvain are available through the Etsin service. Etsin also enables users to find metadata from other repositories and services.
The Digital Preservation Service for Research Data (Fairdata PAS) provides long-term preservation for valuable research data. Its aim is to ensure that highly significant research data remain reusable for future generations. The researcher’s home organization assesses whether the research data are suitable for preservation in Fairdata PAS.
UEF instructions and information about research support in Heimo (requires UEF login)
Data management guidelines (Finnish Social Science Data Archive)
Data management checklist (Fairdata.fi)
CSC: Data management and storage. CSC’s data management and storage web page introduces data services and offers useful information about data management.
Responsible Research. Coordination for Open Science, Publication Forum, The Committee for Public Information (TJNK), Finnish National Board on Research Integrity TENK and the Responsible Research Articles.
In a nutshell
Research data management concerns how you
- create and reuse data and plan for their use
- organise, structure and name data
- keep data: make them secure, provide access, store and back them up
- share data with collaborators, publish your data in a data journal or repository, and get cited.
In the following sections of this study material, different parts of research data management are discussed in more detail.
(2023-08)