Good practices for FAIR data management (1) - an interview with Clara Egger on the EXCEPTIUS dataset
Date: 19 September 2022
Author: Leon ter Schure
Part of open science is that researchers make their data FAIR: Findable, Accessible, Interoperable and Reusable. But how do you do this in practice? In this series, we ask UG researchers to tell us more about their data management choices.
In the first edition of this series we highlight the EXCEPTIUS dataset, which was produced as part of an international research project of the same name. The project analyses exceptional decision-making in 32 European countries during the COVID-19 pandemic, in order to learn more about the resilience of democratic systems in times of crisis.
We asked project lead Dr Clara Egger of the Department of Public Administration and Sociology of the Erasmus University Rotterdam (previously the UG Centre for International Relations Research) a few questions:
Can you describe your dataset?
The EXCEPTIUS dataset identifies and classifies COVID-19 containment measures taken daily and at the subnational levels of government (when relevant) in 24 countries of the European Economic Area from 30 January 2020 until 30 April 2021. We predominantly focus on measures related to democratic governance, human rights and daily liberties, international cooperation, and public administration. For each of the measures, the dataset identifies the authorities who adopted it, the geographical coverage of the measure, the groups targeted and the sanction associated with non-compliance. Upcoming versions of the dataset will include a stringency index for each of the nine types of measures covered.
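To make the structure described above more concrete, one record of such a dataset could be sketched roughly as follows. This is only an illustration: the class name, field names and types are assumptions made for this example and do not reflect the actual variable names or coding scheme of the EXCEPTIUS dataset. Only the list of attributes and the observation period come from the description above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple

# Observation period as stated in the interview.
WINDOW_START = date(2020, 1, 30)
WINDOW_END = date(2021, 4, 30)


@dataclass(frozen=True)
class MeasureRecord:
    """One containment measure in force on one day.

    All field names are hypothetical; they mirror the attributes
    listed in the interview, not the real dataset columns.
    """
    country: str
    day: date
    measure_type: str                  # one of the types of measures covered
    adopting_authority: str            # who adopted the measure
    geographical_coverage: str         # where the measure applies
    targeted_groups: Tuple[str, ...]   # who the measure targets
    sanction: Optional[str] = None     # sanction for non-compliance, if any
    subnational_unit: Optional[str] = None  # filled in only when relevant

    def __post_init__(self) -> None:
        # Reject days that fall outside the stated observation period.
        if not (WINDOW_START <= self.day <= WINDOW_END):
            raise ValueError(f"{self.day} is outside the observation period")


# Illustrative values only, not taken from the dataset.
example = MeasureRecord(
    country="NL",
    day=date(2020, 3, 15),
    measure_type="example measure type",
    adopting_authority="national government",
    geographical_coverage="national",
    targeted_groups=("general population",),
)
```

Keeping the subnational unit and the sanction optional reflects that, according to the description, subnational detail is recorded only when relevant.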
The EXCEPTIUS dataset adds value to existing data initiatives on COVID-19 policies as it:
- Focuses on subnational levels of government
- Includes data on implementation modalities
- Draws upon legal sources to go beyond the semantic proximity between the measures (all but a few European countries declared that they were adopting a “lockdown” or relying on “emergency powers”) and to identify, in a fine-grained manner, variation in the use of COVID-19 containment measures.
The collection of the data followed a three-stage process. First, we compiled (and publicly released) a multilingual corpus of legal sources. Second, a legal AI algorithm was developed by Dr Tommaso Caselli and Georgios Tziafas to automatically detect COVID-19 containment measures in legal texts. Third, this pre-coding was used by a transnational team of human coders who manually collected information on the policies implemented. EXCEPTIUS, which involves 17 European partners coordinated by the University of Groningen and the University of Grenoble, benefitted from the support of ZonMw (grant number 10430032010026) and the University Grenoble Alpes / Sciences Po / CNRS.
You published the dataset on DataverseNL. Why did you choose this platform?
We discussed different possibilities within the project’s Board and opted for Dataverse, as it is a well-known platform that makes our data easy to find and accessible to a large scholarly community. In addition, the platform offers many options for presenting metadata in detail. It also provides the data with a DOI that can be used immediately to promote the dataset on social media and on the project’s website, without having to wait for the publication of a research article commenting on the dataset. Lastly, the platform is quite intuitive and easy to use, and the data stewards from the UG Digital Competence Centre are always available to support researchers in using it. Moving forward, we are also thinking of publishing our dataset in the European Open Science Cloud to further increase its findability and accessibility.
Your dataset was published under a CC0 license, meaning that it can be reused by others without any restrictions. Why do you think it is important to share your research data with the world?
As a researcher working for a public institution, I believe that we have an ethical duty to make our research transparent and accessible to the society that finances it. This is even more important as EXCEPTIUS is tackling a central societal issue: it aims to map and analyze the impacts of the COVID-19 pandemic on democratic governance and fundamental rights. Making data widely accessible and reusable also increases the reliability of research results. Science progresses through a process of trial and error in which replicability plays a key role. The fact that EXCEPTIUS data is openly accessible without restriction has already led to some improvements and corrections of the master dataset (we are now developing the 3.0 version) due to the constructive feedback we received from users all over the world.
What challenges (if any) did you face in making your dataset FAIR (Findable, Accessible, Interoperable and Reusable)? Do you have any tips or tricks for fellow researchers and what support did you receive (if any)?
Overall, I must say that making EXCEPTIUS data FAIR was an easy process, mainly due to the nature of the data we collected (public legal decisions). By contrast, I face more challenges when working on other research projects involving the collection and analysis of personal and sensitive data on political attitudes.
The key challenge for us was the development of an interoperable metadata scheme. Compared to other fields, the development of common standards is still in its infancy in the social sciences, and most researchers use ad hoc schemes developed in the context of a specific project. We discussed this issue at great length in the meetings we had with other ZonMw-funded social sciences projects but did not reach a consensus. The process was quite time-consuming and frustrating at times.
I would share three key tips with fellow researchers. The first is to work with colleagues who are more experienced in FAIR data management. I had the chance to work with Dr Tommaso Caselli, whose expertise helped in preparing the data in a FAIR way. The second is to work on the data management plan (DMP) in the early stages of a research project. I benefited from the expertise of Maarten Goldberg, formerly of the UG DCC and now working at the Faculty of Law, whose availability and enthusiasm have been instrumental to the success of the project. DMPs are often considered a hassle by researchers, but in reality they can be adapted to many different research needs and projects. Using a platform like DMPonline made the whole DMP drafting process easier. The third is to stay connected to the community: open science is a fast-changing field and following its developments can be daunting, yet it also involves a large and active community of researchers who are always ready to share experiences and tips. Social media accounts such as @openscience provide a mine of information on old and new open science initiatives.
Research institutions from 16 different countries are involved in the EXCEPTIUS project. Did everyone agree to making the data openly available? Was it easy or difficult to reach consensus about this?
All members of the EXCEPTIUS consortium share a commitment to open science and to doing everything possible to ensure that our research contributes to better preparing European democracies for future crises. FAIR and open science principles have been considered the default option since the start of the project.
Interesting Links:
Website of the EXCEPTIUS-project
More about the UG Open Science Programme
The UG Digital Competence Centre (UG DCC) provides data management and IT support and manages the data repository DataverseNL
For open science on Twitter, check out @openscience
About the author
Leon ter Schure is Lead of the UG Digital Competence Centre (DCC).