Making research data 'as open as possible and as closed as necessary' - an interview with Marlon de Jong, Privacy and Data Protection Specialist (UG DCC)

Open science encourages researchers to share their research data openly with the world, yet sometimes access to data needs to be restricted to protect the privacy of participants. Marlon de Jong is a Privacy and Data Protection consultant at the Digital Competence Centre (UG DCC). She advises UG researchers on how to make their data ‘as open as possible and as closed as necessary’.
“I notice that researchers often struggle with the question of whether their research data may actually be shared, and in what way.”
What is your role as privacy and data protection specialist?
I offer consultancy to all UG researchers who work with personal data, principally from UG faculties such as Behavioral and Social Sciences, Arts, Theology and Religious Studies, Spatial Sciences and Economics and Business. My main focus is on research projects with a high privacy risk. In consortium projects (involving multiple research institutions) or collaborations with external parties (such as companies, governmental institutions or healthcare organizations), there is often a shared responsibility for the collected data and a need to clarify the different roles and responsibilities.
As a consultant, I can help researchers think through suitable organizational and technical measures for handling their data. The objective is to reduce risks to the rights and freedoms of participants. Researchers who conduct privacy-sensitive research have to deal with many requirements, such as open science, the GDPR, institutional data policies and protocols, and the ethics committee's approval. They do not always know where to find the help they need. I can sit down with them to clarify their question and, if necessary, put them in contact with the right support staff or department within the university. It certainly helps that I previously worked as a researcher and data manager myself on various research projects at the Faculty of Behavioral and Social Sciences.
What kind of questions do you get from researchers?
Most questions are from researchers who recently obtained a grant and are now working on a Research Data Management Plan (RDMP). The RDMP is required by UG research institutes and funders such as NWO and ERC. Most importantly, the RDMP is a great way to reflect on how you will manage your data. For bigger grants and multi-party projects this can be a challenge, as each institution has its own data management protocols and legal requirements, especially if research projects cross national or even EU borders. A good way to start is to map out the ‘data flow’ for the entire research data life cycle - from collection and processing, to archiving and opening up the data for reuse by others.
I notice that researchers often struggle with the question of whether their research data may actually be shared, and in what way. Although funding organizations and universities ask researchers to share their data, this cannot be done easily with sensitive personal data. Data may not be publicly shared as long as they can be traced back to the participants in the project.
What is your advice on how to deal with data that cannot easily be made publicly available, such as personal data?
Before you start your project, think about the kind of data that you need for your research. Can you limit the collection of personal information (‘data minimization’)? Is it possible to ‘de-identify’ the dataset by removing identifying characteristics, without reducing the relevance of the data for research? There are several de-identification guides available, and one of the things we are working on is identifying the best solutions for the UG context. It is always possible to contact the UG DCC for advice.
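To make the idea of de-identification a little more concrete, here is a minimal Python sketch. It is an illustration only and not a UG DCC procedure: the field names (name, email, phone, participant_id, age) and the helper functions are hypothetical, and the right approach depends on the dataset. The sketch drops direct identifiers, replaces the participant ID with a keyed hash, and generalizes exact ages into bands. Note that a keyed hash is pseudonymization, not anonymization: whoever holds the key can re-link records, and the remaining fields may still identify someone in combination.

```python
import hashlib
import hmac

# Hypothetical field names, chosen for illustration only.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256), truncated to 16 hex characters."""
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def age_band(age: int, width: int = 10) -> str:
    """Generalize an exact age into a band such as '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def deidentify(record: dict, secret_key: bytes) -> dict:
    """Return a copy of the record with direct identifiers removed,
    the participant ID pseudonymized and the age generalized."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["participant_id"] = pseudonymize(str(record["participant_id"]), secret_key)
    cleaned["age"] = age_band(record["age"])
    return cleaned


# Example usage with a made-up record. In practice the key would be stored
# securely and separately from the data, never in the script itself.
example = {
    "participant_id": 17,
    "name": "Example Person",
    "email": "person@example.org",
    "phone": "000-0000",
    "age": 34,
    "score": 4.2,
}
shared = deidentify(example, secret_key=b"replace-with-a-real-secret")
```

The research variable (score) is left untouched, which reflects the goal of removing identifying characteristics without reducing the relevance of the data for research.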
If data cannot be properly de-identified, you can work with restricted access. In these cases you can share only the metadata (the description of your dataset) and/or define the conditions under which data may be used by fellow researchers. There are many data repositories that facilitate this. I recommend using DataverseNL, which is a trustworthy and high-quality data repository that is managed by the UG DCC and welcomes contributions from all disciplines.
Open science is one of the priorities of the UG for the coming years. What are the implications for data management? What are the main challenges?
With regard to research data management (RDM), this means that researchers are asked to make their research data and software FAIR: Findable, Accessible, Interoperable and Reusable. The FAIR principles have become the norm for practicing open science and responsible data management.
FAIR, however, does not necessarily mean that your data is openly available. There can be reasons to restrict access, for instance, when open access would affect the participants' privacy, intellectual property rights or commercial interests. Reasons can also be practical: in some cases a dataset is too big to open up completely. As part of the UG Open Science Programme we are working on guidelines that help researchers determine under which conditions and how restricted data can still be shared.
How can you help researchers who want to practice open science?
I work at the UG DCC and we offer support around all aspects of RDM. This covers the different stages of the research data life cycle - from data management planning, to processing, sharing and storing your data during research, to archiving and publishing the data afterwards. Communication is key to overcoming legal and ethical hurdles and finding best practices for specific fields. We can help to clarify responsibilities and find the right experts.
We can also help with finding appropriate research IT solutions. Think, for instance, of ways to encrypt sensitive data, appropriate solutions for storing and sharing data with peers, or creating a protected virtual environment that allows you to collaborate safely on data analysis. We are in close contact with specialists at the CIT and we will always make sure that your support request ends up with the right person.
You are working on the topic of privacy by design. Can you explain what this means?
Privacy compliance is not as difficult as it sometimes seems, because it can be made part of the design of a research project. This is called ‘privacy by design’. We actually have an exciting DCC event coming up about this topic. I recommend this to everyone interested in open science, human subject research, privacy in research and/or research data management - and, of course, if you would like to get to know the DCC and meet like-minded colleagues.
More information:
DCC event 'As closed as necessary...' How to design for privacy in research
Get advice from Marlon during the DCC consultation hour: Privacy in Research
Website of the UG Digital Competence Centre (UG DCC)
Take the award-winning online UG course Privacy in Research: asking the right questions
About the author
Leon ter Schure is Lead of the UG Digital Competence Centre (DCC).

