Skip to ContentSkip to Navigation
Digital Competence CentrePart of University of Groningen
Digital Competence Centre
your one-stop for research IT and data
Digital Competence Centre Privacy & Protection Data Protection

Data minimization & de-identification

Data minimization life cycle

As a researcher, you are responsible for protecting the privacy of your participants. For this reason, the principle of data minimization (GDPR art. 5 (1c) ) should be one of the leading principles in the design of your research project. This means that you only collect personal data that is necessary for your research purposes, and de-identify your dataset once personal data is no longer needed to prevent re-identification of your data subjects. 

Data minimization and de-identification reduce privacy risks by avoiding unnecessary personal data collection and storage. This does not mean you cannot collect and store any (sensitive) personal data. It means the data you will collect must be justified by your research question.

Essential concepts

Personal data (direct and indirect identifiers)

Personal data

“Personal data is any information that relates to an identified or identifiable living individual. Different pieces of information, which, collected together, can lead to the identification of a particular person, also constitute personal data.” (European Commission). 

Direct identifiers

Direct identifiers are data that make it easy to identify an individual, such as name, e-mail address, phone number, home address or IP address.

Indirect identifiers

Indirect identifiers (or: quasi identifiers) are data that do not directly identify an individual, but could, in combination with other identifiers, be unique to an individual and can therefore lead to identification. For example: Women from Groningen who drive a McLaren car. Combined, the underlined identifiers could possibly single out an individual and are, therefore, examples of indirect identifiers.

Examples of indirect identifiers are: 

  • Demographics (date of birth, gender, job occupation, etc.)
  • Social media photos
  • Location
  • Any other background information about a specific person.

Data minimization

Data minimization

Data minimization is one of the data protection principles that form the basis of the GDPR. It states that the processing of personal data should be “adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed” (GDPR art. 5 (1c)); Data minimization does not mean that you cannot collect personal data at all. If you can explain why you need these data for the current or specific future purposes you are allowed to collect these data.

De-identification (pseudonymization, anonymization)

De-identification

De-identification is the masking, manipulation or removal of personal data with the aim to make individuals in a dataset less easy to identify.

Pseudonymisation

Pseudonymization is a de-identification procedure during which personally identifiable information is replaced by an unique alias or code (pseudonym). In general, the names and/or contact details of data subjects are stored with this pseudonym in a so-called keyfile. The keyfile enables the re-identification of individuals in the dataset. Keyfiles are stored separately from the rest of the data and access should be restricted. In contrast to an anonymized dataset, a pseudonymized dataset in principle still allows for the re-identification of data subjects.

Anonymization

Anonymization is a de-identification procedure during which “personal data is altered in such a way that a data subject can no longer be identified directly or indirectly, either by the data controller alone or in collaboration with any other party." (ISO 25237:2017 Health informatics -- Pseudonymization. ISO. 2017. p. 7.). In contrast to a pseudonymized dataset, an anonymized dataset does not allow for the re-identification of data subjects and is therefore no longer considered personal data.  


Practical guides

How you apply data minimization and de-identification depends on your research methods and study design. In practice, this can be challenging because it is not always clear what data is strictly necessary.

To support researchers, the DCC developed several practical guides on how to apply data minimization, pseudonymization and de-identification techniques in different research contexts: 

decorative image
Last modified:15 April 2026 11.07 a.m.