Digital Competence Centre
your one-stop for research IT and data

Text and Data Mining

Text and Data Mining (TDM) refers to computational research methods that extract new information and knowledge from large amounts of unstructured data, such as journal articles, tweets, blog posts, and websites.

TDM techniques can be applied both to texts in the public domain and to texts protected by copyright. In either case, the data may contain personal information.

If your research involves TDM techniques, it is essential to consider two aspects: how to reconcile TDM with copyright law and how to comply with the GDPR.

Copyright law

Articles 3 and 4 of the EU Directive on Copyright in the Digital Single Market (Directive (EU) 2019/790) provide exceptions for TDM to the default copyright rule that permission must be obtained from the rights holders. The directive allows researchers to make copies of data, provided that they have lawful access to the data and the project is carried out for the purpose of scientific research.

The copies should be stored with an appropriate level of security and may be retained after the end of the research project for scientific purposes, including verification of research results. The European legislature also encourages the Member States and institutions to define commonly agreed good practices. In the Netherlands, the directive is implemented in Articles 15n and 15o of the Dutch Copyright Act (Auteurswet).

GDPR

Article 85 of the GDPR allows exemptions for the processing of personal data for the purposes of academic expression. To rely on these exemptions, researchers must demonstrate that the personal data are genuinely necessary for academic expression and that the principles of the GDPR are taken into account.

Last modified: 02 February 2024 12.54 p.m.