Writing a preregistration for a study using longitudinal cohort data
Open Research objectives
Making scientific research more reproducible by increasing the amount and quality of information placed on the public record.
Practices
Creating a public pre-registration of a study design
Introduction
The TRacking Adolescents’ Individual Lives Survey (TRAILS) study is an ongoing population-based cohort study that maps the psychological, social and physical development of its participants. TRAILS has been running since 2001 and recruited participants when they were 10 to 12 years old. Assessment waves have taken place every two to three years, resulting in a wealth of data.
I am currently working on a study in which I use data from TRAILS. Together with my supervisors I decided to write a preregistration for this study. A preregistration is a plan for your study that describes the hypotheses, study design, and plan of analysis. A study can be preregistered at different points in the process (before data collection or before analysis), but in any case before the final report is written up. A preregistration is time-stamped and openly accessible online.
Motivation
There are a number of reasons for preregistering a study. The main goal of preregistration is transparency. Specifying all the data collection and analysis steps beforehand can improve the quality of research, because findings become more reproducible, more replicable, and more likely to be true. It reduces the risk of (unintentional) bias and of questionable research practices such as p-hacking (the misuse of data analysis to present statistically significant results when there is in fact no real effect). Researchers also commit to reporting on all research questions and hypotheses specified in the preregistration, regardless of the significance of the findings.
Lessons learned
As the data I am working with were already collected, I wrote the preregistration before analyzing them. I had already written an introduction and methods section before starting the preregistration, and I thought I had a sound and complete analysis plan. While working on the preregistration, I realized that quite a few aspects still needed to be sorted out. How exactly would I aggregate the different items of a scale into one measure? How would I handle missing data? Would I transform scores for analysis, and if so, how exactly? Some of these questions would be easier to answer after looking at the data, but the whole point was not to look at the data before the preregistration was in place. We tackled this problem by setting up certain ‘if-then’ rules in the preregistration. We did this, for example, for psychotropic medication use, which is one of the covariates in the study. We wrote that if the data show that psychotropic medication use is rare in our sample, then we will analyze all the different types of medication taken together. Otherwise, we will analyze the different types separately.
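To illustrate, an ‘if-then’ rule like the one for medication use can be written down as a small decision function before the data are opened. The following is only a minimal sketch in Python, not the code or the wording of our actual preregistration: the 5% cut-off for ‘rare’, the function name and the data layout are all assumptions made for the sake of the example.

```python
# Hypothetical cut-off for "rare": the preregistration text quoted above
# does not state a number, so 5% is an assumption for illustration only.
RARE_THRESHOLD = 0.05


def medication_coding_plan(medication_by_participant, rare_threshold=RARE_THRESHOLD):
    """Apply a pre-specified 'if-then' rule for the medication covariate.

    medication_by_participant: one entry per participant, either None
    (no psychotropic medication) or a string naming the medication type.
    Returns a dict describing how the covariate will be coded.
    """
    n = len(medication_by_participant)
    if n == 0:
        raise ValueError("need at least one participant")

    users = [m for m in medication_by_participant if m is not None]
    prevalence = len(users) / n

    if prevalence < rare_threshold:
        # IF medication use is rare, THEN take all types together
        # as a single yes/no covariate.
        return {"prevalence": prevalence, "coding": "combined",
                "levels": ["any medication"]}

    # OTHERWISE analyze each medication type separately.
    return {"prevalence": prevalence, "coding": "separate",
            "levels": sorted(set(users))}


# Made-up example data: 3 users out of 100 participants.
rare_sample = [None] * 97 + ["stimulant", "antidepressant", "stimulant"]
print(medication_coding_plan(rare_sample)["coding"])
```

The advantage of fixing the threshold and both branches in advance is that the choice between the combined and the separate analysis no longer depends on which one produces the nicer result.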
Another issue I encountered was that in the preregistration I had to explain some decisions that were made in the data collection, while in fact the data collection took place years before. For instance, I had to write a section about the sample size rationale. This was difficult to explain, because I was simply working with the original sample size of the cohort. It may take some sleuthing in older studies to properly explain the choices made in the past. This, however, also proved to be useful, as I learned more about the data I am working with.
All in all, my experience with writing my first preregistration was a rather positive one. Preregistering forced me to think every step in the study process through. Although it seems a time-consuming process, this ‘study plan’ needs to be worked out in any case. Without preregistration you might do this in a later phase. So, by preregistering you do not really lose time, and you win in terms of quality of your research and contributing to open science. That is a win-win, right?
URLs, references and further information
Last modified: 16 March 2022 11.23 a.m.