Openly sharing data: Are human data more traceable than we think?
Open Research objective
Making data, analysis scripts, and experiment programs freely accessible
Introduction
As a follow-up to nine replication studies I had carried out at the start of my PhD, I set out to conduct a series of experiments, building on those that had yielded reproducible results, to examine differences between the left and right brain hemispheres during visual perception. I tested a large group of people, including a group of rare individuals with right-hemisphere dominance for language (often left-handed people with ‘atypical’ hemispheric asymmetry scores). For this I used five tasks that I had previously programmed in E-Prime, and I wrote my analysis scripts in R. When this research was published in a peer-reviewed journal, I made the data, analysis scripts and experiment programs freely accessible on the Open Science Framework.
Motivation
For the replication studies I had contacted a large number of researchers to ask whether they were willing to share their original experiment files with me: I wanted my replications to be as pure as possible. They were often willing, but unable. The experiments had been run a while ago, and the files had been lost or were stored inaccessibly on old laptops. This inspired my intention to do it differently: make all my experiment files freely accessible before the whole endeavor had sunk so far into the background that I could no longer retrieve the right files upon request.
With regard to sharing analysis scripts and data, I felt this could contribute to our joint aim of unraveling the workings of the two hemispheres (this is not an individual’s job!). I had discussed pooling data with other researchers before, and the prospect of simply sharing my data online, to be used by anyone interested in questions similar to mine, sparked enthusiasm.
Even though these intrinsic motivations were strong, I was also demotivated by all the extra work it would take. Freely sharing analysis and experiment scripts means annotating them so that they can be understood by people other than myself: something I had not kept in mind while writing them. The final push to go ahead with making these data and materials freely accessible was an external one: the journal that I wanted to publish in demanded it, and I applaud them for it.
Lessons learned
It seemed so simple in my mind: delete the columns containing clearly identifying information, such as IP addresses and time stamps, from the data file and upload it to the Open Science Framework. To be certain, however, I checked with our faculty’s data expert, and he was a little more cautious. Even without clearly identifying information, my data turned out to be more ‘traceable’ than I had thought. In my case, combinations of sex, handedness, age and eye dominance could single out individual participants (especially the more unusual ones), who could then possibly be identified by people with access to university data systems in Groningen. The advice was to refrain from publishing on the Open Science Framework and instead to deposit the data on a university server, to be accessed on request. In the end, I opted to delete the columns with age and eye dominance, leaving only handedness and sex as demographic information: too little to trace individuals.
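The check behind this advice can be sketched in a few lines of code. What follows is a minimal illustration in Python, not the procedure that was actually applied to this dataset: the column names (`sex`, `handedness`, `age`, `eye_dominance`) and the eight records are invented for the example. It counts how many participants share each combination of demographic values; a group of size 1 means that one person is unique on that combination and therefore potentially traceable.

```python
from collections import Counter


def group_sizes(records, columns):
    """Count how many records share each combination of values in `columns`."""
    return Counter(tuple(record[c] for c in columns) for record in records)


def smallest_group_size(records, columns):
    """Size of the smallest group; 1 means at least one participant is unique."""
    sizes = group_sizes(records, columns)
    return min(sizes.values()) if sizes else 0


def unique_records(records, columns):
    """Return the records that are the only one with their combination of values."""
    sizes = group_sizes(records, columns)
    return [r for r in records if sizes[tuple(r[c] for c in columns)] == 1]


# Invented toy records for illustration only (NOT the study's data).
participants = [
    {"sex": "F", "handedness": "left", "age": 19, "eye_dominance": "right"},
    {"sex": "F", "handedness": "left", "age": 23, "eye_dominance": "left"},
    {"sex": "M", "handedness": "left", "age": 19, "eye_dominance": "right"},
    {"sex": "M", "handedness": "left", "age": 21, "eye_dominance": "left"},
    {"sex": "F", "handedness": "right", "age": 20, "eye_dominance": "right"},
    {"sex": "F", "handedness": "right", "age": 20, "eye_dominance": "right"},
    {"sex": "M", "handedness": "right", "age": 22, "eye_dominance": "left"},
    {"sex": "M", "handedness": "right", "age": 24, "eye_dominance": "right"},
]

all_columns = ["sex", "handedness", "age", "eye_dominance"]
reduced_columns = ["sex", "handedness"]  # after dropping age and eye dominance

print("Smallest group, all columns:    ", smallest_group_size(participants, all_columns))
print("Unique participants, all columns:", len(unique_records(participants, all_columns)))
print("Smallest group, reduced columns:", smallest_group_size(participants, reduced_columns))
```

In this toy example most participants are unique when all four columns are combined, and none are unique once age and eye dominance are dropped. How large the smallest group needs to be before data can be shared openly is a judgment call that depends on the dataset and on who might have access to matching information, so a check like this supports, but does not replace, the advice of a data expert.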
Having the support of the university’s Research Data Management, of data experts thinking along with me, and the ‘push’ of a journal that is eager to enter the next era of Open Science really helped me get it done in the end. This, too, is a joint effort, and viewing it as such will help us along in realizing our Open Science goals and improving our scientific practice.
Last modified: 16 March 2022 11.23 a.m.