Data management – from a section in the grant proposal to a day-to-day reference manual 

By Krzysztof Cipora, Lecturer in Mathematical Cognition, Open Research Lead of the School of Science, Centre for Mathematical Cognition, Loughborough University, k.cipora@lboro.ac.uk, @krzysztofcipora

Most funding agencies require grant proposals to contain a data management plan. It may seem an extra burden to prepare yet another document: all applicants have been handling research data and know how to do it, so why mandate such a technicality in the proposal? At the same time, many researchers have not been formally trained in data management. Open Research practices, which are becoming more widely adopted (and more often mandated by funders), include Open Data, that is, sharing research data either publicly or by granting access to other researchers. Whether shared publicly or with some restrictions, the data need to be understandable and usable. This requires the data to be thoroughly curated and documented and, ideally, accompanied by the programming code used for their processing and analysis. Researchers also benefit from good data curation and documentation when they come back to their own data after a few months or years. Data management is even more critical in large-scale projects involving many researchers, research assistants, etc. However, the document typically supplied to the funder is relatively short (space restrictions!) and therefore quite generic, so it cannot fully satisfy day-to-day data management needs.

In June 2022 at Loughborough University, we launched the £9,989,000 Centre for Early Mathematics Learning (CEML; ceml.ac.uk), funded within the ESRC Research Centre scheme. The funding covers a period of five years, and over twenty-five researchers from several institutions are involved across five CEML challenges. They are supported by several research assistants and PhD students. Various types of data are being gathered. One of the crucial issues at the outset of CEML was to ensure that we were all on the same page with data management. This was necessary for several reasons, both within CEML and for future data sharing. The same variables should be named and coded consistently across studies to make the data easier to read, facilitate the re-use of analysis code, and allow datasets to be combined if needed. If a researcher is reallocated from one study to another, they can catch up easily. Keeping our data curated and consistent for ourselves also makes them more accessible to other researchers when we share them with the community.

Together with other colleagues, I took on preparing a detailed Data Management Policy for the CEML. In the following, I briefly describe what we did and how this might be used as an example for other projects (including much smaller ones). 

We started from the Data Management Policy from the CEML proposal and elaborated on the details (see CEML Data Management Policy https://doi.org/10.17028/rd.lboro.21820752). The document first outlines the responsibilities of Challenge Leads and of the Leads of the specific studies being run (this is particularly important given the number of researchers involved in CEML). It specifies where the data are stored, who should have access to the data (working together with researchers from outside Loughborough University required some thinking about how to set this up efficiently), and when the data can be shared publicly. The document also specifies how to document the data entry process and how to document data analysis to ensure analytical reproducibility. Finally, it specifies the processes for creating backups and sharing data.

The Data Management Policy refers to the Variable Dictionary (see CEML Variable Dictionary https://doi.org/10.17028/rd.lboro.21820824) – a document providing detailed information on how to name and organise data files and how to name variables. We also provide a template for the metadata file to be created for each study (see CEML Variable Dictionary Template https://doi.org/10.17028/rd.lboro.21820947).
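For readers who would like to automate part of this, the short Python sketch below shows one possible way to check a data file's column names against a variable dictionary. It is only an illustration: the variable names, the snake_case naming rule, and the check_columns helper are all hypothetical and are not taken from the CEML Variable Dictionary.

```python
import re

# Hypothetical variable dictionary: these names and descriptions are
# illustrative only and are NOT taken from the CEML Variable Dictionary.
VARIABLE_DICTIONARY = {
    "participant_id": "Unique participant identifier",
    "age_months": "Age at testing, in months",
    "task_accuracy": "Proportion of correct responses (0-1)",
}

# Assumed naming rule for this sketch: lowercase snake_case.
NAME_PATTERN = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def check_columns(columns, dictionary=VARIABLE_DICTIONARY):
    """Return a report of naming problems for a list of column names."""
    columns = list(columns)
    return {
        # Names that break the assumed naming rule.
        "badly_named": [c for c in columns if not NAME_PATTERN.match(c)],
        # Names present in the data but absent from the dictionary.
        "undocumented": [c for c in columns if c not in dictionary],
        # Dictionary entries that the data file does not contain.
        "missing": [v for v in dictionary if v not in columns],
    }


# Example: one column uses a different naming style from the dictionary.
report = check_columns(["participant_id", "AgeMonths", "task_accuracy"])
print(report)
```

Running a check like this whenever a new data file is added may catch inconsistencies early, before analysis code written for one study is re-used in another.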

To make these materials more accessible to CEML colleagues, we prepared a short video highlighting the most important aspects of CEML data management and explaining why such detailed guidelines have been prepared (see CEML Data Management Training Video https://doi.org/10.17028/rd.lboro.21820713).

All of this looks quite elaborate and may not seem very useful for smaller projects. However, at least some of these points may be worth considering. It is worth remembering that “there will be at least two people working with your data, you and future you”. Thus, ensuring a consistent way of naming data files, their structure, variable names, and the analysis code, and documenting the progress of data processing, is a big favour to future you and, most likely, to other researchers who may work with your data. Hopefully, the CEML documents linked above may serve as a useful template for preparing a day-to-day data management reference for other projects.

The views and opinions of this article are the author’s and do not reflect those of the University…although hopefully they do reflect Loughborough University values.

A different kind of diversity

By Lara Skelly, Open Research Manager for Data and Methods

A few years ago, I submitted a methodological paper to a discipline-specific journal. The reviewers were not kind, one of them saying “There is no narrative of the findings.” Well, naturally not, as the findings were the methodology I was describing. While it is entirely likely that I presented the purpose of the paper poorly, being a freshly minted PhD with limited publication experience, I remember the confusion I felt at the narrow expectations of the reviewers.

Methodological papers are still a rarity, despite the slightly increased popularity that I saw during the COVID lockdowns. Most researchers that I encounter still see the typical paper of introduction-literature review-methods-results-discussion as the only format worth putting out into the world. And as is the case in any one-size-fits-all approach, much is lost by this homogeneity.

Research and the people who work in research are anything but homogeneous. I have seen all manner of opinions on what counts as science and what data are, and all manner of ways of engaging with the craft. I’ve known researchers who are interested in the broad and the narrow, the individual and the collective, the future and the past. Boxing this variety into a homogeneous form of communication is, in this day and age, downright daft.

We are in a wonderful age that strives to see diversity as a celebration. The time has come to celebrate the diversity in our research as well. To recognise that the typical paper format is perfectly fine, but researchers are not restricted to it. Sharing code, protocols, data, any of the ingredients of our research is one way that we can live our diversity, upholding a value that has become global.

Thanks to Katie Appleton and Gareth Cole for insightful comments on early drafts.

The views and opinions of this article are my own and do not reflect those of the University…although hopefully they do reflect Loughborough University values.