Yes, you can open commercially funded research (but probably not all of it)

Image description: Two LEGO figures, in white shirts and blue trousers, discussing a model of a LEGO shark. Image created with Bing’s AI Image Creator: https://www.bing.com/images/create/

“Mommy, how much do you think the Jaws LEGO set will cost?” asked my seven-year-old the other night as I was putting him to bed. His dad had told him earlier that day that the LEGO Review Board had officially announced that a build design submitted by a LEGO fan was going into production. The design is based on the 1974 novel by Peter Benchley, which was the source of the 1975 Steven Spielberg movie. As my son is an ardent enthusiast of all things with big teeth, a LEGO Jaws set was the best news since he discovered that he could build a mosasaurus.

When organisations like LEGO, IKEA, Unilever and NASA crowdsource designs, innovations and problem-solving, it is termed open innovation. It is one example of a commercial venture working in synergy with open research practices. At Loughborough University, our researchers have worked with companies like Rolls-Royce, Jaguar, Airbus, and Adidas. While some of the research outputs are embargoed due to commercial restrictions, there are parts of the research that can be, and are, made openly available.

Negotiations about what can be made openly available should take place before the contract is signed. If you are a researcher working with commercial funding, be sure to check whether your methods, data collection tools, and data (perhaps in an aggregated format) can be shared, and under what conditions. In the discussions I’ve had, I have found commercial funders willing to work with researchers on opening as much of the research as is sensibly possible.

And thank goodness they are because, without open innovation, I wouldn’t be in the running for the best Christmas present award from my seven-year-old.

Data management – from a section in the grant proposal to a day-to-day reference manual 

By Krzysztof Cipora, Lecturer in Mathematical Cognition, Open Research Lead of the School of Science, Centre for Mathematical Cognition, Loughborough University, k.cipora@lboro.ac.uk, @krzysztofcipora

Most funding agencies require grant proposals to contain a data management plan. Preparing yet another document may seem an extra burden: all applicants have been handling research data and know how to do it, so why mandate such a technicality in the proposal? At the same time, many researchers have not been formally trained in data management. Open Research practices, which are becoming more and more widely adopted (and more and more often mandated by funders), include Open Data, that is, sharing research data either in the public domain or by granting access to other researchers. Whether shared publicly or with some restrictions, the data need to be understandable and usable. This requires the data to be thoroughly curated and documented, and ideally accompanied by the programming code used for their processing and analysis. Researchers also benefit from good data curation and documentation when they come back to their own data after a few months or years. Data management is even more critical in large-scale projects involving many researchers, research assistants, etc. However, the document typically supplied to the funder is relatively short (space restrictions!) and therefore quite generic, so it cannot fully satisfy day-to-day data management needs.

In June 2022 at Loughborough University, we launched the £9,989,000 Centre for Early Mathematics Learning (CEML; ceml.ac.uk), funded within the ESRC Research Centre scheme. The funding covers a period of five years, and over twenty-five researchers from several institutions are involved across five CEML challenges. They are supported by several research assistants and PhD students. Various types of data are being gathered. One of the crucial issues at the onset of CEML was to ensure that we were all on the same page with data management. This was necessary for several reasons, both within CEML and for future data sharing. The same variables should be named and coded consistently across studies to make the data easier to read, facilitate the re-use of analysis code, and allow datasets to be combined if needed. If a researcher is reallocated from one study to another, they can catch up easily. Keeping our data curated and consistent for ourselves also makes them more accessible to other researchers when we share them with the community.

Together with other colleagues, I took on preparing a detailed Data Management Policy for the CEML. In the following, I briefly describe what we did and how this might be used as an example for other projects (including much smaller ones). 

We started from the Data Management Policy in the CEML proposal and elaborated on the details (see CEML Data Management Policy https://doi.org/10.17028/rd.lboro.21820752). The document first outlines the responsibilities of Challenge Leads and of the Leads of the specific studies being run (this is particularly important given the number of researchers involved in CEML). It specifies where the data are stored, who should have access to the data (working together with researchers from outside Loughborough University required some thinking about how to set this up efficiently), and when the data can be shared publicly. The document also specifies how to document the data entry process and how to document data analysis to ensure analytical reproducibility. We also specify the processes for creating backups and for data sharing.

The Data Management Policy refers to the Variable Dictionary (see CEML Variable Dictionary https://doi.org/10.17028/rd.lboro.21820824), a document providing detailed information on how to name and organise data files and how to name variables. We also provide a template for the metadata file to be created for each study (see CEML Variable Dictionary Template https://doi.org/10.17028/rd.lboro.21820947).
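To give a flavour of how a variable dictionary can be put to work day to day, here is a minimal sketch in Python that compares the column names of a study dataset against a dictionary of agreed variable names. It is only an illustration: the variable names, the dictionary structure, and the function check_variable_names are all made up for this example and are not taken from the CEML documents.

```python
# Hypothetical variable dictionary: agreed variable name -> short description.
# These entries are invented for illustration, not copied from CEML.
VARIABLE_DICTIONARY = {
    "participant_id": "Anonymised participant identifier",
    "age_months": "Age at testing, in months",
    "task_accuracy": "Proportion of correct responses (0 to 1)",
}


def check_variable_names(columns, dictionary):
    """Compare dataset column names with the agreed variable names.

    Returns two sorted lists: names used in the dataset but absent from
    the dictionary, and names in the dictionary absent from the dataset.
    """
    used = set(columns)
    agreed = set(dictionary)
    unknown = sorted(used - agreed)
    missing = sorted(agreed - used)
    return unknown, missing


# Example: one column was named "Age" instead of the agreed "age_months".
study_columns = ["participant_id", "Age", "task_accuracy"]
unknown, missing = check_variable_names(study_columns, VARIABLE_DICTIONARY)
print("Not in the dictionary:", unknown)
print("Expected but absent:", missing)
```

Running a check like this whenever a new data file is added catches naming drift early, before it spreads into the analysis code.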

To make these materials more accessible to CEML colleagues, we prepared a short video highlighting the most important aspects of the CEML data management and justifying why such detailed guidelines have been prepared (see CEML Data Management Training Video https://doi.org/10.17028/rd.lboro.21820713). 

All this may look quite elaborate and may not seem very useful for smaller projects. However, at least some of these points are worth considering. It is worth remembering that “there will be at least two people working with your data, you and future you”. Thus, ensuring a consistent way of naming data files, their structure, variable names, and the analysis code, and documenting the progress of data processing, is a big favour to future you and, most likely, to other researchers who may work with your data. Hopefully, the CEML documents linked above can serve as a useful template for preparing a day-to-day data management reference for other projects.
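Even in a small project, a naming convention is easier to keep when it can be checked automatically. The sketch below is one possible way to do this in Python. The convention itself (study label, then "raw" or "clean", then a date, then ".csv") and the function nonconforming are assumptions made up for this example; they are not the CEML rules, so adapt the pattern to whatever your own project agrees on.

```python
import re

# Hypothetical convention: <study>_<raw|clean>_<YYYY-MM-DD>.csv
# e.g. "study1_raw_2023-01-15.csv". Invented for illustration only.
FILENAME_PATTERN = re.compile(r"[a-z0-9]+_(raw|clean)_\d{4}-\d{2}-\d{2}\.csv")


def nonconforming(filenames):
    """Return the file names that do not follow the agreed convention."""
    return [name for name in filenames if not FILENAME_PATTERN.fullmatch(name)]


files = ["study1_raw_2023-01-15.csv", "Study 1 final FINAL.csv"]
print("Please rename:", nonconforming(files))
```

Note that the pattern only checks the shape of the date (four digits, two digits, two digits), not whether it is a real calendar date.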

The views and opinions expressed in this article are the author’s and do not reflect those of the University…although hopefully they do reflect Loughborough University values.