Content from Introduction
Last updated on 2024-05-12
Why is this important?
The topic of reproducible research is important for several reasons:
- Making research more reproducible contributes to continuous research improvement
- Recent discussions of the reproducibility crisis, in which many research studies turn out to be irreproducible, call for broader awareness of and stronger skills in research reproducibility
- Open and reproducible research is becoming the norm worldwide, following global changes in how academic research is conducted, including a broad science reform
- Transparent, rigorous and reproducible research is an integral part of research integrity and responsible research conduct that researchers should follow
To summarize, more reproducible research means research improvement, mitigating the reproducibility crisis, and contributing to ongoing science reform and responsible research conduct.
Content from What is Reproducible Research?
Last updated on 2024-05-13
Overview
Questions
- What do we mean by reproducibility?
- When is research reproducible?
- Does reproducibility mean different things in different disciplines?
Objectives
- Explain what research reproducibility is
- Provide examples where reproducibility does not overlap with open science
- Explain how different disciplines define reproducibility differently
Reproducibility: Some Definitions
Reproducibility: Obtaining the same results using the same data.
Replicability: Achieving similar results with new data.
Research is reproduced when results are consistent when following the same method and analysis steps with the same input data
Research is replicated when results are consistent across studies that answer the same research question, each of which has obtained its own data
Research results are generalized when results apply in other contexts or populations that differ from the original one
Based on what we went through, we can say that a study has been reproduced when:
- Researchers apply similar methods to the original study in a new study
- Researchers re-analyze data from the original study and observe the same results
- Researchers reuse data from the original study for a new purpose
Answer: Researchers re-analyze data from the original study and observe the same results
Reproducibility: Some Examples
Let’s consider an example: a researcher is tossing a coin 100 times to check if the coin is fair. They register if they have observed heads or tails after each toss when the coin falls on the floor. Heads are registered as 0 and tails as 1. The sample size of this study is N = 100 (since they are tossing the coin 100 times).
Hypothesis: The coin is fair (i.e. not biased)
Sample size: N = 100
Heads = 0
Tails = 1
Analysis method = Student t-test
After all data is collected (i.e. the researcher is done with the tossing), they start the data analysis. They run a simple statistical test in the SPSS program, a Student t-test, to compare the proportion of observed tails against the chance level (which is 0.5 since the coin has two sides, and if it’s fair, there should be a 50% chance of getting tails). The researcher observes that the proportion of tails is no different from chance, and so they find support for their original hypothesis. The researcher makes the complete data table and the detailed methods and analysis from the study available to the public.
Another researcher downloads the data table and re-runs the exact same analysis in different software, using the R programming language. They also observe that the proportion of tails is no different from chance. They have reproduced the study!
A third researcher reads about the reproduced study and decides to conduct a new study with a different coin. They apply the exact same methods (i.e. they toss a coin 100 times and register the outcome every time the coin falls on the floor). Just as in the original study, they mark heads as 0 and tails as 1. They also run a Student t-test on the data and they observe that the proportion of tails is no different from chance. They have replicated the study!
Note, however, that in many different disciplines the word “reproduced” could be used in both the second and the third researcher case, that is to mean both reproducing and replicating the study.
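The difference between reproducing and replicating the coin study can be sketched in a few lines of Python. This is a minimal illustration, not the researchers' actual analysis: the tosses are simulated, the function names (`toss_coin`, `one_sample_t`, `differs_from_chance`) and the seeds are made up for this sketch, and instead of a p-value the t statistic is compared against the tabulated two-sided critical value of roughly 1.984 for 99 degrees of freedom at the 5% level.

```python
import math
import random
import statistics

CHANCE_LEVEL = 0.5
# Two-sided critical value of Student's t for alpha = 0.05 and df = 99,
# taken from a standard t table (an assumption of this sketch).
T_CRITICAL_DF99 = 1.984


def toss_coin(n_tosses, seed):
    """Simulate tosses of a fair coin: heads = 0, tails = 1."""
    rng = random.Random(seed)
    return [1 if rng.random() < 0.5 else 0 for _ in range(n_tosses)]


def one_sample_t(data, mu):
    """Return the one-sample t statistic for the mean of data against mu."""
    standard_error = statistics.stdev(data) / math.sqrt(len(data))
    return (statistics.mean(data) - mu) / standard_error


def differs_from_chance(data):
    """True if the proportion of tails differs from 0.5 at alpha = 0.05.

    The critical value is only valid for samples of 100 tosses.
    """
    return abs(one_sample_t(data, CHANCE_LEVEL)) > T_CRITICAL_DF99


# Original study: collect the data once and analyze it.
original_data = toss_coin(100, seed=1)
original_t = one_sample_t(original_data, CHANCE_LEVEL)

# Reproduction: the same data and the same analysis steps, run again.
reproduced_t = one_sample_t(list(original_data), CHANCE_LEVEL)
print("reproduced:", reproduced_t == original_t)

# Replication: the same method, but new data from a different coin.
replication_data = toss_coin(100, seed=2)
print("original differs from chance:", differs_from_chance(original_data))
print("replication differs from chance:", differs_from_chance(replication_data))
```

Re-running `one_sample_t` on `original_data` corresponds to the second researcher (reproduction), while calling `toss_coin` with a new seed corresponds to the third researcher (replication).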
Discussion
Can you provide additional examples of reproducible studies from various disciplines or research types?
Reproducibility Across Methodologies and Research Disciplines
Quantitative Studies: Computational Reproducibility
Computational reproducibility is defined as “obtaining consistent computational results using the same input data, computational steps, methods, code, and conditions of analysis” (https://www.nap.edu/catalog/25303). In practice, this means re-running the analyses or code with the same data.
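One simple way to check that a re-run gives consistent results is to compare fingerprints (checksums) of the outputs. The sketch below is only an illustration under stated assumptions: the `analyze` function and the data are hypothetical stand-ins for a real analysis, and `fingerprint` is a helper invented for this example.

```python
import hashlib
import json
import statistics


def fingerprint(obj):
    """Return a SHA-256 hex digest of a JSON-serializable object."""
    encoded = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def analyze(data):
    """Hypothetical analysis step: summary statistics of the input."""
    return {"n": len(data), "mean": statistics.mean(data)}


input_data = [0, 1, 1, 0, 1, 0, 0, 1]  # made-up illustration data

first_run = analyze(input_data)
second_run = analyze(input_data)

# The same input data and the same code should give the same fingerprint.
print(fingerprint(first_run) == fingerprint(second_run))
```

Publishing the fingerprint of the input data alongside the results also lets others confirm that they are starting from the same data.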
Qualitative Studies: Process Transparency
Here we mean arriving at a similar (consistent) interpretation by following the same analysis process. This can be achieved by following the step-by-step reasoning and interpretation process of the researcher(s).
Discussion
Discuss in pairs: Should we use the term “reproducibility” across different disciplines and research methodologies even though it might mean different things?
Reproducibility and Open Research
Reproducibility is closely associated with transparency.
In order to reproduce others’ studies we need to have access to the methods, data, and analyses that have been conducted. So making data, tools and analyses available is essential for reproducibility.
Reproducible does not (have to) mean fully open.
However, a reproducible project does not have to be fully open. For example, due to privacy or copyright restrictions on methods, data, or analyses, researchers might need to keep parts of the research outputs under controlled access (e.g. available only to other researchers and not publicly available). This should not prevent them, though, from making the project fully reproducible (e.g. internally within their research team).
Open does not mean reproducible.
On the other hand, it is entirely possible to practice open science without following reproducibility principles. Materials, data, tools and code can be made openly available, but if they lack the necessary documentation, instructions on how to use them, error checks, proper versioning and organization, they are most probably not usable, and the project might not be reproducible.
Reproducibility Crisis
Problems with the reproducibility of research have been noticed by many researchers, advisors and policy makers in the past several years, and have led some to claim that there is a “reproducibility crisis”. However, not everyone agrees.
https://www.pnas.org/doi/full/10.1073/pnas.1708272114 https://bmcresnotes.biomedcentral.com/articles/10.1186/s13104-022-05942-3
Reasons for Irreproducibility
- Unavailability of materials, data and/or analyses
- Poor data management
- Unclear analysis specification
- Lack of documentation
- Errors in reporting numbers
- Lack of quality checking procedures
- Insufficient peer review
Discussion
Do you agree that there is a reproducibility crisis in academic research?
How many studies would have to reproduce successfully for the “crisis” to be over?
What could librarians do to help researchers fight the “reproducibility crisis”?
Key Points
- Reproducibility usually means obtaining the same results with the same data.
- Across different disciplines and methodologies, the understanding of what reproducibility means can be very different.
- Reproducible research is not the same as open research - it is important to share research outputs to be able to reproduce others’ studies, but research can be made fully reproducible even if it cannot be made fully open.
- Recent studies point to many issues with reproducibility across different disciplines, something that has been termed the “reproducibility crisis”
Content from Benefits and Challenges of Reproducibility
Last updated on 2024-05-12
Overview
Questions
- How can science benefit from more reproducible research?
- How can students and researchers benefit from more reproducible research?
- What are the main challenges of making research more reproducible?
Objectives
- list at least 4 benefits of reproducible research
- link reproducible research to research integrity/ethical research principles
- list at least 4 challenges of reproducible research
- provide one example for how researchers can be helped to overcome some of the challenges
Reproducibility benefits for science
Reproducible research is pivotal for research improvement because it ensures that studies are:
- Easier to verify, helping to catch errors and mistakes.
- More likely to be correct, as easier verification increases the likelihood of catching issues.
- More understandable and reusable, due to the proper organization and thorough documentation of the research process.
- Better prepared to share and make open, when privacy or copyright restrictions do not apply.
Reproducibility benefits for researchers
But reproducibility also has particular benefits for researchers themselves, and not only for science more broadly. That’s because making studies more reproducible means that researchers:
- Are more efficient (although at first implementing reproducible workflows might be time-consuming, it makes them more efficient down the road!)
- Are less stressed about making a mistake (because both they and other researchers can check if the study reproduces across different contexts)
- Can get credit for producing rigorous research outputs (according to new research assessment criteria that follow global science reform)
Discussion
Think about one way in which more reproducible research could benefit science and one in which it could benefit researchers themselves. Why are these two benefits important?
Challenges in making research more reproducible
We learned that there are many benefits of making research more reproducible. However, this does not come without specific challenges:
- Making research reproducible is time-consuming (especially at first when new workflows are being implemented or research documentation is created from scratch)
- It requires skills and expertise (for example, researchers might need to know how to properly organize and document research outputs)
- It is more difficult when specific restrictions apply (for example, when privacy or copyright restrictions mean that critical parts of a dataset or analysis cannot be made available for independent reproduction by other researchers)
- Research might not reproduce due to technical issues (for example, different analysis software might differ in how they perform certain calculations and produce different outputs for the same analysis)
Overcoming challenges
These specific challenges can be overcome, but they require that researchers have proper conditions and support for making studies more reproducible:
- Because making research reproducible is time-consuming, researchers should be rewarded for preparing reproducible outputs and have appropriate support
- Because it requires skills and expertise, institutions should offer training about the tools and solutions for reproducible research and/or trained support staff
- Because it is more difficult when specific sharing restrictions apply, reproducibility should be checked by internal staff and/or proper infrastructure and tools with controlled access to research outputs should be in place
- Because research might not reproduce due to technical issues, software documentation should be provided and results could be checked across different types of software and operating systems
Discussion
Name one challenge of making research more reproducible.
Discuss your choice with the person next to you and brainstorm ways in which you could help researchers overcome that challenge as a librarian.
Key Points
- Making research more reproducible contributes to general research improvement, quality and rigor but also to higher efficiency and easier error correction for researchers
- This does not come without specific challenges, such as time constraints, technical issues and the need for specific skills for making research outputs reproducible
- To overcome the challenges, researchers are in need of proper tools, solutions, research support staff, infrastructure and training
Content from Tools for Reproducible Research Workflows
Last updated on 2024-11-15
Overview
Questions
- What are reproducible research workflows?
- Which areas of the research process can be made more reproducible?
- What tools can improve reproducibility of research workflows?
Objectives
- provide examples of reproducible research workflows
- list at least 4 different tools and practices for increasing research reproducibility
- demonstrate basic understanding of how to use selected tools for increasing research reproducibility
Starting with Reproducible Research Workflows
When we talk about research workflows we mean the sequence of processes through which researchers have to go to get to specific research outputs such as a dataset, analysis result or a publication. We can distinguish three main areas in the research process where workflows can be made more reproducible:
- Data acquisition and processing
- Data analyses
- Data reports (manuscripts)
What tools are available out there?
Different tools can be used for increasing reproducibility depending on the specific phase of the research process. Here is a list of some helpful tools for each of the three phases:
Data Acquisition and Processing
Documentation is a critical step in ensuring data acquisition and processing are transparent and reproducible. Key tools include:
- README files: Provide essential metadata about datasets or code repositories, such as purpose, data collection methods, and file organization. A template is available to guide their creation.
- Codebooks: Define dataset variables, labels, values, and units to make data understandable and reusable.
- Electronic Lab Notebooks (ELNs): Digital tools to document lab workflows, experiments, and results with features like timestamps and collaboration (e.g., Jupyter, LabArchives).
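As an illustration of what a codebook adds, the sketch below writes one as a plain Python dictionary and uses it to check rows of a data table. The variable names (`toss_id`, `outcome`), the layout of `CODEBOOK`, and the `check_record` helper are all hypothetical, chosen only to mirror the coin-toss example from earlier in the lesson; real codebooks are often kept as text, CSV or PDF files.

```python
# Hypothetical codebook for the coin-toss data table.
CODEBOOK = {
    "toss_id": {
        "label": "Sequential number of the toss",
        "type": int,
        "unit": "count",
    },
    "outcome": {
        "label": "Side of the coin facing up",
        "type": int,
        "values": {0: "heads", 1: "tails"},
    },
}


def check_record(record, codebook):
    """Return a list of problems found in one row of the data table."""
    problems = []
    for name, spec in codebook.items():
        if name not in record:
            problems.append(f"missing variable: {name}")
        elif not isinstance(record[name], spec["type"]):
            problems.append(f"wrong type for {name}: {record[name]!r}")
        elif "values" in spec and record[name] not in spec["values"]:
            problems.append(f"undocumented value for {name}: {record[name]!r}")
    for name in record:
        if name not in codebook:
            problems.append(f"undocumented variable: {name}")
    return problems


# A row that matches the codebook yields an empty list of problems.
print(check_record({"toss_id": 1, "outcome": 1}, CODEBOOK))
# A row with an undocumented value and an undocumented variable.
print(check_record({"toss_id": 2, "outcome": 7, "note": "x"}, CODEBOOK))
```

Keeping the codebook in a machine-readable form means the same document serves both human readers and automated quality checks.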
Data analyses
In the data analysis phase of the research process, the tools for making analyses more reproducible will differ depending on the methodology used, for example depending on whether the researcher applies quantitative or qualitative methods in the study. Here is a list of some helpful tools depending on research methodology:
Quantitative methods:
- Programming languages that keep a transparent record of all analysis steps: R, Python, SPSS syntax
- Versioning: Git
- Code quality checking and testing: https://the-turing-way.netlify.app/reproducible-research/code-quality.html
- Containers for freezing the computational environment: https://the-turing-way.netlify.app/reproducible-research/renv/renv-options.html
- Code capsules: https://codeocean.com
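Containers and code capsules freeze the whole computational environment. A much lighter first step, sketched below, is simply to record which interpreter, operating system and package versions an analysis was run with. This sketch records the environment but does not freeze it, and the `describe_environment` helper and the package names passed to it are assumptions made for this example.

```python
import json
import platform
from importlib import metadata


def describe_environment(packages):
    """Collect interpreter, OS, and package versions as a plain dict."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # the package is not installed
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "os": platform.platform(),
        "packages": versions,
    }


# The package names below are examples; list whatever the analysis imports.
print(json.dumps(describe_environment(["numpy", "pandas"]), indent=2))
```

Saving this output next to the results gives other researchers a starting point when a re-run produces different numbers on their machine.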
Qualitative methods:
- Annotations (e.g. the annotation function in NVivo, ATI: Annotation for Transparent Inquiry)
- Active citation: https://www.princeton.edu/~amoravcs/library/ps.pdf
Data reports (manuscripts)
Tools to create, share, and collaborate on data reports.
- R Markdown: Create reproducible reports by embedding code and outputs into narrative documents. Ideal for integrating R workflows.
- Quarto: A next-generation tool similar to R Markdown but with expanded support for multiple languages (e.g., Python, Julia, R).
- Jupyter Notebooks: Interactive documents combining live code, outputs, and text, popular for Python workflows.
- HackMD: A collaborative markdown editor for real-time co-authoring, not inherently reproducible but useful for team writing.
- Overleaf: A LaTeX-based platform for professional typesetting, useful for collaboration but not designed for reproducibility.
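The idea shared by R Markdown, Quarto and Jupyter Notebooks is that the numbers in a report are computed by code rather than typed by hand. The sketch below imitates that idea with Python's standard library only. It is not how any of these tools work internally, and the data, the `REPORT` sentence and the `render_report` helper are made up for illustration.

```python
from string import Template

outcomes = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0]  # made-up data: heads = 0, tails = 1

# Narrative text with placeholders that are filled in from the data.
REPORT = Template(
    "We tossed the coin $n times and observed tails in $tails tosses "
    "($percent% of all tosses)."
)


def render_report(data):
    """Fill the report sentence with values computed from the data."""
    n = len(data)
    tails = sum(data)
    percent = f"{100 * tails / n:.1f}"
    return REPORT.substitute(n=n, tails=tails, percent=percent)


print(render_report(outcomes))
```

If the data table changes, re-running the script updates every reported number, which avoids copy-and-paste errors in the manuscript.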
Exercise
- Take a look at the README file template that we listed in this lesson: https://data.research.cornell.edu/data-management/sharing/readme/ How could you help a researcher fill out a template like that? Which elements could you help most with?
- What types of tools can be used for making qualitative analyses more reproducible? If you or a researcher don’t have access to these specific tools, could you think of other ways in which one could make qualitative analysis more reproducible using commonly available tools?
Key Points
- Research workflows are sequences of processes that researchers have to go through to get to specific research outputs
- Data acquisition, data analysis and manuscript writing are three phases of the research process that can be made more reproducible
- There are many tools out there that can help make research workflows more reproducible
Content from The Role of Libraries in Supporting Reproducibility
Last updated on 2024-05-13
Overview
Questions
- What is the role of libraries in research improvement?
- How can library staff support researchers in improving reproducibility?
Objectives
- demonstrate understanding of libraries’ central role in supporting reproducibility
- provide examples of how libraries can support research reproducibility and its improvement
Why Libraries?
Academic libraries are uniquely positioned to provide support in the area of reproducible research. Open science is already at the core of many libraries’ work, and many librarians provide direct and increasingly hands-on support to both early-career and senior researchers at their institutions. Because reproducibility is strongly associated with open science/open research, and because funders, journals, and other stakeholders are beginning to implement new requirements for not only open but also reproducible research, librarians can build on their expertise in helping prepare research outputs for sharing to also help make them reproducible.
Reproducibility support from the libraries
Where can libraries help:
- Awareness raising, teaching, training, and hands-on guidance
- Help researchers be transparent about their full research workflow: research questions, methods, data, step-by-step procedures and analyses; help make the methods, data and analyses open (if possible in a given project)
- Assist with making good documentation for all outputs and all stages of the research process
- Assist with version control to keep track of versions of all outputs
- Help with the quality check of the methods, data and analyses
- Verifying researchers’ work: helping with reproducing their own results
Exercise (Reflection)
What do you think is the most important area where libraries can provide support in making research more reproducible?
Key Points
- Academic libraries are uniquely positioned to provide help and support in the area of open and reproducible research
- Academic librarians can build on their expertise in making research outputs shareable and reusable to also help make them reproducible
- Academic libraries can offer training in reproducible research workflows in addition to open science trainings