On my first trip to Chicago to introduce the idea of Bold Public Code, I learned that “reproducibility” was a big deal. A measure of “reproducibility” is how long it takes you to recreate an experiment and get the same results described in a research paper. If it takes you hours, that’s probably OK. If it takes you months, you’re going to be pretty annoyed. If you’re the average time-poor computer science PhD student, you’re going to be downright angry.
What I found out was that when software wasn’t included with research papers, results could take months to reproduce. When software was included but wasn’t packaged nicely in a container or a virtual machine, it could take days or even weeks just to understand the software well enough to build and run it. This resonated with every single person I sat down with on that first trip.
For Bold Public Code as an organization, “reproducibility” matters a lot:
- When we see research code getting reused for any reason, that’s a useful signal.
This signal tells us something about how much interest there is in a research topic or a research approach. And we want to encourage reuse so we get better signals for choosing projects to support.
- We want researchers to spend more time coming up with interesting new solutions and less time recreating old ones.
That increases the likelihood that someone comes up with new research that resonates with people and has a big impact on the world. And that’s the type of research Bold Public Code is best placed to support.
So through Bold Public Code, I’m leading a project to create documentation and tools to help researchers package their code nicely in containers and virtual machines. This fits into our plan to solve research methodology problems that students and faculty tell us about.
This project fits nicely into an existing initiative: the ACM’s Artifact Review and Badging. We’ll be working closely with people already involved in that initiative to test the tools we come up with.
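As a sketch of what “packaged nicely in a container” can mean in practice, here is a minimal, hypothetical Dockerfile for a Python-based experiment. The file names and the `run_experiment.py` entry point are assumptions for illustration, not taken from any particular project:

```dockerfile
# Minimal container recipe for a hypothetical Python experiment.
# Pinning the base image keeps the environment the same every time.
FROM python:3.11-slim

WORKDIR /experiment

# Install pinned dependencies first so rebuilds can reuse this layer.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the research code into the image.
COPY . .

# One command reproduces the paper's results.
CMD ["python", "run_experiment.py"]
```

With a recipe like this checked in alongside the paper, reproducing the results comes down to a `docker build` and a `docker run` rather than days of reverse-engineering someone else’s build process.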