Towards Reproducibility of Scientific Results in Software Engineering

Open Source, Open Data, but no Reproducible Scientific Results?

There is a very interesting statement in the preface of the proceedings of the last SPLASH/OOPSLA conference, written by Martin Rinard from MIT:

OOPSLA/SPLASH 2010 Program Chair Statement http://bit.ly/d7OcK7 (pdf, p.iv)

Among the thoughtful reflections by the OOPSLA/SPLASH program chair on the conference reviewing process, one remark of particular interest concerns the serious lack of reproducibility in the software engineering community. Quoting Martin Rinard on the SPLASH/OOPSLA reviewing:

The ability to reproduce the presented results is a fundamental property of scientific publication. Given the complexity of the systems that many computer scientists work with, the fact that many of the components in such systems change rapidly (with older versions quickly becoming effectively unavailable), and the strict page limits that conferences impose, reproducibility has always been a problematic issue for the field [5]. Specific issues include whether or not the paper presents sufficient detail for other researchers to reimplement the system (almost all do not, given the complexity of the systems involved), whether the researchers make the reimplementation issue moot by making their data and/or artifacts publicly available via dissemination mechanisms such as the Internet (many but by no means all researchers do this), or whether, given the speed with which the field moves, reproducibility is even a relevant goal or not for most papers. The program committee did not attempt to address this issue in any detail other than to simply apply prevailing reviewing standards.

One issue did, however, come up during the program committee meeting. The computer science research community in general, and the programming languages/software engineering community in particular, currently has strong participation from communities (for example, researchers working in industrial research labs) that often work with proprietary software systems. Because such systems are not available to the broad research community, it is in general not possible for others in the community to reproduce the reported results — the data and/or artifacts required to reproduce the results are not available (and will never become available) to others. While the program committee was not entirely comfortable accepting papers that present such results, it was at least as uncomfortable with requiring industry researchers to present only results that others could reproduce.

Given the central importance of reproducibility to scientific inquiry and the current ambiguous status of reproducibility in the field, I think it is important for the field to come to a more explicit understanding of what degree of reproducibility is acceptable. In particular, I believe the field needs to come to a decision on the acceptability of results that others are inherently unable to reproduce because of the proprietary nature of the relevant data and/or artifacts.

[5] T. Mudge. Report on the panel: How can computer architecture researchers avoid becoming a society for irreproducible results? Computer Architecture News, 24(1), 1996.

This is finely observed and nicely presented, but what can we do in practice to help our discipline reach the level of nearly all other scientific disciplines? Until we take the necessary steps to ensure that claimed results can be checked against strong and openly available experimental data, we can hardly claim, in academic committees and circles, that computer science and software engineering have reached the maturity of other engineering fields.

I remember a discussion in the PC of a software engineering conference on this subject. Someone proposed that any paper claiming novelty on the basis of experiments should provide a stable pointer to the data that was used to produce the results. The discussion then moved on to how to ensure stability and transparency in accessing the experimental data needed to show reproducibility of results. Someone claimed that the conference organizers (was it ACM, IEEE or another organization, I don’t remember well) could provide this neutral access to stable experimental data. Surprisingly, the initial answer of the organizers’ representatives was that they were not at all opposed to organizing this, for example as a service to their members. The funny part of the story is that a small number of people then started claiming this was a bad idea because it was unfair. And do you know what their arguments were? This approach would be unfair because it would cast suspicion on papers that did not provide the reproducible data to reviewers, and it would give too great an advantage to papers submitted for review together with their reproducible data. The discussion was then stopped.

Our discipline claims respectability because we use peer reviewing in our journals and conferences. But we all know how fragile this argument is. We will reach the maturity level of other disciplines only when we have achieved a sound process for reproducible results in software engineering. There is no excuse in 2010, with broad web access, low-cost disk storage, and the knowledge and experience accumulated in the open source and open data communities, to escape this necessary sanity move. All conference organizers and PC chairs should state in their calls for papers how the experimental data behind non-theoretical papers can be stored and retrieved.

The good news is that some colleagues in the community are already organizing themselves for a move in the right direction.

Pieter Van Gorp and his colleagues, for example, are already organizing a campaign for reproducible results in software engineering. Read their paper on “Supporting the internet-based evaluation of research software with cloud infrastructure” at: http://www.springerlink.com/content/y1488178640l6412/

Read also Pieter’s answer to the “Where are the zoos” blog post at: https://modelseverywhere.wordpress.com/2010/11/06/where-are-the-zoos/trackback/
