Strengthening Reproducibility in Network Science

NetSci 2017 Reproducibility Workshop

Welcome

As a half-day satellite of the NetSci2017 conference, the workshop will be held on Monday afternoon, June 19, 2017, in Indianapolis, Indiana, USA. It will feature invited talks from publishing, funding, industry, and academic perspectives, followed by a panel discussion with audience participation that aims to identify actions the community can take to strengthen scientific rigor in the field of network science. Details can be viewed in the workshop program. All attendees of this workshop must be registered for the NetSci2017 conference.

Invited Speakers

Philip E. Bourne Ph.D.

Former NIH Associate Director for Data Science, where he led the Big Data to Knowledge (BD2K) initiative. He will be joining the University of Virginia as the Stephenson Chair of Data Science, Director of the Data Science Institute, and Professor in the Department of Biomedical Engineering.

Lise Getoor Ph.D.

Professor of computer science at the University of California, Santa Cruz, expert in entity resolution and relational statistical methods.

Barbara R. Jasny Ph.D.

Deputy Editor, Emeritus, at the journal Science. Expert in the reproducibility of scientific research.

Jan Pedersen Ph.D.

Former Technical Fellow and Chief Scientist, Core Search, at Microsoft, with a distinguished career spanning AltaVista and Yahoo. He recently joined Twitter.

Richard Shiffrin Ph.D.

Distinguished Professor and Luther Dana Waterman Professor of Psychological and Brain Sciences at Indiana University. Expert in Bayesian statistics and reproducibility of research in general.

Background and Rationale

The holy grail of science is truth, and at the heart of science is rigor. Through careful observation and experimentation with rigorous methods, we are able to draw inferences and derive knowledge that we rely on as truth. Each new study is a building block of science, taking as fact the results of prior studies. In this way, the foundation of science is built, painstakingly, brick by brick.

In the past few years, the quality of this foundation, on which scientific advancement stands, has come into question because of a systematic failure of the scientific enterprise to prioritize reproducibility. The ramifications of this failure are dire, forcing not only a reconsideration of the veracity of nearly all scientific evidence and knowledge to date, but also calling into question the prevailing scientific practices of our time. Empirical studies of reproducibility have revealed a variety of flaws in the scientific enterprise that contribute to this state of affairs, including the incentive structure of academia, editorial bias toward novel and surprising findings coupled with a lack of interest in replication studies, inadequate sharing of the information necessary to reproduce results, and more. Recognizing this dire state of affairs, the scientific community has begun to devise ways to remedy, or at least mitigate, the situation. Leading the way are scientific journals and funding agencies, which have issued new guidelines and requirements designed to improve reproducibility and rigor.

Against this backdrop, network science is emerging as a transformative, interdisciplinary field. To succeed, network science must, above all else, stand for scientific rigor. Reproducibility standards fall into three levels: computational, statistical, and empirical reproducibility. Computational reproducibility applies to all of network science, because data and algorithmic tools are pervasive. Attempts to computationally reproduce another's study typically rely on the open availability of the original data and code. This becomes a technological challenge at the scale of "big data," where even simple tasks like indexing and provenance tracking require special infrastructure. Twitter data, for example, a treasure trove of social media data, is subject to updates and deletions every second, so depending on when the data is accessed, two collections covering the same time period may differ.

Another challenge arises when data is private. Standard methods for anonymizing data before sharing can have amplified consequences in network science: because of the relational nature of network data, identities may be more easily discoverable when the data is combined with publicly available data sets. Innovative new approaches to privacy preservation may therefore be needed for sharing data. The idea of statistical reproducibility, which addresses the p-hacking problems of traditional studies, has a special application in network science, leading to differentially private algorithms for networks. For empirical reproducibility, a good example is the disambiguation problem and other data quality issues in networks. Bibliographic data is fundamental to understanding the progress of scientific discovery, forming vast intertwining webs of coauthorship and citation networks ripe for mining. Thanks again to their relational nature, more powerful collective entity resolution algorithms can be built to improve data quality. While cleaner data naturally leads to better empirical reproducibility, these data processing pipelines also create new challenges for their own computational reproducibility.

Different branches of network science, depending on their domain and stage of development, face different reproducibility issues. In this satellite, we plan to discuss common issues as well as domain-specific problems. Through this workshop, participants can expect to: 1) get a broad overview of the importance of reproducibility in the scientific process, the current state of the field, and what funding agencies and journals are doing to ensure reproducibility moving forward; 2) learn about reproducibility issues in two data sets used by network scientists: social media data (Twitter), with its privacy and provenance concerns, and an open data set, the Microsoft Academic Graph; 3) learn what has been done in related domains to promote reproducibility, and discuss common concerns and guidelines for network science in general; and 4) participate in discussions with our panelists to identify actions the network science community can take to address reproducibility in specific domains, strengthen scientific rigor, and comport with industry, journal, and funding agency practices and guidelines. The result will be an interdisciplinary group of network scientists, galvanized to bring scientific rigor to our field and poised to carry out the action items generated during the workshop.

Organizers

Santo Fortunato Ph.D.

Professor of Informatics and Computing at Indiana University, Scientific Director at Indiana University Network Science Institute.

Patricia L. Mabry Ph.D.

Executive Director and Senior Research Scientist at Indiana University Network Science Institute. Former Senior Advisor for Disease Prevention in the Office of Disease Prevention (ODP) at NIH.

Kuansan Wang Ph.D.

Principal researcher and managing director of Microsoft Research Outreach.

Xiaoran Yan Ph.D.

Assistant Research Scientist at Indiana University Network Science Institute.

Sponsors

Microsoft Research