Strengthening Reproducibility in Network Science

NetSci 2017 Reproducibility Workshop

Welcome

As a half-day satellite of the NetSci2017 conference, the workshop will be held on the afternoon of Monday, June 19, 2017 in Indianapolis, Indiana, USA. It will feature invited talks from publishing, funding, industry, and academia perspectives, followed by a panel discussion with audience participation that aims to identify actions the community can take to strengthen scientific rigor in the field of network science. Details can be viewed in the workshop program. All attendees of this symposium must also be registered for the NetSci2017 conference.

Invited Speakers / Panelists

Philip E. Bourne Ph.D.

Former NIH Associate Director for Data Science, where he led the Big Data to Knowledge (BD2K) initiative. He will be joining the University of Virginia as the Stephenson Chair of Data Science, Director of the Data Science Institute, and Professor in the Department of Biomedical Engineering.

Santo Fortunato Ph.D.

Professor of Informatics and Computing at Indiana University, Scientific Director at Indiana University Network Science Institute.

Barbara R. Jasny Ph.D.

Deputy Editor Emerita at Science magazine. Extensive experience in ensuring the reproducibility of scientific research in publishing.

Hao Ma Ph.D.

Researcher and Manager at the Internet Services Research Center (ISRC), Microsoft Research. He leads the team that creates the data powering Microsoft Academic.

Paul Macklin Ph.D.

Associate Professor of Intelligent Systems Engineering at Indiana University. Expert in multiscale computational models of cancer and other diseases.

Brandon C. Roy Ph.D.

Data Science Manager at Twitter, leading the Twitter Science team in Boston. Expert in computational cognitive science and natural language processing.

Richard Shiffrin Ph.D.

Distinguished Professor and Luther Dana Waterman Professor of Psychological and Brain Sciences at Indiana University. Expert in Bayesian statistics and reproducibility of research in general.

Brandon Thorpe Ph.D.

Project Manager at the Center for Open Science.

Program (June 19th)

2:00 - 2:15: [Yan] Rationale for and overview of the workshop. Topics: defining the types of reproducibility; how each talk fits into the framework; goals for the workshop.
Slides
2:15 - 2:40: [Jasny] Overview of reproducibility in general. Topics: the reproducibility crisis; how other fields of science are addressing the issue; a publisher's perspective.
Slides
2:40 - 3:05: [Bourne] Reproducibility in data science in general, and NIH's efforts. Topics: what has worked and what hasn't; a funder's perspective; advice on where to look for successful models of addressing reproducibility, anticipated challenges, and ideas for next steps.
Slides
3:05 - 3:30: [Roy] Twitter data as an exemplar of issues of statistical and empirical reproducibility. Topics: applying statistical modeling and observational data analysis to Twitter data; how statistical methods translate to empirical reproducibility in industry products.
Slides
3:30 - 3:55: [Ma] The Microsoft Academic Graph as an exemplar data set with computational reproducibility challenges. Topics: open big data and sharing; data quality and cleaning; the design philosophy and the vision of promoting reproducible research through superb and readily available academic discovery and recommendation services.
3:55 - 4:05: Coffee break. Microsoft team demo.
4:05 - 4:30: [Macklin] Problems (and early solutions) in reproducibility for multicellular systems biology.
Slides
4:30 - 4:55: [Thorpe] The Open Science Framework and what the Center for Open Science has done to promote and support reproducibility.
4:55 - 6:10: [Panel: Jasny, Bourne, Fortunato, Shiffrin; Moderator: Mabry] Lessons learned: community building toward replicable and sharable network science, from publishing, funding, industry, and academia perspectives. Dialogue between the audience and the panelists.
6:10 - 6:30: [Mabry, Yan] Action items and future plans. Call for contributions to an article, follow-up meetings, etc. Live doc via Google Docs

Background and Rationale

The holy grail of science is truth, and at the heart of science is rigor. In the past few years, growing attention has been placed on revamping prevailing scientific practices in order to put greater emphasis on transparency and reproducibility, which underpin rigor. Leading the way are scientific journals and funding agencies, which, in collaboration with organizations devoted to improving scientific rigor, have developed new guidelines and requirements for authors and grant applicants. Against this backdrop, network science is emerging as a transformative, interdisciplinary field. To succeed, network science must demonstrate that it is committed to scientific rigor as a core value.

Broadly speaking, reproducibility can be categorized into three subtypes: computational reproducibility, empirical reproducibility, and statistical reproducibility. These will serve as an organizing theme for the workshop:

Computational reproducibility (complete information about code, software, data sets, and implementation details) – Large data sets and algorithmic tools are pervasive in network science, indicating a pressing need for attention to computational reproducibility. When attempting to computationally reproduce another's study, one typically relies on the open availability of the original data and code. This becomes a technological challenge at the scale of "big data". For example, bibliographic data is fundamental for understanding the progress of scientific discovery. It is, however, a vast intertwining web of coauthorship and citation networks that is updated every day. As a result, even simple tasks like indexing and provenance require specialized infrastructure and community effort. Methods that aim to improve statistical and empirical reproducibility also rely on computational tools.
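
One lightweight step toward computational reproducibility is to record, alongside every result, a fingerprint of the exact data and environment that produced it. The following is a minimal sketch in Python; the data file name is a placeholder, and it assumes the analysis code lives in a git repository.

    import hashlib
    import json
    import platform
    import subprocess
    import sys
    from datetime import datetime, timezone

    def sha256_of_file(path, chunk_size=1 << 20):
        """Hash the data set so others can verify they hold the same snapshot."""
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                digest.update(chunk)
        return digest.hexdigest()

    def provenance_record(data_path):
        """Collect the minimum needed to recompute: data hash, code version, environment."""
        commit = subprocess.run(
            ["git", "rev-parse", "HEAD"], capture_output=True, text=True
        ).stdout.strip()
        return {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "data_sha256": sha256_of_file(data_path),
            "git_commit": commit,
            "python": sys.version,
            "platform": platform.platform(),
        }

    if __name__ == "__main__":
        # "citations.csv" is a placeholder for whatever data set the analysis reads.
        print(json.dumps(provenance_record("citations.csv"), indent=2))

Publishing such a record with each figure or table lets a reader detect a silently updated data set or a mismatched software version before attempting to rerun an analysis.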

Empirical reproducibility (complete information required to achieve empirical generalization, including all non-computational aspects of data collection and data quality, as well as experiment design) – Empirical reproducibility issues arise from data collection and experiment design. For example, the Twitter open streams are a popular social media source for computational social analysis, and a large number of research papers have been produced on the data. How many of these findings are reproducible in real social systems? How many of the methods can be generalized and used in industry products? How can statistically reproducible methods translate into empirically reproducible results? How can we improve the Twitter open streams as a data source and better understand their research value?

Statistical reproducibility (complete information about the choice of statistical models and methods, and how training and testing data are treated) – While issues like p-value hacking are less relevant in exploratory studies, general techniques like pre-registration and held-out data can still be useful to network science. Additional challenges to computational reproducibility arise when data cannot be openly shared. For network science, privacy protection is further complicated: standard methods like anonymizing identities are not secure, because the relational nature of network data can make identities easier to discover when combined with external data sets. Therefore, innovative statistical tools, like network differentially private algorithms, are needed to make it possible to reproduce aggregate statistics while preserving individual privacy.
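
To give a flavor of the idea, the sketch below applies the standard Laplace mechanism to a network's edge count under edge-level differential privacy. This is a generic textbook illustration rather than a method from any of the talks; realistic network statistics typically have larger sensitivity and need more careful algorithms.

    import random

    def dp_edge_count(true_edge_count, epsilon):
        """Release an edge count under edge-level differential privacy.

        Adding or removing a single edge changes the true count by at most 1
        (sensitivity 1), so Laplace noise with scale 1/epsilon suffices. The
        difference of two i.i.d. Exponential(epsilon) draws is Laplace-
        distributed with scale 1/epsilon.
        """
        noise = random.expovariate(epsilon) - random.expovariate(epsilon)
        return true_edge_count + noise

    if __name__ == "__main__":
        # Smaller epsilon = stronger privacy guarantee = noisier released count.
        for eps in (0.1, 1.0, 10.0):
            print(f"epsilon={eps}: {dp_edge_count(1000, eps):.1f}")

Anyone granted the same privacy budget can rerun the analysis and obtain statistically consistent aggregates, which is precisely the reproducibility property to preserve when the raw network cannot be shared.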

Through this workshop, we will address the different types of reproducibility and their challenges and solutions from a network science perspective. Workshop attendees can expect to: 1) get a broad overview of the importance of reproducibility in the scientific process; 2) gain an understanding of what select funding agencies and journals are doing to ensure reproducibility; 3) learn about the three types of reproducibility and how they relate to two data sets used by network scientists: social media data (Twitter) and an open data set, the Microsoft Academic Graph; 4) learn what early solutions the biomedical community has developed and what infrastructure the Center for Open Science has built to promote reproducibility; and 5) participate in discussions with our panelists to identify actions the network science community can take to address reproducibility. The result will be an interdisciplinary group of network scientists, galvanized to bring scientific rigor to our field and poised to carry out the action items generated during the workshop.

Organizers

Santo Fortunato Ph.D.

Professor of Informatics and Computing at Indiana University, Scientific Director at Indiana University Network Science Institute.

Patricia L. Mabry Ph.D.

Executive Director and Senior Research Scientist at Indiana University Network Science Institute. Former Senior Advisor for Disease Prevention in the Office of Disease Prevention (ODP) at NIH.

Kuansan Wang Ph.D.

Principal Researcher and Managing Director of Microsoft Research Outreach.

Xiaoran Yan Ph.D.

Assistant Research Scientist at Indiana University Network Science Institute.

Sponsors

Microsoft Research