Data Sciences at Duke

In addition to the computational training and resources that DUPRI provides, Duke has numerous resources for those interested in gaining data science, machine learning, and statistical computing skills. These offerings include online tutorials, in-person workshops, and access to computing infrastructure.

The Social Science Research Institute (SSRI) and the Center for Data and Visualization Sciences (CDVS) at Bostock Library offer ongoing workshops on topics such as data management, data visualization, and data analysis. CDVS also provides online tutorials, videos and guides from past workshops, walk-in consulting, and the Brandaleone Lab for Data and Visualization Services, which offers computing and specialized software for research.

Duke Machine Learning is a community of researchers and students from different departments, organized with the Rhodes Information Initiative. The group facilitates talks and workshops on machine learning and big data topics across campus.

The Roots Program, part of the Innovation Co-Lab, provides workshops on technology topics such as programming, statistical computing, and web design.

Ph.D. students, postdoctoral researchers, and junior faculty who are interested in more in-depth training in computational social science are encouraged to apply to the Summer Institute in Computational Social Science. This two-week program is held at multiple locations around the world, including Duke. The instructional program includes lectures, group problem sets, participant-led research projects, and talks by outside speakers who conduct computational social science research in a variety of settings, such as academia, industry, and government. The program also provides open-source training videos and materials for online learning.

The Duke Compute Cluster provides high-performance computing hardware and software for research purposes. It is particularly useful for researchers who employ big data or complex models that would benefit from distributed computing.

Jerome Reiter, Department Chair and Professor of Statistical Science, presents an integrated system for data access designed to share data in ways that protect privacy and confidentiality. In this system, data stewards generate and release synthetic data, that is, data simulated from statistical models. They also give users access to a verification server that lets them assess the quality of inferences drawn from the synthetic data. In this presentation, Reiter applies the synthetic data plus verification server approach to longitudinal data on employees of the U.S. federal government.

On May 13, 2020, Benjamin Goldstein, Associate Professor of Biostatistics & Bioinformatics, Duke Clinical Research Institute, Children's Health & Discovery Initiative, and DUPRI Research Scholar, delivered a comprehensive talk to Duke University and Health System researchers on the electronic health records available through the Duke University Health System: "Working with EHR Data from Duke University Health System: What is it and How Do I do it?"

Benjamin Goldstein on Duke Electronic Health Record Data