Mark Yacoub is the Duke Population Research Institute’s (DUPRI) Director of Computational Resources. DUPRI is a dynamic community of over 70 research scholars dedicated to interdisciplinary population research and training.
Leading DUPRI research scholars James Moody, M. Giovanna Merli, Christopher Bail and Seth Sanders seek Yacoub’s assistance when they need expert support in statistical computing, virtual computing, data storage, and the analysis of text data.
“Mark is a highly skilled team player and an asset to DUPRI,” explains Moody, Professor in the Department of Sociology. “Computational science is a rapidly growing multidisciplinary field and Mark stays current in this field to help faculty understand, discover, or solve complex problems.”
Moody and Merli, Professor in the Sanford School of Public Policy, currently collaborate on a project to examine authorship trends in demography journals over the past 60 years. Merli notes, “Yacoub’s assistance collecting, cleaning, and structuring bibliographic and authors’ demographic data has been invaluable. His work is always thorough and well thought-out with results delivered quickly and clearly.”
Merli also directs the Duke Population Research Center (DPRC). “Mark is a conduit of information for all DUPRI and DPRC scholars. He knows the dynamics of the centers and brings the right people together,” she said.
Yacoub realized his strengths and interests at a very young age. As a youth growing up in Elk Grove, a suburb of Chicago, Yacoub remembers spending hours tracking statistics of players for his hometown Chicago Bulls. “I would meticulously record how well the team performed when each combination of players was in the game and I entered the results in a spreadsheet. I was convinced I could uncover hidden player combinations to maximize the team’s potential that the coaches could not.” Yacoub eventually realized the information he collected and plotted was indeed his earliest “data analysis.”
Yacoub’s family later relocated to Raleigh, North Carolina. As an adult, Yacoub earned his graduate degree in Political Science from UNC Chapel Hill, where he trained primarily as a methodologist. He has been with Duke University and DUPRI for three years.
Bail, Professor of Sociology, highlighted that Yacoub is a tremendous asset to have on a research team. Bail worked with Mark on setting up DUPRI’s virtual computing infrastructure. “Mark worked tirelessly for close to a year learning the ins and outs of virtual computing on campus and building templates that effectively meet affiliates’ research needs. He provides great support and is very much attuned to the needs of users,” said Bail. “I and my graduate students have worked closely with Mark to use DUPRI’s virtual computing resources for data collection and data analysis of large-scale text data.”
One of Yacoub’s main responsibilities is the computational training component of DUPRI. He coordinates training in computational approaches several times per year. His workshops have focused on R, with sessions on general use, text analysis, web scraping, and parallel processing. Currently under development are trainings on multilevel modeling, missing data methods, and data linking.
Workshop attendees always praise his materials, example code, and organization. (Training curricula, code, examples and other resources can be found on the DUPRI website.)
Seth Sanders, Professor of Economics at Trinity College of Arts and Sciences, remarked, “I attended the trainings for web scraping and text analysis and found both illuminating. Mark’s ability to navigate these new data sources is impressive.”
Yacoub stays current with the latest developments in computational resources. He said an area for future development is the use of Spark, an open-source framework for cluster-based data analysis: “I believe this specific application will become increasingly important to population research as data sources become increasingly complex and the use of big data expands.”
Specific resources Yacoub can assist DUPRI faculty with include:
- Virtual computing/parallel processing: DUPRI has its own virtual machines for research purposes as well as templates on Duke Research Toolkits designed to aid in the analysis of big data.
- Statistical computing/data analysis: Mark is an expert in R, text analysis and web scraping.
- Data gathering, parallel processing and data storage.