Biostatistician Community Webinar – Databases in the Tidyverse

November 15 @ 1:00 pm - 2:30 pm

Presenters: Ben Baumer (Smith College, Program in Statistical and Data Sciences) and Nicholas Horton (Amherst College, Department of Mathematics and Statistics)
Date and Time: Wednesday, November 15, 2017, 1:00 p.m. – 2:30 p.m. EST
Webinar location: Bloomberg School of Public Health, 6th floor, Room E6519

The dplyr package for R provides a flexible and powerful syntax for data wrangling operations. However, data objects in R are typically stored in memory, and performance issues may arise as data become large. Database management systems implementing SQL (Structured Query Language) provide a ubiquitous architecture for storing and querying data that are relational in nature. While R has supported data retrieval from relational databases such as MySQL, SQLite, and PostgreSQL, recent advances in the interfaces between R and SQL enable users to leverage database storage and retrieval mechanisms seamlessly while remaining within R. In this webinar, we will review key idioms for data wrangling within dplyr, introduce the backend interfaces for common database systems, provide examples of how the dplyr engine translates a data pipeline into SQL, and discuss common misconceptions and performance issues.

About the Presenters:
Benjamin S. Baumer is an assistant professor in the Statistical & Data Sciences program at Smith College. He has been a practicing data scientist since 2004, when he became the first full-time statistical analyst for the New York Mets. Ben is a co-author of The Sabermetric Revolution and Modern Data Science with R, and won the 2016 Contemporary Baseball Analysis Award from the Society for American Baseball Research.

Nicholas Horton is Professor of Statistics at Amherst College, with methodologic research interests in longitudinal regression models, missing data methods, and statistical computing. He graduated from the Harvard T.H. Chan School of Public Health in 1999. Nick has received the ASA’s Founders Award, the Waller Education Award, the William Warde Mu Sigma Rho Education Award, and the MAA Hogg Award for Excellence in Teaching. He has published more than 160 papers, co-authored a series of four books on statistical computing and data science, and was co-PI on the NSF-funded MOSAIC project. Nick is a Fellow of the ASA, served as a member of the ASA Board, chairs the Committee of Presidents of Statistical Societies, and is the past chair of the ASA Section on Statistical Education. He is a member of the National Academies of Sciences Committee on Applied and Theoretical Statistics and of two Academy studies on undergraduate data science education.


