Data for Data Science
Description
Researchers face a growing data management challenge that starts with data collection and continues through data analysis, publication, and archival. Research labs may struggle to scale their data management methods to many or very large data files, to fully document data and its organization, and to meet the data sharing requirements of grants and publications. This four-class course introduces attendees to best practices in data organization and management. Each one-hour class will include lecture, discussion, and practice exercises. This course assumes no prior training in data science. At the end of this course, you will be able to identify resources at Fred Hutch for data management and apply best practices in data organization to your own research projects.
Software requirements for this course can be found on fredhutch.io’s Software page.
Schedule
- Class 1: Data entry and creating spreadsheets
- Class 2: Organizing data and project files
- Class 3: Documenting data with metadata
- Class 4: Data manipulation and reproducibility
Resources
- Data Organization in Spreadsheets for Ecologists by Data Carpentry
- Tidy Data by Hadley Wickham
- Spreadsheet Help from California Digital Libraries
- Primer on Data Management by Strasser et al.