tfcb_2020

Lecture 15: Introduction to Sequencing Data Analysis

Now that we have a basic grasp of concepts surrounding data management, manipulation, and visualization, we’re ready to start focusing on some of the more specialized data encountered in computational biology research. Sequencing of nucleic acids is almost ubiquitous in biological research. In this lecture, we will introduce some common resources for depositing and retrieving sequence data generated by consortium efforts and independent laboratories. We will introduce the concepts and practical steps involved in querying, inspecting, and visualizing sequence data. Then, we will cover the types of genomic variation and common tools used to predict them from sequencing data.

This lecture focuses on concepts surrounding genome sequence data and their associated workflows, and it will include demonstrations and student exercises. We will dive into the details of sequencing data and file formats, as well as the outputs of specific sequencing analysis commands. Additional materials are included as a resource for your future reference.

Learning objectives

- Identify common databases and file formats used for sequence data
- Describe the steps involved in processing and analyzing sequence data to predict different types of genomic variants
- Recognize common tools (databases and software) used to assess variation in genomic data

Class materials

Outline of content from the slides:

Sequence data
- Databases and online resources for sequence data
- Learn the common sequence data file formats
Tools for sequencing data
- Tools to query, inspect, and visualize an aligned sequence file (demo + exercise)
- Learn the contents of sequence data files
- Learn to generate sequencing metrics and to process sequence data (demo + exercise)
Genome variant analysis
- Types of genomic variation
- Tools to predict genomic variations
- Learn the common file formats for variation data
- Databases and online resources for human variation data
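To make the FASTQ format concrete, here is a minimal Python sketch that reads records and decodes their base qualities. It assumes standard four-line records and Phred+33 quality encoding (Sanger / Illumina 1.8+), where a Phred score Q corresponds to a base-call error probability of 10^(-Q/10). The function names and the two toy records are hypothetical illustrations; for real data you would typically rely on an established parser.

```python
# Minimal FASTQ reader and Phred+33 quality decoder (illustrative sketch).
# Assumption: four-line records (no line-wrapped sequences), offset-33 qualities.
from io import StringIO
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream, four lines per record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], seq, qual)


def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Convert an ASCII-encoded quality string to integer Phred scores."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q: int) -> float:
    """Base-call error probability for Phred score q: P = 10^(-q/10)."""
    return 10 ** (-q / 10)


# Hypothetical toy records, not real sequencing data.
toy_fastq = StringIO("@read1\nACGT\n+\nII5#\n@read2\nGGCA\n+\nIIII\n")
for rec in read_fastq(toy_fastq):
    scores = phred_scores(rec.quality)
    print(rec.name, scores, sum(scores) / len(scores))
# read1 [40, 40, 20, 2] 25.5
# read2 [40, 40, 40, 40] 40.0
```

A per-read mean quality like this is one of the simplest sequencing metrics; dedicated QC tools compute many more (per-position quality, GC content, adapter content).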
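Many basic alignment metrics come from the FLAG field (column 2) of each SAM record, which packs per-read properties into bits. The sketch below tallies mapped, unmapped, and duplicate reads from SAM text lines, counting each read once through its primary alignment. The bit values (0x4 unmapped, 0x100 secondary, 0x400 duplicate, 0x800 supplementary) follow the SAM format specification; the function name `flag_summary`, the category labels, and the toy records are hypothetical. Tools such as `samtools flagstat` report related metrics directly from BAM files, and their counts are not guaranteed to match this simplified scheme.

```python
# Sketch: summarize alignments by decoding SAM FLAG bits (illustrative only).
from collections import Counter
from typing import Iterable

UNMAPPED = 0x4
SECONDARY = 0x100
DUPLICATE = 0x400
SUPPLEMENTARY = 0x800


def flag_summary(sam_lines: Iterable[str]) -> Counter:
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip header lines (@HD, @SQ, @PG, ...) and blank lines
        flag = int(line.split("\t")[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            counts["secondary_or_supplementary"] += 1
            continue  # count each read only once, via its primary alignment
        counts["primary"] += 1
        counts["unmapped" if flag & UNMAPPED else "mapped"] += 1
        if flag & DUPLICATE:
            counts["duplicate"] += 1
    return counts


# Hypothetical toy alignments: QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL
toy_sam = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",    # mapped, forward strand
    "r2\t16\tchr1\t200\t60\t4M\t*\t0\t0\tGGCA\tIIII",   # mapped, reverse strand
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tTTTT\tIIII",           # unmapped
    "r1\t256\tchr1\t500\t0\t4M\t*\t0\t0\t*\t*",          # secondary alignment of r1
    "r4\t1024\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",  # mapped, marked duplicate
]
counts = flag_summary(toy_sam)
print(f"mapping rate: {counts['mapped'] / counts['primary']:.2f}")  # mapping rate: 0.75
```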
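Several types of small genomic variation can be distinguished by comparing the REF and ALT alleles of a VCF record. The Python sketch below is a simplified classifier, assuming tab-separated VCF 4.x data lines in which simple insertions and deletions include the preceding anchor base, as the VCF specification requires. The category names and the `classify`/`variant_types` helpers are hypothetical. Real annotation tools first normalize variants (left-alignment and trimming), and records that are not normalized may be reported here as "complex".

```python
# Sketch: classify VCF alleles by comparing REF and ALT (illustrative only).
from typing import Iterable, Iterator


def classify(ref: str, alt: str) -> str:
    if "[" in alt or "]" in alt:
        return "breakend"  # breakend notation used for rearrangements
    if alt.startswith("<") and alt.endswith(">"):
        return "symbolic"  # e.g. <DEL>, <DUP>; the event is described in INFO
    if alt in {".", "*"}:
        return "other"  # no ALT allele, or an overlapping (spanning) deletion
    if len(ref) == len(alt):
        return "SNV" if len(ref) == 1 else "MNV"
    if len(alt) > len(ref) and alt.startswith(ref):
        return "insertion"
    if len(ref) > len(alt) and ref.startswith(alt):
        return "deletion"
    return "complex"


def variant_types(vcf_lines: Iterable[str]) -> Iterator[tuple[str, int, str, str, str]]:
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # meta-information (##) and column header (#CHROM) lines
        chrom, pos, _id, ref, alts = line.rstrip("\n").split("\t")[:5]
        for alt in alts.split(","):  # multi-allelic sites list ALT alleles comma-separated
            yield chrom, int(pos), ref, alt, classify(ref, alt)


# Hypothetical toy records, not real variant calls.
toy_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.",
    "chr1\t200\t.\tAT\tA\t50\tPASS\t.",
    "chr1\t300\t.\tC\tCTT,T\t50\tPASS\t.",
    "chr1\t400\t.\tGC\tTA\t50\tPASS\t.",
    "chr1\t500\t.\tN\t<DEL>\t50\tPASS\tSVTYPE=DEL;END=900",
]
for chrom, pos, ref, alt, kind in variant_types(toy_vcf):
    print(chrom, pos, ref, alt, kind)
```

Splitting multi-allelic sites into one row per ALT allele keeps the classification per allele, since a single site can carry, for example, both an insertion and a substitution.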

For in-class exercises, the data and examples shown in this lecture are available on the Fred Hutch filesystem at /fh/fast/subramaniam_a/tfcb and on Dropbox.

Visualization examples use the Integrative Genomics Viewer (IGV).

Reminders

The next class session (lecture 16) will include analysis of genomic data in R. To prepare for this session, please download all data files from this Dropbox folder and follow the instructions in this script to install the required packages. You may also find it useful to install the Integrative Genomics Viewer (IGV) for visualization of genomic data.