Comparison of other RNAseq course materials
Fredhutch.io’s Galaxy materials
A quick start guide to doing RNA-sequencing analysis in Galaxy. Covers importing data through gene expression analysis.
Scope
- Walkthrough tutorial style
- Brief: <3 hrs
Outline
- FH Galaxy server login information
- Importing data to Galaxy
- Combining datasets in Galaxy
- Using UCSC to get a gene annotation
- Read mapping with TopHat
- Counting reads with htseq-count
- Differential gene expression analysis with DESeq2
Software
- Galaxy
- TopHat
- htseq-count
- DESeq2
Gavin Ha’s lectures and R labs for TFCB
Lecture materials from the UW Tools For Computational Biology course. Covers Bioconductor packages for working with genomic data, inspecting and querying genomic data, and identifying and annotating genomic variants.
Scope
- R Markdown course materials
- Bioconductor tools to extract meaning from previously mapped files
Outline
- Genomic data analysis
  - Using GenomicRanges to store and query genomic data
  - Finding the overlap between two genomic sequences
- Sequence data analysis
  - Loading and querying BAM files using Rsamtools
  - Computing pile up statistics
- Read Variant Call Format (VCF) Files
  - Read and extract contents of VCF
  - Reading variants from VCF
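As a rough illustration of what "finding the overlap" between two genomic ranges means, the sketch below computes the shared region of two intervals in plain Python. It is not the GenomicRanges API used in the lectures: the `Interval` type and `overlap` function are hypothetical names, and 1-based inclusive coordinates are an assumption of this sketch.

```python
from typing import NamedTuple, Optional


class Interval(NamedTuple):
    """A genomic range; coordinates are assumed 1-based and inclusive."""
    chrom: str
    start: int
    end: int


def overlap(a: Interval, b: Interval) -> Optional[Interval]:
    """Return the region shared by two intervals, or None if they do not overlap."""
    if a.chrom != b.chrom:
        return None
    start = max(a.start, b.start)
    end = min(a.end, b.end)
    # With inclusive coordinates, start == end is still a 1-bp overlap.
    return Interval(a.chrom, start, end) if start <= end else None


gene = Interval("chr1", 100, 200)
read = Interval("chr1", 180, 250)
print(overlap(gene, read))  # Interval(chrom='chr1', start=180, end=200)
```

Ranges on different chromosomes never overlap here, which is why the chromosome check comes first.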
Software
- R (Bioconductor)
- Rsamtools
- VariantAnnotation
- GenomicRanges
David Coffey’s RNAseq repository, for an authentic workflow
A series of shell and R scripts used to process RNA sequencing data
Scope
- GitHub repo with scripts and README
- Minimal guidance
Outline
- Downloading raw fastq files from the NCBI sequence read archive (http://www.ncbi.nlm.nih.gov/sra) or generating your own sequencing files.
- Alignment to a reference genome. Unaligned reads may then be aligned to alternative genomes such as a pathogen genome.
- Merging (for multilane samples) and processing
- The resulting bam files can be run through a series of additional analyses such as GATK variant detection and STAR fusion gene detection.
- Quality control analyses may also be performed on fastq files using FastQC and bam files using RNAseQC.
Software
Amy P’s repository with code and documentation for Pathways/SHIP, for materials translatable to high school students
Scope
- workflow used with undergrad interns to analyze RNAseq data for a variety of labs
Outline
- STAR two pass alignment
- RNASeQC
- post-processing in R for DGE
- import metadata and data
- assess gene data
- annotate gene names
- create counts matrix, phenotype matrix, SummarizedExperiment object
- DGE
Software
- STAR, RNASeQC
- Tidyverse and DESeq2
Alex’s Lemonade Stand RNAseq materials
A single module in a series from The Alex’s Lemonade Stand Foundation Childhood Cancer Data Lab
Scope
- According to the schedule, modules take two days
- The organization’s repos contain a lot of good information on R and RNA seq, but where things live is not well documented, and broken links make the repos difficult to navigate.
Outline
- Installing and setting up a Docker container
- Accessing data on flash drives
- Intro to R and intermediate R (Tidyverse)
- QC, trim, and quantification using Salmon
- Gene level summary using tximport
- RNA-seq EDA
- Differential gene expression analysis
- Normalizing count matrix
- Single cell - processing 10x raw data
- Single cell - dimensionality reduction
- Machine learning - data prep, clustering, PLIER
Software
- Bulk RNA Seq
  - FastQC
  - fastp
  - Salmon
  - tximport
  - DESeq2
Cornell RNAseq course
RNA Seq analysis workshop course materials.
Scope
- According to the website, the workshop took 4 days
- Includes slides, sample alignment files/read counts/outputs, extensive course notes
Outline
- Set up on the command line - create directory structure, download fastq
- QC raw reads with FastQC
- Alignment with STAR
- Interacting with BAM/SAM files using samtools
- Visual inspection with IGV
- Read in feature counts to R
- Use DESeq2 to normalize read counts for differences in seq depth and transform reads to the log2 scale.
- Differential gene analysis with DESeq2
- GO term enrichment
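To make the normalization step in the outline above more concrete, here is a minimal plain-Python sketch of the median-of-ratios idea that DESeq2 uses to correct for sequencing depth, followed by a log2 transform. It is not DESeq2's implementation: the function names are hypothetical, the count table is made up, and the pseudocount of 1 is an assumption of this sketch.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors for a genes x samples table (list of rows)."""
    n_samples = len(counts[0])
    # Use only genes with a nonzero count in every sample, so the log is defined.
    log_rows = [[math.log(c) for c in row] for row in counts if all(c > 0 for c in row)]
    if not log_rows:
        raise ValueError("no gene has nonzero counts in every sample")
    # Log of each gene's geometric mean across samples.
    log_geo_means = [sum(row) / n_samples for row in log_rows]
    factors = []
    for j in range(n_samples):
        # Ratio of sample j to the geometric mean, per gene, on the log scale.
        log_ratios = [row[j] - g for row, g in zip(log_rows, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def log2_normalized(counts, pseudocount=1.0):
    """Divide each sample by its size factor, then take log2(value + pseudocount)."""
    factors = size_factors(counts)
    return [[math.log2(c / f + pseudocount) for c, f in zip(row, factors)] for row in counts]


# Made-up counts: sample 2 was sequenced twice as deeply as sample 1.
counts = [
    [10, 20],
    [100, 200],
    [5, 10],
    [0, 3],
]
print(size_factors(counts))
print(log2_normalized(counts)[0])
```

In this toy table the second sample's size factor comes out twice the first's, so the first three genes end up with equal normalized values in both samples.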
Software
- DESeq2
- ClusterProfiler
Griffith Lab RNAseq course
An in-depth course covering all aspects of RNA-seq analysis.
Scope
- A course made up of multiple modules
- According to the schedule takes 5 days
Outline
- Course set up (aws, unix, tool installation)
- Intro to RNA seq theory
- General goals/themes in RNA seq analysis workflow
- Intro to BAM/SAM formats
- Visualization of alignment in IGV
- BAM read counting
- Expression estimation for known genes and transcripts
- Differential Expression analysis
- Downstream interpretation of expression
- Alignment free estimation of expression with Kallisto/Sleuth
- Isoform discovery with StringTie
- Differential splicing analysis with Ballgown
- Examine and visualize junction counts
- De novo assembly with Trinity
- Transcript annotation with Trinotate
- ScRNAseq applications/advantages/challenges
- 10x/CellRanger overview
- Custom scRNAseq analysis in R
Software
- EC2, unix commands for cloud computing (more info here)
- SAMtools, bam-readcount, HISAT2, StringTie, gffcompare, htseq-count, TopHat, kallisto, FastQC, MultiQC, Picard, Flexbar, Regtools, RSeQC, bedops, gtfToGenePred, genePredToBed, how_are_we_stranded_here, R, tidyverse, Bioconductor, Sleuth (more info here)
Harvard
Harvard offers a series of RNAseq classes that use various approaches and infrastructure. The synopsis here covers:
- https://github.com/hbctraining/rnaseq_overview
- https://hbctraining.github.io/Intro-to-rnaseq-hpc-salmon-flipped/ (most recent)
Scope
- overview is conceptual, ~5 hours
- HPC is skills-based, 7.5 hours of instructor-led with substantial prep for participants, including homework to submit
Outline
Overview:
- library prep
- sequencing steps and sequences
- experimental planning considerations
- strategies for bulk-RNAseq analysis
- data management
- raw data QC
- mapping/quantification
- sample-level assessment
- count modeling and hypothesis testing
- visualization of results
- functional analysis
HPC:
- working in HPC
- Project organization and data management
- quality control of data
- sequence alignment
- alignment-free methods
- troubleshooting RNAseq data analysis
- automating the RNAseq workflow
Other materials:
- Intro to R: https://hbctraining.github.io/Intro-to-R-flipped/#lessons
- Intro to DGE: https://hbctraining.github.io/DGE_workshop_salmon_online/#lessons
Software
Overview: none
HPC:
- FileZilla, text editor, gitbash
- uses on-premise compute
- fastqc, slurm, salmon, MultiQC, bash, R, DGE
nf-core
Scope
Nextflow pipeline
Outline and software
This was copied and pasted from the nf-core outline:
- Download FastQ files via SRA, ENA or GEO ids and auto-create input samplesheet (ENA FTP; if required)
- Merge re-sequenced FastQ files (cat)
- Read QC (FastQC)
- UMI extraction (UMI-tools)
- Adapter and quality trimming (Trim Galore!)
- Removal of ribosomal RNA (SortMeRNA)
- Choice of multiple alignment and quantification routes:
  - STAR -> Salmon
  - STAR -> RSEM
  - HiSAT2 -> NO QUANTIFICATION
- Sort and index alignments (SAMtools)
- UMI-based deduplication (UMI-tools)
- Duplicate read marking (picard MarkDuplicates)
- Transcript assembly and quantification (StringTie)
- Create bigWig coverage files (BEDTools, bedGraphToBigWig)
- Extensive quality control:
  - RSeQC
  - Qualimap
  - dupRadar
  - Preseq
  - DESeq2
- Pseudo-alignment and quantification (Salmon; optional)
- Present QC for raw read, alignment, gene biotype, sample similarity, and strand-specificity checks (MultiQC, R)
RNAseq 123
Scope
DGE with Bioconductor
Outline
- 4 Data packaging
  - 4.1 Reading in count-data
  - 4.2 Organising sample information
  - 4.3 Organising gene annotations
- 5 Data pre-processing
  - 5.1 Transformations from the raw-scale
  - 5.2 Removing genes that are lowly expressed
  - 5.3 Normalising gene expression distributions
  - 5.4 Unsupervised clustering of samples
- 6 Differential expression analysis
  - 6.1 Creating a design matrix and contrasts
  - 6.2 Removing heteroscedascity from count data
  - 6.3 Fitting linear models for comparisons of interest
  - 6.4 Examining the number of DE genes
  - 6.5 Examining individual DE genes from top to bottom
  - 6.6 Useful graphical representations of differential expression results
- 7 Gene set testing with camera
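The pre-processing steps in section 5 of the outline above (transforming raw counts and removing lowly expressed genes) can be sketched roughly as follows. The workflow itself uses Bioconductor packages in R; this is a plain-Python illustration only, with hypothetical function names, made-up counts, and arbitrary thresholds (`min_cpm`, `min_samples`) that are assumptions of this sketch.

```python
def cpm(counts):
    """Counts per million for a genes x samples table (list of rows)."""
    # Library size = total counts in each sample (column).
    lib_sizes = [sum(col) for col in zip(*counts)]
    return [[c / n * 1e6 for c, n in zip(row, lib_sizes)] for row in counts]


def keep_expressed(counts, min_cpm=1.0, min_samples=2):
    """Return indices of genes with CPM above min_cpm in at least min_samples samples."""
    return [
        i
        for i, row in enumerate(cpm(counts))
        if sum(v > min_cpm for v in row) >= min_samples
    ]


# Made-up counts: each sample has a library size of exactly one million.
counts = [
    [500_000, 400_000, 600_000],  # highly expressed gene
    [500_000, 599_998, 400_000],  # highly expressed gene
    [0, 2, 0],                    # detected in only one sample
]
print(keep_expressed(counts))  # [0, 1]
```

Filtering on CPM rather than raw counts keeps the cutoff comparable across samples with different sequencing depths.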