Learn Bioinformatics in 100 hours

Important note

This course is being replaced by the course

Please follow the course above for the most up-to-date content.

Course information

The course represents all the training materials for the BMMB:852 Applied Bioinformatics course offered at Penn State in 2017.

The course offers a structured path through the Biostar Handbook. Sections of the book are presented as smaller, logically consistent units. We recommend learning two to four units per week.

The lectures consist of slides, links to various chapters, links to supporting materials and homework. There are no videos.

Please consult the synopsis for details on what is covered and how to learn the materials.

Note: This book follows the 1st edition of the Handbook and will not match the content of the 2nd Edition. Some links and content may refer to sections that have since been moved. For up-to-date content, see Applied Bioinformatics (2020).

Lectures
Lecture 1: How is Bioinformatics practiced?

Course structure. How is bioinformatics practiced. Computer setup.

Lecture 2: How do I use the command line?

Unix command line use. Find help on commands. Flag system.

Lecture 3: How are Unix commands used for data analysis?

Examples of processing biological data from the command line.

Lecture 4: What do the words mean?

How to make sense of terminology. Sequence and gene ontologies.

Lecture 5: How to interpret a list of genes?

Functional enrichment, functional over-representation.

Lecture 6: How to access published data from the command line

Reproducibility. Data repositories. Entrez Direct

Lecture 7: Data formats. Genbank, FASTA and FASTQ

Accessing and manipulating sequencing data.

Lecture 8: Quality control of high throughput sequencing data

Quality visualization. Improving data quality. Adapter removal.

Lecture 9: Advanced quality control of FASTQ data

Sequence duplication, read merging, MultiQC, error correction.

Lecture 10: Sequencing concepts, methods, coverage formula

Single end and paired-end sequencing, computing sequencing depth

Lecture 11: Scripting and Automation

Automating tasks. Make analyses reproducible.

Lecture 12: Accessing the Short Read Archive

Short read archive, fastq-dump, repeating commands

Lecture 13: Sequence Alignments

Alignment scoring, global, local alignments

Lecture 14: BLAST, Basic Local Alignment Search Tool

Using blast online and at the command line

Lecture 15: BLAST databases

Make blast databases. BLAST search tasks.

Lecture 16: Short Read Aligners

What is short read alignment. How to run bwa and bowtie2.

Lecture 17: Sequence Alignment Maps (SAM)

SAM/BAM: the workhorse of high-throughput sequencing.

Lecture 18: Paired end reads in BAM files.

Create and filter BAM files.

Lecture 20: Visualizing Large Genomic Variation

Large insertions, deletions, copy number variations

Lecture 21: Filtering SAM files

Select alignments by their attributes

Lecture 22: Processing SAM/BAM files

Picard tools. Unaligned BAM files.

Lecture 23: Short Genomic Variations

First steps in detecting short variations

Lecture 24: Let's call some SNPs

SNP calling with bcftools and freebayes

Lecture 25: The Variant Call Format

Understand the VCF format.

Lecture 26: Making sense of variants

Variant effect prediction, interval datatypes, BED, GFF

Lecture 27: Sequencing Application Domains

Re-sequencing, assembly, classification

Lecture 28: Quantifying with sequencing

Functional assays, computing coverages over intervals

Course Synopsis: How does this course work?

What are the structure and purpose of this course?