Genome Assembly (2KV)

Course no.: 365.092
Lecturer: Alois Regl (alois.regl@regl.net, +43 664 4502030)
Times/locations: Fri 8:30-11:00, room tba
Start: Fri Oct 6, 2017
Mode: KV, 2h, weekly
Registration: KUSSS

Motivation and course outline

DNA sequencing has become one of the major tools in biology. Beyond the classical genome assembly task, it has dozens if not hundreds of applications. Both the algorithmic complexity and the demand on computing resources are considerable.

In this lecture, you will gain deeper insight into genome assembly algorithms and become familiar with preparing reads (from the sequencing machine), using existing software to assemble genomes, and performing differential expression analysis based on sequencing (RNA-seq). This is achieved through a combination of “demos” by me and small projects for you as homework.

Part one – HowTo’s:

I will present “close-to-real” workflows for quality control (using two different software products), for genome assembly (de novo and reference-based), and for differential expression analysis using RNA-seq. A good understanding of the principles of genome assembly (e.g. from the corresponding chapter of GT, "Genomics and Transcriptomics") will help a lot.

Part two – Theory input:

I will present further details on the algorithmic background of assembly. The chapter on assembly algorithms in GT is therefore a prerequisite for GA (this course).
One of the topics will be the error correction heuristics and the integration of reads in EULER-SR.

Part three – Projects:

The projects will be done as homework. I will be available at JKU on most of the course dates to answer questions, and of course anytime via email.
Projects should be done in groups of two. You may form the groups as you like. If you would rather work alone, contact me.

Project A (Experiments with DBG)

You will write a few R programs that crudely simulate sequencing (create reads from a given genome, including errors) and build a de Bruijn graph (DBG) and an overlap graph from those reads. Experiment with the parameters and visualize the DBG using Cytoscape.
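
As a rough illustration of the kind of experiment Project A involves, here is a minimal sketch of read simulation and de Bruijn graph construction. It is written in Python for brevity, whereas the project itself calls for R. The function names, the parameters (read_len, n_reads, error_rate, k), the substitution-only error model and the tab-separated edge table are all assumptions made for this example, not part of the assignment.

```python
import random
from collections import defaultdict


def simulate_reads(genome, read_len, n_reads, error_rate, rng):
    """Sample reads uniformly from the genome and add substitution errors.

    Assumption: errors are substitutions only (no insertions or deletions).
    """
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(genome) - read_len + 1)
        read = list(genome[start:start + read_len])
        for i, base in enumerate(read):
            if rng.random() < error_rate:
                # Replace the base with one of the three other bases.
                read[i] = rng.choice([b for b in "ACGT" if b != base])
        reads.append("".join(read))
    return reads


def de_bruijn_graph(reads, k):
    """Build a DBG: nodes are (k-1)-mers, each k-mer adds an edge prefix -> suffix."""
    edges = defaultdict(int)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            edges[(kmer[:-1], kmer[1:])] += 1  # weight = how often the k-mer was seen
    return dict(edges)


def edge_table(graph):
    """Format the edges as tab-separated lines (source, target, weight)."""
    lines = ["source\ttarget\tweight"]
    for (src, dst), weight in sorted(graph.items()):
        lines.append(f"{src}\t{dst}\t{weight}")
    return "\n".join(lines)


# Hypothetical toy setup: a random 200 bp "genome" and 100 reads of length 30.
rng = random.Random(42)
genome = "".join(rng.choice("ACGT") for _ in range(200))
reads = simulate_reads(genome, read_len=30, n_reads=100, error_rate=0.01, rng=rng)
graph = de_bruijn_graph(reads, k=5)
table = edge_table(graph)
```

Raising error_rate or lowering k in such a sketch changes the number of nodes and edges noticeably, which is the sort of parameter effect the project asks you to explore. A delimited edge table like the one produced by edge_table should be importable into Cytoscape as a network, but check the import dialog of your Cytoscape version.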

Project B (genome assembly)

You will be given a data set containing reads from a full genome sequencing project of a small organism.
Your task is to assemble the genome in a few variants and compare the results. Assembly will go only as far as the contig level; there will be no mate pairs. The focus is on the discussion of the results.

Project C (RNA-seq, differential expression)

You will be given two groups of data sets, each containing RNA-seq reads from several samples.
Your task is to identify the genes that can be considered differentially expressed between the two groups.

Exam

There will be a small exam, but most of the points (around 60-80%) will come from the project(s).

Technical requirements

You will need a Linux installation to do the projects. The genomes used will be small, so Linux on a notebook or PC will be sufficient. Upon request, you can also use the Linux machine of the BI Institute.
Knowledge of R is required. At least one of the projects will have to be done in R.