Goals of the course: To get familiar with the fundamental algorithmic ideas and methods (string algorithms) that lay the foundation for, and are applicable to the exploitation of molecular sequence data (DNA, RNA, and protein).[Practical arrangements] [Material] [Lecture Notes] [Exercises] [Links] [Assumed background] [Course feedback]
The course consists of ...
The first lecture takes place on Wednesday 14, 2004, at 8.15-10 in class E16-E17. (Changes are possible: Please check the on-line schedules for possible updates.)A preliminary course syllabus is available here.
Language of instruction: If there is sufficient demand, the course will be given in English; say, if that is requested by more than one non-Finnish speaker actively participating in the course. Please contact the lecturer in advance if you would like the course to be given in English. Otherwise the course will be given in Finnish. (For communication, either English of Finnish can be used.)
The lectures and the exercises are given by professor Pekka Kilpeläinen.
Grading: The grade is determined by the formulaceiling(12*Exam + 4*Exerc - 4) ,where Exam is the fraction of given exam points out of the maximum, and Exerc is the fraction of solved exercise assignments out of the total. (Function ceiling(x) means rounding up to the closest integer that is at least x.) The re-take exam can be graded either based on this formula or on exam points only, which ever gives the better result. In any case, at least half of the exam points are required to pass the course. The lowest accepted grade is 3.Please notice that exercise activity forms 25 % of the grade. Actively solving homework assignments is central for learning to understand and to apply the techniques; Often the process of working on the assignments is much more important than the actual "solutions" that are discussed at the exercise sessions!
Copies of lecture slides and exercise assignments will be made available online.The course is based on the textbook Algorithms on Strings, Trees, and Sequences: computer science and computational biology by Dan Gusfield (Cambridge University Press, 1999). A pair of copies are available for short-term loans in the department library.
I try to deliver copies of slides here prior to the lectures. Please note that the slides are skeletons of lectures, that is, they are not written especially for self-study. If you cannot attend the lectures, you should be able to compensate it by studying the textbook on your own.
- Introduction (January 14) [Full-size PDF slides] * [Reduced PDF for printing]
- Naive string matching and fundamental preprocessing (January 15) [Full-size PDF slides] * [Reduced PDF for printing]
- Boyer-Moore matching (January 19) [Full-size PDF slides] * [Reduced PDF for printing] (with small corrections on Jan 19)
- Exact set matching and Aho-Corasick method (January 21) [Full-size PDF slides] * [Reduced PDF for printing] * [Reduced PS for printing]
- The Shift-And Method (January 26) [Full-size PDF slides] * [Reduced PDF for printing] (with small corrections on Jan 28)
- Introduction to Suffix Trees (January 28) [Full-size PDF slides] * [Reduced PDF for printing]
- Linear-time Construction of Suffix Trees (February 2 and 4) [Full-size PDF slides] * [Reduced PDF for printing]
- Applications of Suffix Trees (February 9 and 11) [Full-size PDF slides] * [Reduced PDF for printing]
- Core String Edits and Alignments (February 16) [Full-size PDF slides] * [Reduced PDF for printing]
- Local Alignments and Gaps (February 18 and 23) [Full-size PDF slides] * [Reduced PDF for printing]
- Introduction to Multiple Alignments (February 23) [Full-size PDF slides] * [Reduced PDF for printing]
- Computing Multiple Alignments (February 25 and March 1) [Full-size PDF slides] * [Reduced PDF for printing]
- Sequence Database Searching (March 1 and 3) [Full-size PDF slides]
- [Reduced PDF for printing] Part 1
- [Reduced PDF for printing] Part 2
Homework assignments for the exercise sessions will be posted here:Session 1 (January 27)
Session 2 (February 2)
Session 3 (February 9)
Session 4 (February 16)
Session 5 (February 23)
Session 6 (March 1)
Session 7 (March 8)
We'll be extensively involved with basic techniques of algorithm analysis and dynamic programming; the necessary concepts will be reviewed, but familiarity with those topics will certainly be useful. A course on Design and Analysis of Algorithms is recommended, and a course on Data Structures and Algorithms could be considered minimum required background.Knowledge of molecular biology is useful for appreciating the context and motivation of the methods studied on the course, but it is not necessary for understanding the algorithmic issues that form the focus of the course.
Only three students returned the course feedback form. A summary of their feedback is found here.