uci > ics > franz > teaching > winter 2017 > cs 241

CS 241 — Advanced Compiler Construction, Winter 2017

Course Goals

This is a graduate course on advanced compiler construction.

Two developments are driving the need for new techniques of programming language implementation:

  • New hardware architectures (such as RISC and VLIW) require more complex optimizations than their predecessors. We discuss essential optimization techniques such as redundancy elimination, register allocation, and instruction scheduling.
  • There is a continual evolution of programming languages (Java, C#, JavaScript, Python, Scala, Cilk, OCaml…) providing novel features that need to be implemented efficiently.  We examine advanced language-implementation topics such as type-directed dispatch, garbage collection, and just-in-time code generation.

These topics will be introduced in class alongside a programming project, in which students have the opportunity to apply their new knowledge in practice.

Additionally, this course will illuminate how compilers increasingly include support for features that are related to software security, even though these often come at the expense of program performance.  We will discuss various kinds of low-level programming errors, how they are exploited by attackers, and what a compiler can do to prevent this from happening.

Prerequisites

No specific knowledge is assumed for this course.  In the first week, we will briefly cover all prerequisites that are typically taught in an undergraduate compiler course.  However, all students in this course are expected to be expert programmers in Java, C++ or similar languages.  If you haven't programmed in Java before, you will be expected to learn this quickly on your own during the first three weeks of the quarter.

Logistics

  • Class meets Monday and Wednesday 5pm – 7.50pm in ICS 180.  Note that the class session is twice as long as would be considered “normal” for a 4-unit class.  This is because the class incorporates a large project for which a lot of material needs to be taught up front.  Hence, we will have “double classes” followed by class sessions that won’t be used but during which you will be expected to be implementing.  The total number of contact hours will be the same as if this had been a regular 4-unit class.
  • “Double” 3 hour Classes will be given on on January 9th, 11th, 18th, 23rd, 25th, and 30th, February 1st, 6th, 9th, and 13th, and March 6th.
  • A written half-way project report is due on February 10th (the end of Week 5).  Please state on a single page what you have done so far and which of the milestones of the compiler project (Step 1 – Step 5) you have completed.
  • Finished projects must be presented during Week 10 (March 13-17). You should pace yourself to complete the project by then.  If you do not finish by this date, you may receive a grading penalty.
  • Please read your email! All email sent to your UCI email account will be considered read after 3 working days.

Topical Outline

  • everything an undergraduate ever needs to know about compilers (a crash repeat course in the first week)
  • intermediate representations, control flow analysis, dataflow analysis
  • value numbering and the Static Single Assignment form
  • redundancy elimination
  • register allocation
  • instruction scheduling
  • loop optimization and procedure optimization
  • type-bound dispatch, run-time typing, garbage collection, functional languages
  • optimizing object-oriented and parallel constructs (overview)
  • low-level programming errors and how these are exploited by attackers: buffer overflows, Return-Oriented-Programming (ROP) and its relatives (COOP, JIT-ROP, etc.), ...
  • compiler techniques for software security: stack canaries, ASLR, control-flow integrity, ...

Grading and Course Requirements

In a substantial programming project, students will apply their new knowledge by constructing essential parts of an optimizing compiler. Students may elect to work in pairs. The class grade will depend on how well you do in this project. I will provide several "public" test programs as well as a few "secret" ones that you don't get to see until you meet with me for the project review.

  • You will receive at least an A if your compiler correctly compiles all of the public test programs as well as my secret test programs.
  • You will receive at least an A- if your compiler correctly compiles all of the public test programs.
  • You will receive at least a B if you can demonstrate that you have invested "reasonable effort" into the project, even if your compiler doesn't fully run because of a persistent debugging problem. The instructor has sole discretion as to what constitutes "reasonable effort".

There will also be an opportunity to give a short presentation on a paper related to the class.  Students who volunteer for this will receive extra credit.  There won’t be enough papers to accommodate all students, so if you want to do this, volunteer early.

Comprehensive Examination

For those students who wish to take a Comprehensive Exam for this class, there will be a written examination on Wednesday, Mar 22nd, 10.30am-12.30pm in the regular classroom. This is the time slot that the Registrar has scheduled for a final exam.

Note that the grade for the comprehensive exam will be completely separate from your letter grade for this class.

Handouts

Textbook

This class does not use a textbook, and buying one of the many available textbooks on compiler construction is not likely to help you much. Instead, come to the class well rested and alert and simply try to absorb the material as it is being presented in class.

Research Papers To Read

Dominator Trees

efficient generation of the dominator tree for a given control flow graph
T. Lengauer and R. E. Tarjan; "A Fast Algorithm for Finding Dominators in a Flowgraph"; ACM Transactions on Programming Languages and Systems, 1:1, 121-141; 1979.

Static Single Assignment Form

an early paper documenting IBM’s PL.8 compiler (no mention of SSA yet)
M. Auslander and M. Hopkins; "An Overview of the PL.8 Compiler"; Proceedings of the ACM SIGPLAN '82 Symposium on Compiler Construction, Boston, Massachusetts, published as Sigplan Notices, 17:6, 22-31; 1982.

first mention of SSA in the literature (two papers published simultaneously)
B. Alpern, M. N. Wegman and F. K. Zadeck; "Detecting Equality of Variables in Programs"; Proceedings of the Fifteenth Annual ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages, San Diego, California, 1-11; 1988.

B. K. Rosen, M. N. Wegman and F. K. Zadeck; "Global Value Numbers and Redundant Computations"; Proceedings of the Fifteenth Annual ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages, San Diego, California, 12-27; 1988.

efficient generation of SSA
R. Cytron, J. Ferrante, B. K. Rosen, M. N. Wegman and F. K. Zadeck; "Efficiently Computing Static Single Assignment Form and the Control Dependence Graph"; ACM Transactions on Programming Languages and Systems, 13:4, 451-490; 1991.

M. M. Brandis and H. Mössenböck; "Single-Pass Generation of Static Single-Assignment Form for Structured Languages"; ACM Transactions on Programming Languages and Systems, 16:6, 1684-1698; 1994.

Register Allocation

Chaitin's original algorithm:
G.J. Chaitin, M. A. Auslander, A. K. Chandra, J. Cocke, M. E. Hopkins and P. W. Markstein; "Register Allocation Via Coloring"; Computer Languages, 6:1, pp. 47-57; 1981.

but this is easier to track down:
G. J. Chaitin; "Register Allocation and Spilling Via Graph Coloring"; Proceedings of ACM SIGPLAN Symposium on Compiler Construction, published as SIGPLAN Notices, 17:6, pp. 98-105; 1982.

the 'other' graph coloring algorithm, rival to Chaitin:
F. C. Chow and J. Hennessy; "The Priority-Based-Coloring Approach to Register Allocation"; ACM Transactions on Programming Languages and Systems, 12:4, pp. 501-536; 1990.

cost functions:
T. A. Proebsting and C. N. Fischer; "Demand-Driven Register Allocation"; ACM Transactions on Programming Languages and Systems, 18:6, pp.683-710; 1996.

almost classic now:
P. Briggs, K. D. Cooper, L. Torczon; "Improvements to Graph Coloring Register Allocation"; ACM Transactions on Programming Languages and Systems, 16: 3, pp. 428-455; 1994.

for a later improvement to Briggs’ algorithm, as well as an SSA based description of it:
L. George and A. Appel; "Iterated Register Coalescing"; ACM Transactions on Programming Languages and Systems, 18:3, pp. 300-324; 1996.

linear-scan register allocation:
O. Traub, G. Holloway, and M. D. Smith; "Quality and Speed in Linear Scan Register Allocation"; Proceedings of the ACM SIGPLAN '98 Conference on Programming Language Design and Implementation (PLDI 1998), pp. 142-151; 1998.

M. Poletto and V. Sarkar; "Linear Scan Register Allocation"; ACM Transactions on Programming Languages and Systems, 21:5, pp. 895-913; 1999.

register allocation via the control-flow hierarchy:
D. Callahan and B. Koblenz; "Register Allocation via Hierarchical Graph Coloring"; Proceedings of the ACM SIGPLAN '91 Conference on Programming Language Design and Implementation (PLDI 1991), pp. 192ff; 1991.

C. Norris and L. L. Pollock; "Register Allocation over the Program Dependence Graph"; Proceedings of the ACM SIGPLAN '94 Conference on Programming Language Design and Implementation (PLDI 1994), pp. 266ff; 1994.

last update: 6th January 2017 - franz@uci.edu