CS 598: Communication Cost Analysis of Algorithms
Instructor: Edgar Solomonik

Mon/Wed 9:30-10:45, 1214 Siebel
4310 Siebel Center


Lec1 Introduction, motivation, course administration; collective communication protocols src
Lec2 Optimal packet-based broadcast, communication cost models (LogP, LogGP, BSP) src
Lec3 Communication avoiding algorithms for matrix multiplication src
Lec4 Communication avoiding algorithms for LU factorization src
notes on rectangular matrix multiplication
Lec5 Memory- and communication-efficient LU factorization src video 1 2
Lec6 Communication avoiding algorithms for LU with pivoting and QR intro src video 2 (QR)
Lec7 Parallel algorithms for QR factorization src video 1 2
Lec8 Communication avoiding algorithms for rectangular QR src video 1 2 3
Lec9 Ideal cache model and the discrete Fourier transform src video 1 2 3
Lec10 Parallel FFT and intro to communication lower bounds src video 1 2 3
Lec11 Cache-efficient and parallel sorting src video 1 2 3
Lec12 Bitonic sort and single-source shortest path graph algorithms src video 1 2 3
Lec13 All-pairs shortest-paths src video 1 2 3
Lec14 Betweenness centrality and sample sort revisited src video 1 2 3
Lec15 Pipelined parallel sorting, PRAM, tree contraction src video 1 2 3
Lec16 Tree contraction, Euler tour, list ranking, connectivity and MST src video 1 2 3
Lec17 Sparse direct methods, intro to iterative methods src video 1 2 3
Lec18 Avoiding communication in iterative solvers for sparse linear systems src video 1 2
Lec19 Preconditioning, incomplete LU src video 1 2 3
Lec20 Parallel preconditioning, domain decomposition, graph partitioning src video 1 2
Lec21 Approximate low-rank dense matrix and tensor factorizations src video 1 2
Lec22 Randomized algorithms for low-rank matrix factorization and least-squares src video 2
Lec23 Matrix and tensor completion (ALS, SGD, CCD) src video 1 2 3
Lec24 Molecular dynamics src video 1 2 3
Lec25 Fast integral equation methods, hierarchically structured matrices src video 1
Lec26 Hierarchically semi-separable (HSS) matrices src video 1 2 3
Lec27 HSS matrix construction, electronic structure calculations src video 1 2
Lec28 Convolutional neural networks src video 1 2
Lec29 Network interconnect topologies and topology-aware algorithms src video 1 2 3
Note: videos are incomplete and low quality, recommended only as a secondary resource for lecture slides

Brief Course Description

Efficiency and parallel scalability of data-intensive applications are most often constrained by data movement in the memory hierarchy and the network. This course will focus on analysis of algorithms through the lens of communication and synchronization models. Communication lower bounds and algorithms that attain them will be surveyed for fundamental combinatorial and numerical problems. The course will emphasize general analytical techniques, but will also connect to full-scale applications.


algorithms, linear algebra, and basic parallel programming (e.g. CS 473, 420, 450)

Related Courses

Here is a short list of closely relevant courses of which I am aware, with a theoretical focus and available online material. Most of them cover some subset of the topics/material presented in this course plus topics we will not cover. Predominantly they also have a different focus.
Michael Heath: Parallel Numerical Algorithms, 2015
Guy Blelloch: Parallel Algorithms, 2009
James Demmel and Oded Schwarz: Communication-Avoiding Algorithms, 2011
James Demmel: Applications of Parallel Computers, 2015 (other years available)
Satish Rao: Foundations of Parallel and Distributed Systems, 2012
Pavel Tvrdik: Topics in Parallel Computing, 1999


There is no single book covering the course material.

Yousef Saad's Iterative Methods for Sparse Linear Systems formed the basis of the preconditioning lectures and has much more information on stability and convergence of the methods.
James Demmel's Applied Numerical Linear Algebra is a great reference for information on matrix factorizations and other topics. The presentation of FFT given in the course was partially based on this book.
Joseph JaJa's Introduction to Parallel Algorithms covers PRAM complexity including list ranking and tree-based algorithms presented in the course.


The following list gives publications that include material covered in the course or may be useful sources of additional information.

Communication Cost Models and Cache Efficient Algorithms

Collective Communication

Matrix Multiplication

Dense Matrix Factorizations


Shortest-Path Computation

Tree-Based Graph Algorithms

Sparse Iterative Methods

Low-Rank Matrix Factorization and Randomized Linear Algebra

Molecular Dynamics

Fast Methods for Integral Equations

Electronic Structure Calculations

Tensor Computations

Network Topologies

Disclaimers: The above list of references is not a complete assortment of important publications on these topics, but rather the ones that will be touched upon in the course. Some papers are relevant to multiple sections, but all appear only once. Citations obtained via google scholar and may contain errors.