CMPUT 675: Algorithms for Streaming and Big Data

Fall 2019, MW 12:00-13:20, Location: CSC B41.

Course description and objectives

The focus of the course is in design and analysis of algorithms dealing with massive data, typically so large that it cannot fit into storage and hence the algorithm might have access to a stream of data. The constraints we typically have to work with is limited memory size compared to the massive data size. We will discuss algorithms for sampling and sketching, dimensionality reduction, sparsification, approximate query processing, etc. We will see several techniques along they way and the focus is on the design and analysis of the algorithms, rather than particular applications and how they will be used in practice.

Prerequisites

CMPUT 304 or strong undergraduate background in theoretical computer science, mathematics, and statistics.

This is a theory course (no programming involved).

• Scribe notes: 10%
• Assignments: 60%
• Project: 30%

Each student is expected to take notes for one or two lectures (depending on the number of participants). The notes should be typeset using the template provided below and submitted within 5 days after the lecture. In doing so you have to complete the steps in the proofs and provided details for parts that are sketched in the lecture. You submit your .tex files and all supporting files. This will be worth 10% of your grades.

There will be 3-4 sets of assignments (worth 60%).

The remaining 30% is for a course project. The project can be a survey on a topic related to the course (to be agreed upon mutually) or can be a presentation. See details of project below.

Homework assignments, scribe notes, as well as written projects must be prepared using LaTeX.

Lecture notes

• Lecture 1: Introdunction, Background, Approximate counting.
• Lecture 2: Frequency moments, number of distinct elements
• Lecture 3: Estimating frequency moments
• Lecture 4: Estimating F_2
• Lecture 5: JL Lemma and estimating F_p
• Lecture 6: F_p estimation
• Lecture 7: Countsketch and CountMin Sketch
• Lecture 8: Applications of countsketch,
• Lecture 9: Sparse recovery
• Lecture 10: weighted and priority sampling
• Lecture 11: l_0 and l_2 sampling
• Lecture 12: Discussing Assignment
• Lecture 13: Graph streams (connectivity, shortest path)
• Lecture 14: Matching, connectivity in dynamic model
• Lecture 15: Cuts, Sparsification, and connectivity
• Lecture 16: Selection
• Lecture 17; Streaming algorithm for geometric problems
• Lecture 18: k-center, matrix multiplication
• Lecture 19: Fast matrix multiplication (unedited)
• Lecture 20: Lower bounds, communication complexity
• Lecture 21: Lower bounds, Index, Gap hamming
• Lecture 22: project presentations
• Lecture 23: project presentations
• Lecture 24: project presentations

As a template for course notes, here is a source file for lecture 1.
Here is the algorithms.sty and picins.sty

Assignments

Course Project

For your course project you have two options. One option could be a written survey (up to 10 pages) of a few papers (one or two or three) related to the topics of the course, or it could be a presentation in one of the lectures again on a topic related to the course. The written projects are due Dec 4 and this is a firm deadline.
Presentations will be in the last 3 weeks of Nov. So if you pick to do a presentation you must select the topic
early on. In order to pick your project you must submit a proposal to me by Oct 23, stating what you plan to do, list of papers/topics you propose to work on and whether it will be a presentation or a survey. This can be a simple e-mail. We have to come to a mutual agreement on the project by Nov 4.

Course Policies

Please refer to the Department Course Policies for general information about course policies.

Assignments are due before the start of the lecture on the due dates. Late assignments up to 24 hours will be deducted 20% of full grade. After 24 hours no assignment will be accepted.

Scribed Lecture notes are due 5 days after the lecture. You need to send all source files (.tex and images if any).