CMPUT 675: Algorithms for
Streaming and Big Data

Fall 2019, MW 12:00-13:20, Location: CSC
B41.

Instructor: Mohammad
R. Salavatipour

Course description and
objectives

The focus of the course is in design and analysis
of algorithms dealing with massive data, typically so large that
it cannot fit into storage and hence the algorithm might have
access to a stream of data. The constraints we typically have to
work with is limited memory size compared to the massive data
size. We will discuss algorithms for sampling and sketching,
dimensionality reduction, sparsification, approximate query
processing, etc. We will see several techniques along they way and
the focus is on the design and analysis of the algorithms, rather
than particular applications and how they will be used in
practice.

Prerequisites

CMPUT 304 or strong undergraduate background in
theoretical computer science, mathematics, and statistics.

Grading Scheme

This is a theory course (no programming involved).

- Scribe notes: 10%
- Assignments: 60%
- Project: 30%

Each student is expected to take notes for one or two lectures
(depending on the number of participants). The notes should be
typeset using the template provided below and submitted within 5
days after the lecture. In doing so you have to complete the steps
in the proofs and provided details for parts that are sketched in
the lecture. You submit your .tex files and all supporting files.
This will be worth 10% of your grades.

There will be 3-4 sets of assignments (worth 60%).

The remaining 30% is for a course project. The project can be a
survey on a topic related to the course (to be agreed upon
mutually) or can be a presentation. See details of project below.

**Homework assignments, scribe notes, as well as written
projects must be prepared using LaTeX. **

Lecture notes

- Lecture 1: Introdunction, Background, Approximate counting.
- Lecture 2: Frequency moments, number of distinct elements
- Lecture 3: Estimating frequency moments
- Lecture 4: Estimating F_2
- Lecture 5: JL Lemma and estimating F_p
- Lecture 6: F_p estimation
- Lecture 7: Countsketch and CountMin Sketch
- Lecture 8: Applications of countsketch,
- Lecture 9: Sparse recovery
- Lecture 10: weighted and priority
sampling

- Lecture 11: l_0 and l_2 sampling
- Lecture 12: Discussing Assignment

- Lecture 13: Graph streams (connectivity, shortest path)
- Lecture 14: Matching, connectivity in dynamic model
- Lecture 15: Cuts, Sparsification, and connectivity
- Lecture 16: Selection

- Lecture 17; Streaming algorithm for geometric problems
- Lecture 18: k-center, matrix multiplication
- Lecture 19: Fast matrix multiplication (unedited)
- Lecture 20: Lower bounds, communication complexity

- Lecture 21: Lower bounds, Index, Gap hamming

- Lecture 22: project presentations

- Lecture 23: project presentations

- Lecture 24: project presentations

As a template for course notes, here is a source
file for lecture 1.

Here is the algorithms.sty and picins.sty

Assignments

- Homework 1 (due Oct 16).
- Homework 2 (due Nov 27).

Course Project

For your course project you have two options. One option could be
a written survey (up to 10 pages) of a few papers (one or two or
three) related to the topics of the course, or it could be a
presentation in one of the lectures again on a topic related to
the course. The written projects are due **Dec 4** and this is
a firm deadline.

Presentations will be in the last 3 weeks of Nov. So if you pick
to do a presentation you must select the topic

early on. In order to pick your project you must submit a proposal
to me by **Oct 23,** stating what you plan to do, list of
papers/topics you propose to work on and whether it will be a
presentation or a survey. This can be a simple e-mail. We have to
come to a mutual agreement on the project by **Nov 4**.

Here is more information about project
topics.

Course Policies

Please refer to the Department
Course Policies for general information about course
policies.

Assignments are due before the start of the lecture on the due
dates. Late assignments up to 24 hours will be deducted 20% of
full grade. After 24 hours no assignment will be accepted.

Scribed Lecture notes are due 5 days after the lecture. You need
to send all source files (.tex and images if any).

Text Book? Useful Links:

There is no text book for this course. Here are some useful links
to similar courses and other resources:

- Algorithms for Big Data, by Chandra Chekuri (UIUC)
- Data Stream Algorithms, by Amit Chakrabarti (Dartmouth)
- Algorithmic Techniques for Big Data: Moses Charikar (Stanford)
- Randomized Algorithms for Matrices and Data, by Michael Mahoney (Stanford)
- Algorithmic Techniques for Big Data Analysis, by Barna Saha (U. Minnesota)
- Sub-linear Algorithms, Piotr Indyk and Ronitt Rubenfeld (MIT)
- Sketching Algorithms for Big Data, by Jelani Nelson (Harvard)
- Algorithms for Modern Data Models, by Ashish Goel (Stanford)
- Algorithms for Big Data: David Woodruff (CMU)
- Foundations of Data Science, a book by Avrim Blum, John Hopcroft and Ravi Kannan
- Dealing with Massive Data by Sergei Vassilvitskii (Google Research, Columbia University)
- Data Stream Algorithms by Andrew Mcgregor (UMASS)
- Algorithms for Big Data Management by Ashwin Machanavajjhala (Duke)
- A book in preparation on data stream algorithms by Andrew McGregor and Muthu Muthukrishnan

Questions? Send email to me ...