Date Location Topic Notes Reading
Week 1
2021-08-31
8:00-10:00
Zoom Introduction [slides] [printable] [video] The NIST Definition of Cloud Computing [pdf]
Above the Clouds: A Berkeley View of Cloud Computing [pdf]
A Comparative Taxonomy and Survey of Public Cloud Infrastructure Vendors [pdf]
2021-09-01
13:00-15:00
Zoom Storage
(GFS, Flat FS)
[slides] [printable] [video] The Google File System [pdf]
Flat Datacenter Storage [pdf]
Week 2
2021-09-07
10:00-12:00
Zoom Storage
(BigTable, Cassandra, Neo4j)
[slides] [printable] [video] Bigtable: A Distributed Storage System for Structured Data [pdf]
Cassandra: A Decentralized Structured Storage System [pdf]
Graph Databases (Ch. 3, 6)
Neo4j Documentation [link]
2021-09-08
15:00-17:00
Zoom Scala [slides] [printable] [video] Scala By Example [pdf]
Week 3
2021-09-13
10:00-12:00
Zoom Parallel Data Processing
(MapReduce, FlumeJava)
[slides] [printable] [video] MapReduce: Simplified Data Processing on Large Clusters [pdf]
FlumeJava: Easy, Efficient Data-Parallel Pipelines [pdf]
Data-Intensive Text Processing with MapReduce (Ch. 2-3)
2021-09-14
13:00-15:00
Zoom Parallel Data Processing
(Spark)
[slides] [printable] [video] Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing [pdf]
Spark: The Definitive Guide (Ch. 2, 12-14)
Learning Spark (Ch. 1-2)
Spark Documentation [link]
2021-09-15
15:00-17:00
Zoom Lab 1 [slides] [video]
Week 4
2021-09-20
8:00-10:00
Zoom Structured Data Processing
(Spark SQL)
[slides] [printable] [video] Spark SQL: Relational Data Processing in Spark [pdf]
Spark: The Definitive Guide (Ch. 4-11)
Learning Spark (Ch. 3-6)
Spark SQL Documentation [link]
2021-09-21
13:00-15:00
Zoom Stream Processing
(Introduction, Kafka)
[slides] [printable] [video] Kafka: a Distributed Messaging System for Log Processing [pdf]
Kafka Documentation [link]
A Survey on the Evolution of Stream Processing Systems [pdf]
2021-09-22
15:00-17:00
Zoom Lab 2 [notebook] [video]
Week 5
2021-09-28
13:00-15:00
Zoom Stream Processing
(Spark Streaming, Beam)
[slides] [printable] [video] Discretized Streams: An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters [pdf]
Structured Streaming: A Declarative API for Real-Time Applications in Apache Spark [pdf]
MillWheel: Fault-Tolerant Stream Processing at Internet Scale [pdf]
The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing [pdf]
Spark: The Definitive Guide (Ch. 20-23)
Learning Spark (Ch. 8)
Spark Streaming Documentation [link]
Beam Documentation [link]
2021-09-29
15:00-17:00
Zoom Graph Processing
(Pregel, GraphLab, GraphX)
[slides] [printable] [video] Pregel: A System for Large-Scale Graph Processing [pdf]
Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud [pdf]
PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs [pdf]
GraphX: Graph Processing in a Distributed Dataflow Framework [pdf]
Spark: The Definitive Guide (Ch. 30)
GraphX Documentation [link]
Week 6
2021-10-04
8:00-10:00
Zoom Resource Management
(Mesos, YARN, Borg, Kubernetes)
[slides] [printable] [video] Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center [pdf]
Apache Hadoop YARN: Yet Another Resource Negotiator [pdf]
Large-Scale Cluster Management at Google with Borg [pdf]
2021-10-06
15:00-17:00
Zoom Cloud Data Lakes [slides] [printable] [video] Delta Lake: High-Performance ACID Table Storage over Cloud Object Stores [pdf]
Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics [pdf]
Learning Spark (Ch. 9)
Week 7
2021-10-13
15:00-17:00
Zoom Guest Lecturer
Laleh Akbarynoor (Google)
[slides]