Schedule 2021
Date | Location | Topic | Notes | Reading |
Week 1 | ||||
2021-08-31 8:00-10:00 |
Zoom | Introduction | [slides] [printable] [video] | The NIST Definition of Cloud Computing [pdf]
Above the Clouds: A Berkeley View of Cloud Computing [pdf] A Comparative Taxonomy and Survey of Public Cloud Infrastructure Vendors [pdf] |
2021-09-01 13:00-15:00 |
Zoom | Storage (GFS, Flat FS) |
[slides] [printable] [video] | The Google File System [pdf]
Flat Datacenter Storage [pdf] |
Week 2 | ||||
2021-09-07 10:00-12:00 |
Zoom | Storage (BigTable, Cassandra, Neo4j) |
[slides] [printable] [video] | Bigtable: A Distributed Storage System for Structured Data [pdf]
Cassandra: A Decentralized Structured Storage System [pdf] Graph Databases (Ch. 3, 6) Neo4j Documentation [link] |
2021-09-08 15:00-17:00 |
Zoom | Scala | [slides] [printable] [video] | Scala By Example [pdf] |
Week 3 | ||||
2021-09-13 10:00-12:00 |
Zoom | Parallel Data Processing (MapReduce, FlumeJava) |
[slides] [printable] [video] | MapReduce: Simplified Data Processing on Large Clusters [pdf]
FlumeJava: Easy, Efficient Data-Parallel Pipelines [pdf] Data-Intensive Text Processing with MapReduce (Ch. 2-3) |
2021-09-14 13:00-15:00 |
Zoom | Parallel Data Processing (Spark) |
[slides] [printable] [video] | Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing [pdf]
Spark - The Definitive Guide (Ch. 2, 12-14) Learning Spark (Ch. 1-2) Spark Documentation [link] |
2021-09-15 15:00-17:00 |
Zoom | Lab 1 | [slides] [video] | |
Week 4 | ||||
2021-09-20 8:00-10:00 |
Zoom | Structured Data Processing (Spark SQL) |
[slides] [printable] [video] | Spark SQL: Relational Data Processing in Spark [pdf]
Spark - The Definitive Guide (Ch. 4-11) Learning Spark (Ch. 3-6) Spark SQL Documentation [link] |
2021-09-21 13:00-15:00 |
Zoom | Stream Processing (Introduction, Kafka) |
[slides] [printable] [video] | Kafka: a Distributed Messaging System for Log Processing [pdf]
Kafka Documentation [link] A Survey on the Evolution of Stream Processing Systems [pdf] |
2021-09-22 15:00-17:00 |
Zoom | Lab 2 | [notebook] [video] | |
Week 5 | ||||
2021-09-28 13:00-15:00 |
Zoom | Stream Processing (Spark Streaming, Beam) |
[slides] [printable] [video] | Discretized Streams: An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters [pdf]
Structured Streaming: A Declarative API for Real-Time Applications in Apache Spark [pdf] MillWheel: Fault-Tolerant Stream Processing at Internet Scale [pdf] The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing [pdf] Spark - The Definitive Guide (Ch. 20-23) Learning Spark (Ch. 8) Spark Streaming Documentation [link] Beam Documentation [link] |
2021-09-29 15:00-17:00 |
Zoom | Graph Processing (Pregel, GraphLab, GraphX) |
[slides] [printable] [video] | Pregel: A System for Large-Scale Graph Processing [pdf]
Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud [pdf] PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs [pdf] GraphX: Graph Processing in a Distributed Dataflow Framework [pdf] Spark - The Definitive Guide (Ch. 30) GraphX Documentation [link] |
Week 6 | ||||
2021-10-04 8:00-10:00 |
Zoom | Resource Management (Mesos, YARN, Borg, Kubernetes) |
[slides] [printable] [video] | Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center [pdf]
Apache Hadoop YARN: Yet Another Resource Negotiator [pdf] Large-Scale Cluster Management at Google with Borg [pdf] |
2021-10-06 15:00-17:00 |
Zoom | Cloud Data Lakes | [slides] [printable] [video] | Delta Lake: High-Performance ACID Table Storage over Cloud Object Stores [pdf]
Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics [pdf] Learning Spark (Ch. 9) |
Week 7 | ||||
2021-10-13 15:00-17:00 |
Zoom | Guest Lecturer
Laleh Akbarynoor (Google) |
[slides] | |