Schedule
Date | Location | Topic | Notes | Reading |
Week 1 | ||||
2022-08-30 13:00-15:00 |
Sal-B | Introduction | [slides] [printable] [video 2021] |
The NIST Definition of Cloud Computing [pdf]
Above the Clouds: A Berkeley View of Cloud Computing [pdf] A Comparative Taxonomy and Survey of Public Cloud Infrastructure Vendors [pdf] |
2022-08-31 13:00-15:00 |
Sal-B | Storage (GFS) |
[slides] [printable] [video 2021] [lab1] |
The Google File System [pdf] |
Week 2 | ||||
2022-09-06 13:00-15:00 |
Sal-B | Storage (BigTable, Cassandra, Neo4j) |
[slides] [printable] [video 2021] |
Bigtable: A Distributed Storage System for Structured Data [pdf]
Cassandra: A Decentralized Structured Storage System [pdf] Graph Databases (Ch. 3, 6) Neo4j Documentation [link] |
2022-09-08 15:00-17:00 |
Sal-B | Scala | [slides] [printable] [video 2021] |
Scala By Example [pdf] |
Week 3 | ||||
2022-09-13 13:00-15:00 |
Sal-B | Parallel Data Processing (MapReduce, FlumeJava) |
[slides] [printable] [video 2021] |
MapReduce: Simplified Data Processing on Large Clusters [pdf]
FlumeJava: Easy, Efficient Data-Parallel Pipelines [pdf] Data-Intensive Text Processing with MapReduce (Ch. 2-3) |
2022-09-15 15:00-17:00 |
Sal-B | Parallel Data Processing (Spark) |
[slides] [printable] [video 2021] [lab2] [lab2 src] |
Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing [pdf]
Spark - The Definitive Guide (Ch. 2, 12-14) Learning Spark (Ch. 1-2) Spark Documentation [link] |
2022-09-16 10:00-12:00 |
Sal-B | Lab 1 | [slides] | |
Week 4 | ||||
2022-09-20 13:00-15:00 |
Sal-B | Structured Data Processing (Spark SQL) |
[slides] [printable] [video 2021] |
Spark SQL: Relational Data Processing in Spark [pdf]
Spark - The Definitive Guide (Ch. 4-11) Learning Spark (Ch. 3-6) Spark SQL Documentation [link] |
2022-09-22 15:00-17:00 |
Sal-B | Stream Processing (Introduction, Kafka) |
[slides] [printable] [video 2021] [lab3] [lab3 src] |
Kafka: a Distributed Messaging System for Log Processing [pdf]
Kafka Documentation [link] A Survey on the Evolution of Stream Processing Systems [pdf] |
2022-09-23 10:00-12:00 |
Sal-B | Lab 2 | [slides] | |
Week 5 | ||||
2022-09-27 13:00-15:00 |
Sal-B | Stream Processing (Spark Streaming, Beam) |
[slides] [printable] [src] [video 2021] |
Discretized Streams: An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters [pdf]
Structured Streaming: A Declarative API for Real-Time Applications in Apache Spark [pdf] MillWheel: Fault-Tolerant Stream Processing at Internet Scale [pdf] The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing [pdf] Spark - The Definitive Guide (Ch. 20-23) Learning Spark (Ch. 8) Spark Streaming Documentation [link] Beam Documentation [link] |
2022-09-29 15:00-17:00 |
Sal-B | Graph Processing (Pregel, GraphLab, GraphX) |
[slides] [printable] [video 2021] [lab4] [lab4 src] |
Pregel: A System for Large-Scale Graph Processing [pdf]
Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud [pdf] PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs [pdf] GraphX: Graph Processing in a Distributed Dataflow Framework [pdf] Spark - The Definitive Guide (Ch. 30) GraphX Documentation [link] |
Week 6 | ||||
2022-10-04 13:00-15:00 |
Sal-B | Resource Management (Mesos, YARN, Borg, Kubernetes) |
[slides] [printable] [video 2021] |
Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center [pdf]
Apache Hadoop YARN: Yet Another Resource Negotiator [pdf] Large-Scale Cluster Management at Google with Borg [pdf] |
2022-10-05 15:00-17:00 |
Sal-B | Cloud Data Lakes | [slides] [printable] [video 2021] |
Delta Lake: High-Performance ACID Table Storage over Cloud Object Stores [pdf]
Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics [pdf] Learning Spark (Ch. 9) |
Week 7 | ||||
2022-10-11 13:00-15:00 |
Sal-A | Guest lecture: Mohammadhossein Andjedani, Principal MLOps Engineer at King |
[slides] | |