Date Location Topic Notes Reading
Week 1
2023-08-29
8:00-10:00
Sal-B Introduction [slides] [printable] The NIST Definition of Cloud Computing [pdf]
Above the Clouds: A Berkeley View of Cloud Computing [pdf]
A Comparative Taxonomy and Survey of Public Cloud Infrastructure Vendors [pdf]
2023-09-01
13:00-15:00
Sal-A Storage
(GFS)
[slides] [printable]
The Google File System [pdf]
Week 2
2023-09-05
13:00-15:00
Sal-B Storage
(Bigtable, Cassandra, Neo4j)
[slides] [printable] Bigtable: A Distributed Storage System for Structured Data [pdf]
Cassandra: A Decentralized Structured Storage System [pdf]
Graph Databases (Ch. 3, 6)
Neo4j Documentation [link]
2023-09-06
13:00-15:00
Sal-A Scala [slides] [printable] Scala By Example [pdf]
Week 3
2023-09-11
15:00-17:00
Sal-B Parallel Data Processing
(MapReduce)
[slides] [printable] MapReduce: Simplified Data Processing on Large Clusters [pdf]
Data-Intensive Text Processing with MapReduce (Ch. 2-3)
2023-09-12
8:00-10:00
Sal-B Parallel Data Processing
(Spark)
[slides] [printable]
Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing [pdf]
Spark - The Definitive Guide (Ch. 2, 12-14)
Learning Spark (Ch. 1-2)
Spark Documentation [link]
2023-09-12
13:00-15:00
Sal-B Lab 1
Week 4
2023-09-18
15:00-17:00
Sal-B Structured Data Processing
(Spark SQL)
[slides] [printable] Spark SQL: Relational Data Processing in Spark [pdf]
Spark - The Definitive Guide (Ch. 4-11)
Learning Spark (Ch. 3-6)
Spark SQL Documentation [link]
2023-09-20
10:00-12:00
Sal-B Stream Processing
(Introduction, Kafka)
[slides] [printable]
Kafka: a Distributed Messaging System for Log Processing [pdf]
Kafka Documentation [link]
A Survey on the Evolution of Stream Processing Systems [pdf]
The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing [pdf]
2023-09-22
13:00-15:00
Sal-B Lab 2
Week 5
2023-09-25
15:00-17:00
Sal-B Stream Processing
(Spark Streaming)
[slides] [printable]
Discretized Streams: An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters [pdf]
Structured Streaming: A Declarative API for Real-Time Applications in Apache Spark [pdf]
Spark - The Definitive Guide (Ch. 20-23)
Learning Spark (Ch. 8)
Spark Streaming Documentation [link]
2023-09-29
13:00-15:00
Sal-B Graph Processing
(Pregel, GraphLab, GraphX)
[slides] [printable]
Pregel: A System for Large-Scale Graph Processing [pdf]
Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud [pdf]
PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs [pdf]
GraphX: Graph Processing in a Distributed Dataflow Framework [pdf]
Spark - The Definitive Guide (Ch. 30)
GraphX Documentation [link]
Week 6
2023-10-02
15:00-17:00
Sal-B Resource Management
(Mesos, YARN, Borg, Kubernetes)
[slides] [printable]
Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center [pdf]
Apache Hadoop YARN: Yet Another Resource Negotiator [pdf]
Large-Scale Cluster Management at Google with Borg [pdf]
2023-10-03
10:00-12:00
Sal-B Cloud Data Lakes [slides] [printable]
Delta Lake: High-Performance ACID Table Storage over Cloud Object Stores [pdf]
Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics [pdf]
Learning Spark (Ch. 9)
Week 7
2023-10-09
15:00-17:00
Sal-B Mohammadhossein Andjedani
From bytes to insights, when Machine Learning meets a tsunami of data