Date | Location | Topic | Notes | Reading
Week 1
2020-08-26 13:00-15:00 | Zoom | Introduction | [slides] [printable]
The NIST Definition of Cloud Computing [pdf]
Above the Clouds: A Berkeley View of Cloud Computing [pdf]
A Comparative Taxonomy and Survey of Public Cloud Infrastructure Vendors [pdf]
2020-08-27 10:00-12:00 | Zoom | Storage (GFS, Flat FS) | [slides] [printable]
The Google File System [pdf]
Flat Datacenter Storage [pdf]
Week 2
2020-09-01 10:00-12:00 | Zoom | Storage (Bigtable, Cassandra, Neo4j) | [slides] [printable]
Bigtable: A Distributed Storage System for Structured Data [pdf]
Cassandra: A Decentralized Structured Storage System [pdf]
Graph Databases (Ch. 3, 6)
Neo4j Documentation [link]
2020-09-03 15:00-17:00 | Zoom | Programming Languages (Scala) | [slides] [printable]
Scala By Example [pdf]
Week 3
2020-09-08 15:00-17:00 | Zoom | Parallel Data Processing (MapReduce, FlumeJava) | [slides] [printable]
MapReduce: Simplified Data Processing on Large Clusters [pdf]
FlumeJava: Easy, Efficient Data-Parallel Pipelines [pdf]
Data-Intensive Text Processing with MapReduce (Ch. 2-3)
2020-09-09 13:00-15:00 | Zoom | Lab1
2020-09-10 10:00-12:00 | Zoom | Parallel Data Processing (Spark) | [slides] [printable]
Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing [pdf]
Spark: The Definitive Guide (Ch. 2, 12-14)
Learning Spark (Ch. 1-2)
Spark Documentation [link]
Week 4
2020-09-15 08:00-10:00 | Zoom | Structured Data Processing (Spark SQL) | [slides] [printable]
Spark SQL: Relational Data Processing in Spark [pdf]
Spark: The Definitive Guide (Ch. 4-11)
Learning Spark (Ch. 3-6)
Spark SQL Documentation [link]
2020-09-16 15:00-17:00 | Zoom | Stream Processing (Introduction, Kafka) | [slides] [printable]
Kafka: a Distributed Messaging System for Log Processing [pdf]
Kafka Documentation [link]
A Survey on the Evolution of Stream Processing Systems [pdf]
2020-09-17 08:00-10:00 | Zoom | Lab2 | [notebooks]
Week 5
2020-09-22 10:00-12:00 | Zoom | Stream Processing (Spark Streaming, Beam) | [slides] [printable]
Discretized Streams: An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters [pdf]
MillWheel: Fault-Tolerant Stream Processing at Internet Scale [pdf]
The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing [pdf]
Spark: The Definitive Guide (Ch. 20-23)
Spark Streaming Documentation [link]
Beam Documentation [link]
2020-09-23 13:00-15:00 | Zoom | Graph Processing (Pregel, GraphLab) | [slides] [printable]
Pregel: A System for Large-Scale Graph Processing [pdf]
Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud [pdf]
PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs [pdf]
Week 6
2020-09-30 10:00-12:00 | Zoom | Graph Processing (X-Stream, GraphX) | [slides] [printable]
X-Stream: Edge-Centric Graph Processing using Streaming Partitions [pdf]
GraphX: Graph Processing in a Distributed Dataflow Framework [pdf]
Spark: The Definitive Guide (Ch. 30)
GraphX Documentation [link]
2020-10-01 10:00-12:00 | Zoom | Resource Management (Mesos, YARN) | [slides] [printable]
Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center [pdf]
Apache Hadoop YARN: Yet Another Resource Negotiator [pdf]
Week 7
2020-10-06 10:00-12:00 | Zoom | Guest Lecturer: Hooman Peiro Sajjad (Spotify) | [slides]