Date Location Topic Notes Reading
Week 1
2018-08-28
15:00-17:00
Sal-B Introduction slides [pdf] The NIST Definition of Cloud Computing [pdf]
Above the Clouds: A Berkeley View of Cloud Computing [pdf]
A Comparative Taxonomy and Survey of Public Cloud Infrastructure Vendors [pdf]
2018-08-31
10:00-12:00
K-208 Storage
(GFS, Flat FS)
slides [pdf] The Google File System [pdf]
Flat Datacenter Storage [pdf]
Week 2
2018-09-03
15:00-17:00
Sal-B Storage
(Dynamo, BigTable, Cassandra)
slides [pdf] Dynamo: Amazon's Highly Available Key-value Store [pdf]
Bigtable: A Distributed Storage System for Structured Data [pdf]
Cassandra: A Decentralized Structured Storage System [pdf]
Week 3
2018-09-10
13:00-15:00
Sal-C Programming Languages
(Scala)
slides [pdf] Scala By Example [pdf]
2018-09-14
10:00-12:00
K-208 Parallel Data Processing
(MapReduce, FlumeJava)
slides [pdf] MapReduce: Simplified Data Processing on Large Clusters [pdf]
FlumeJava: Easy, Efficient Data-Parallel Pipelines [pdf]
Data-Intensive Text Processing with MapReduce (Ch. 2-3)
Week 4
2018-09-18
10:00-12:00
Sal-C Parallel Data Processing
(Spark)
slides [pdf] Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing [pdf]
Spark - The Definitive Guide (Ch. 2, 12-14)
Week 5
2018-09-24
15:00-17:00
Sal-C Structured Data Processing
(Spark SQL)
slides [pdf] [data] Spark SQL: Relational Data Processing in Spark [pdf]
Spark - The Definitive Guide (Ch. 4-11)
2018-09-27
10:00-12:00
Sal-B Stream Processing
(Introduction, Kafka)
slides [pdf] High-Availability Algorithms for Distributed Stream Processing [pdf]
Kafka: a Distributed Messaging System for Log Processing [pdf]
Designing Data-Intensive Applications (Ch. 11)
Fundamentals of Stream Processing (Ch. 1-5, 7, 9)
Week 6
2018-10-01
13:00-15:00
Sal-C Stream Processing
(Storm, MillWheel, Google Dataflow)
slides [pdf] [src] Storm @Twitter [pdf]
MillWheel: Fault-Tolerant Stream Processing at Internet Scale [pdf]
The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing [pdf]
2018-10-05
13:00-15:00
Sal-B Stream Processing
(Spark Streaming, Flink)
slides [pdf] [src] Discretized Streams: An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters [pdf]
Apache Flink: Stream and Batch Processing in a Single Engine [pdf]
Spark - The Definitive Guide (Ch. 20-23)
Week 7
2018-10-08
15:00-17:00
Sal-B Graph Processing
(Pregel, GraphLab, X-Stream)
slides [pdf] Pregel: A System for Large-Scale Graph Processing [pdf]
Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud [pdf]
PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs [pdf]
X-Stream: Edge-Centric Graph Processing using Streaming Partitions [pdf]
2018-10-09
10:00-12:00
Sal-A Graph Processing
(GraphX, Giraph++, Pegasus)
slides [pdf] GraphX: Graph Processing in a Distributed Dataflow Framework [pdf]
From "Think Like a Vertex" to "Think Like a Graph" [pdf]
Pegasus: Mining Peta-Scale Graphs [pdf]
Spark - The Definitive Guide (Ch. 30)
2018-10-12
10:00-12:00
Sal-C Resource Management
(Mesos, YARN)
slides [pdf] Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center [pdf]
Apache Hadoop YARN: Yet Another Resource Negotiator [pdf]