Date Location Topic Notes Reading
Week 1
2019-08-27
15:00-17:00
Sal-B
Introduction
[slides] [printable]
The NIST Definition of Cloud Computing [pdf]
Above the Clouds: A Berkeley View of Cloud Computing [pdf]
A Comparative Taxonomy and Survey of Public Cloud Infrastructure Vendors [pdf]
2019-08-30
15:00-17:00
Sal-B
Storage
(GFS, HopsFS)
Guest Lecturer: Salman Niazi
GFS [slides] [printable]
HopsFS [slides]
The Google File System [pdf]
HopsFS: Scaling Hierarchical File System Metadata Using NewSQL Databases [pdf]
Week 2
2019-09-03
15:00-17:00
Sal-B
Storage
(Bigtable, Cassandra, Neo4j)
[slides] [printable]
Bigtable: A Distributed Storage System for Structured Data [pdf]
Cassandra: A Decentralized Structured Storage System [pdf]
Graph Databases (Ch. 3, 6)
Neo4j Documentation [link]
2019-09-05
15:00-17:00
Sal-C
Programming Languages
(Scala)
[slides] [printable]
Scala By Example [pdf]
Week 3
2019-09-10
15:00-17:00
Sal-B
Parallel Data Processing
(MapReduce, FlumeJava)
[slides] [printable]
MapReduce: Simplified Data Processing on Large Clusters [pdf]
FlumeJava: Easy, Efficient Data-Parallel Pipelines [pdf]
Data-Intensive Text Processing with MapReduce (Ch. 2-3)
2019-09-12
15:00-17:00
Sal-C
Parallel Data Processing
(Spark)
[slides] [printable]
Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing [pdf]
Spark: The Definitive Guide (Ch. 2, 12-14)
Spark Documentation [link]
Week 4
2019-09-17
15:00-17:00
Sal-B
Structured Data Processing
(Spark SQL)
[slides] [printable]
Spark SQL: Relational Data Processing in Spark [pdf]
Spark: The Definitive Guide (Ch. 4-11)
Spark SQL Documentation [link]
2019-09-19
15:00-17:00
Sal-B
Stream Processing
(Introduction, Kafka)
[slides] [printable]
Kafka: a Distributed Messaging System for Log Processing [pdf]
Kafka Documentation [link]
Week 5
2019-09-24
15:00-17:00
Sal-B
Stream Processing
(Flink)
Guest Lecturer: Paris Carbone
[slides]
Apache Flink: Stream and Batch Processing in a Single Engine [pdf]
State Management in Apache Flink [pdf]
Flink Documentation [link]
2019-09-26
15:00-17:00
Sal-B
Stream Processing
(Spark Streaming, Beam)
[slides] [printable]
Discretized Streams: An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters [pdf]
MillWheel: Fault-Tolerant Stream Processing at Internet Scale [pdf]
The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing [pdf]
Spark: The Definitive Guide (Ch. 20-23)
Spark Streaming Documentation [link]
Beam Documentation [link]
Week 6
2019-10-01
15:00-17:00
Sal-B
Graph Processing
(Pregel, GraphLab)
[slides] [printable]
Pregel: A System for Large-Scale Graph Processing [pdf]
Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud [pdf]
PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs [pdf]
2019-10-03
15:00-17:00
Sal-C
Graph Processing
(X-Stream, GraphX)
[slides] [printable]
X-Stream: Edge-Centric Graph Processing using Streaming Partitions [pdf]
GraphX: Graph Processing in a Distributed Dataflow Framework [pdf]
Spark: The Definitive Guide (Ch. 30)
GraphX Documentation [link]
Week 7
2019-10-08
15:00-17:00
Sal-C
Resource Management
(Mesos, YARN)
[slides] [printable]
Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center [pdf]
Apache Hadoop YARN: Yet Another Resource Negotiator [pdf]