Schedule
Date | Location | Topic | Notes | Reading |
Week 1 | ||||
2020-08-26 13:00-15:00 | Zoom | Introduction | [slides] [printable] | The NIST Definition of Cloud Computing [pdf]; Above the Clouds: A Berkeley View of Cloud Computing [pdf]; A Comparative Taxonomy and Survey of Public Cloud Infrastructure Vendors [pdf] |
2020-08-27 10:00-12:00 | Zoom | Storage (GFS, Flat FS) | [slides] [printable] | The Google File System [pdf]; Flat Datacenter Storage [pdf] |
Week 2 | ||||
2020-09-01 10:00-12:00 | Zoom | Storage (BigTable, Cassandra, Neo4j) | [slides] [printable] | Bigtable: A Distributed Storage System for Structured Data [pdf]; Cassandra: A Decentralized Structured Storage System [pdf]; Graph Databases (Ch. 3, 6); Neo4j Documentation [link] |
2020-09-03 15:00-17:00 | Zoom | Programming Languages (Scala) | [slides] [printable] | Scala By Example [pdf] |
Week 3 | ||||
2020-09-08 15:00-17:00 | Zoom | Parallel Data Processing (MapReduce, FlumeJava) | [slides] [printable] | MapReduce: Simplified Data Processing on Large Clusters [pdf]; FlumeJava: Easy, Efficient Data-Parallel Pipelines [pdf]; Data-Intensive Text Processing with MapReduce (Ch. 2-3) |
2020-09-09 13:00-15:00 | Zoom | Lab1 | | |
2020-09-10 10:00-12:00 | Zoom | Parallel Data Processing (Spark) | [slides] [printable] | Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing [pdf]; Spark - The Definitive Guide (Ch. 2, 12-14); Learning Spark (Ch. 1-2); Spark Documentation [link] |
Week 4 | ||||
2020-09-15 08:00-10:00 | Zoom | Structured Data Processing (Spark SQL) | [slides] [printable] | Spark SQL: Relational Data Processing in Spark [pdf]; Spark - The Definitive Guide (Ch. 4-11); Learning Spark (Ch. 3-6); Spark SQL Documentation [link] |
2020-09-16 15:00-17:00 | Zoom | Stream Processing (Introduction, Kafka) | [slides] [printable] | Kafka: a Distributed Messaging System for Log Processing [pdf]; Kafka Documentation [link]; A Survey on the Evolution of Stream Processing Systems [pdf] |
2020-09-17 08:00-10:00 | Zoom | Lab2 | [notebooks] | |
Week 5 | ||||
2020-09-22 10:00-12:00 | Zoom | Stream Processing (Spark Streaming, Beam) | [slides] [printable] | Discretized Streams: An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters [pdf]; MillWheel: Fault-Tolerant Stream Processing at Internet Scale [pdf]; The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing [pdf]; Spark - The Definitive Guide (Ch. 20-23); Spark Streaming Documentation [link]; Beam Documentation [link] |
2020-09-23 13:00-15:00 | Zoom | Graph Processing (Pregel, GraphLab) | [slides] [printable] | Pregel: A System for Large-Scale Graph Processing [pdf]; Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud [pdf]; PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs [pdf] |
Week 6 | ||||
2020-09-30 10:00-12:00 | Zoom | Graph Processing (X-Stream, GraphX) | [slides] [printable] | X-Stream: Edge-Centric Graph Processing using Streaming Partitions [pdf]; GraphX: Graph Processing in a Distributed Dataflow Framework [pdf]; Spark - The Definitive Guide (Ch. 30); GraphX Documentation [link] |
2020-10-01 10:00-12:00 | Zoom | Resource Management (Mesos, YARN) | [slides] [printable] | Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center [pdf]; Apache Hadoop YARN: Yet Another Resource Negotiator [pdf] |
Week 7 | ||||
2020-10-06 10:00-12:00 | Zoom | Guest Lecturer: Hooman Peiro Sajjad, Spotify | [slides] | |