Schedule
Date | Location | Topic | Notes | Reading |
Week 1 | ||||
2023-08-29 8:00-10:00 |
Sal-B | Introduction | [slides] [printable] | The NIST Definition of Cloud Computing [pdf]
Above the Clouds: A Berkeley View of Cloud Computing [pdf] A Comparative Taxonomy and Survey of Public Cloud Infrastructure Vendors [pdf] |
2023-09-01 13:00-15:00 |
Sal-A | Storage (GFS) |
[slides] [printable] |
The Google File System [pdf] |
Week 2 | ||||
2023-09-05 13:00-15:00 |
Sal-B | Storage (BigTable, Cassandra, Neo4j) |
[slides] [printable] | Bigtable: A Distributed Storage System for Structured Data [pdf]
Cassandra: A Decentralized Structured Storage System [pdf] Graph Databases (Ch. 3, 6) Neo4j Documentation [link] |
2023-09-06 13:00-15:00 |
Sal-A | Scala | [slides] [printable] | Scala By Example [pdf] |
Week 3 | ||||
2023-09-11 15:00-17:00 |
Sal-B | Parallel Data Processing (MapReduce) |
[slides] [printable] | MapReduce: Simplified Data Processing on Large Clusters [pdf]
Data-Intensive Text Processing with MapReduce (Ch. 2-3) |
2023-09-12 8:00-10:00 |
Sal-B | Parallel Data Processing (Spark) |
[slides] [printable] |
Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing [pdf]
Spark - The Definitive Guide (Ch. 2, 12-14) Learning Spark (Ch. 1-2) Spark Documentation [link] |
2023-09-12 13:00-15:00 |
Sal-B | Lab 1 | ||
Week 4 | ||||
2023-09-18 15:00-17:00 |
Sal-B | Structured Data Processing (Spark SQL) |
[slides] [printable] | Spark SQL: Relational Data Processing in Spark [pdf]
Spark - The Definitive Guide (Ch. 4-11) Learning Spark (Ch. 3-6) Spark SQL Documentation [link] |
2023-09-20 10:00-12:00 |
Sal-B | Stream Processing (Introduction, Kafka) |
[slides] [printable] |
Kafka: a Distributed Messaging System for Log Processing [pdf]
Kafka Documentation [link] A Survey on the Evolution of Stream Processing Systems [pdf] The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing [pdf] |
2023-09-22 13:00-15:00 |
Sal-B | Lab 2 | ||
Week 5 | ||||
2023-09-25 15:00-17:00 |
Sal-B | Stream Processing (Spark Streaming) |
[slides] [printable] |
Discretized Streams: An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters [pdf]
Structured Streaming: A Declarative API for Real-Time Applications in Apache Spark [pdf] Spark - The Definitive Guide (Ch. 20-23) Learning Spark (Ch. 8) Spark Streaming Documentation [link] |
2023-09-29 13:00-15:00 |
Sal-B | Graph Processing (Pregel, GraphLab, GraphX) |
[slides] [printable] |
Pregel: A System for Large-Scale Graph Processing [pdf]
Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud [pdf] PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs [pdf] GraphX: Graph Processing in a Distributed Dataflow Framework [pdf] Spark - The Definitive Guide (Ch. 30) GraphX Documentation [link] |
Week 6 | ||||
2023-10-02 15:00-17:00 |
Sal-B | Resource Management (Mesos, YARN, Borg, Kubernetes) |
[slides] [printable] |
Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center [pdf]
Apache Hadoop YARN: Yet Another Resource Negotiator [pdf] Large-Scale Cluster Management at Google with Borg [pdf] |
2023-10-03 10:00-12:00 |
Sal-B | Cloud Data Lakes | [slides] [printable] |
Delta Lake: High-Performance ACID Table Storage over Cloud Object Stores [pdf]
Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics [pdf] Learning Spark (Ch. 9) |
Week 7 | ||||
2023-10-09 15:00-17:00 |
Sal-B | Mohammadhossein Andjedani: "From bytes to insights, when Machine Learning meets a tsunami of data" |
||