Schedule 2019
Date | Location | Topic | Notes | Reading |
Week 1 | ||||
2019-08-27 15:00-17:00 |
Sal-B | Introduction | [slides] [printable] | The NIST Definition of Cloud Computing [pdf]
Above the Clouds: A Berkeley View of Cloud Computing [pdf] A Comparative Taxonomy and Survey of Public Cloud Infrastructure Vendors [pdf] |
2019-08-30 15:00-17:00 |
Sal-B | Storage (GFS, HopsFS) Guest Lecturer: Salman Niazi |
GFS [slides] [printable] HopsFS [slides] |
The Google File System [pdf]
HopsFS: Scaling Hierarchical File System Metadata Using NewSQL [pdf] |
Week 2 | ||||
2019-09-03 15:00-17:00 |
Sal-B | Storage (BigTable, Cassandra, Neo4j) |
[slides] [printable] | Bigtable: A Distributed Storage System for Structured Data [pdf]
Cassandra: A Decentralized Structured Storage System [pdf] Graph Databases (Ch. 3, 6) Neo4j Documentation [link] |
2019-09-05 15:00-17:00 |
Sal-C | Programming Languages (Scala) |
[slides] [printable] | Scala By Example [pdf] |
Week 3 | ||||
2019-09-10 15:00-17:00 |
Sal-B | Parallel Data Processing (MapReduce, FlumeJava) |
[slides] [printable] | MapReduce: Simplified Data Processing on Large Clusters [pdf]
FlumeJava: Easy, Efficient Data-Parallel Pipelines [pdf] Data-Intensive Text Processing with MapReduce (Ch. 2-3) |
2019-09-12 15:00-17:00 |
Sal-C | Parallel Data Processing (Spark) |
[slides] [printable] | Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing [pdf]
Spark - The Definitive Guide (Ch. 2, 12-14) Spark Documentation [link] |
Week 4 | ||||
2019-09-17 15:00-17:00 |
Sal-B | Structured Data Processing (Spark SQL) |
[slides] [printable] | Spark SQL: Relational Data Processing in Spark [pdf]
Spark - The Definitive Guide (Ch. 4-11) Spark SQL Documentation [link] |
2019-09-19 15:00-17:00 |
Sal-B | Stream Processing (Introduction, Kafka) |
[slides] [printable] | Kafka: a Distributed Messaging System for Log Processing [pdf]
Kafka Documentation [link] |
Week 5 | ||||
2019-09-24 15:00-17:00 |
Sal-B | Stream Processing (Flink) Guest Lecturer: Paris Carbone |
[slides] | Apache Flink: Stream and Batch Processing in a Single Engine [pdf] State Management in Apache Flink [pdf] Flink Documentation [link] |
2019-09-26 15:00-17:00 |
Sal-B | Stream Processing (Spark Streaming, Beam) |
[slides] [printable] | Discretized Streams: An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters [pdf]
MillWheel: Fault-Tolerant Stream Processing at Internet Scale [pdf] The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing [pdf] Spark - The Definitive Guide (Ch. 20-23) Spark Streaming Documentation [link] Beam Documentation [link] |
Week 6 | ||||
2019-10-01 15:00-17:00 |
Sal-B | Graph Processing (Pregel, GraphLab) |
[slides] [printable] | Pregel: A System for Large-Scale Graph Processing [pdf]
Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud [pdf] PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs [pdf] |
2019-10-03 15:00-17:00 |
Sal-C | Graph Processing (X-Stream, GraphX) |
[slides] [printable] | X-Stream: Edge-Centric Graph Processing using Streaming Partitions [pdf]
GraphX: Graph Processing in a Distributed Dataflow Framework [pdf] Spark - The Definitive Guide (Ch. 30) GraphX Documentation [link] |
Week 7 | ||||
2019-10-08 15:00-17:00 |
Sal-C | Resource Management (Mesos, YARN) |
[slides] [printable] | Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center [pdf]
Apache Hadoop YARN: Yet Another Resource Negotiator [pdf] |