Schedule 2018
Date | Location | Topic | Notes | Reading | |
Week 1 | |||||
2018-08-28 15:00-17:00 |
Sal-B | Introduction | slides [pdf] | The NIST Definition of Cloud Computing [pdf]
Above the Clouds: A Berkeley View of Cloud Computing [pdf] A Comparative Taxonomy and Survey of Public Cloud Infrastructure Vendors [pdf] |
|
2018-08-31 10:00-12:00 |
K-208 | Storage (GFS, Flat FS) |
slides [pdf] | The Google File System [pdf]
Flat Datacenter Storage [pdf] |
|
Week 2 | |||||
2018-09-03 15:00-17:00 |
Sal-B | Storage (Dynamo, BigTable, Cassandra) |
slides [pdf] | Dynamo: Amazon's Highly Available Key-value Store [pdf]
Bigtable: A Distributed Storage System for Structured Data [pdf] Cassandra: A Decentralized Structured Storage System [pdf] |
|
Week 3 | |||||
2018-09-10 13:00-15:00 |
Sal-C | Programming Languages (Scala) |
slides [pdf] | Scala By Example [pdf] | |
2018-09-14 10:00-12:00 |
K-208 | Parallel Data Processing (MapReduce, FlumeJava) |
slides [pdf] | MapReduce: Simplified Data Processing on Large Clusters [pdf]
FlumeJava: Easy, Efficient Data-Parallel Pipelines [pdf] Data-Intensive Text Processing with MapReduce (Ch. 2-3) |
|
Week 4 | |||||
2018-09-18 10:00-12:00 |
Sal-C | Parallel Data Processing (Spark) |
slides [pdf] | Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing [pdf]
Spark - The Definitive Guide (Ch. 2, 12-14) |
|
Week 5 | |||||
2018-09-24 15:00-17:00 |
Sal-C | Structured Data Processing (Spark SQL) |
slides [pdf] [data] | Spark SQL: Relational Data Processing in Spark [pdf]
Spark - The Definitive Guide (Ch. 4-11) |
|
2018-09-27 10:00-12:00 |
Sal-B | Stream Processing (Introduction, Kafka) |
slides [pdf] | High-Availability Algorithms for Distributed Stream Processing [pdf]
Kafka: a Distributed Messaging System for Log Processing [pdf] Designing Data-Intensive Applications (Ch. 11) Fundamentals of Stream Processing (Ch. 1-5, 7, 9) | |
Week 6 | |||||
2018-10-01 13:00-15:00 |
Sal-C | Stream Processing (Storm, MillWheel, Google Dataflow) |
slides [pdf] [src] | Storm @Twitter [pdf]
MillWheel: Fault-Tolerant Stream Processing at Internet Scale [pdf] The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing [pdf] |
|
2018-10-05 13:00-15:00 |
Sal-B | Stream Processing (Spark Streaming, Flink) |
slides [pdf] [src] | Discretized Streams: An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters [pdf]
Apache Flink: Stream and Batch Processing in a Single Engine [pdf] Spark - The Definitive Guide (Ch. 20-23) |
|
Week 7 | |||||
2018-10-08 15:00-17:00 |
Sal-B | Graph Processing (Pregel, GraphLab, X-Stream) |
slides [pdf] | Pregel: A System for Large-Scale Graph Processing [pdf]
Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud [pdf] PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs [pdf] X-Stream: Edge-Centric Graph Processing using Streaming Partitions [pdf] |
|
2018-10-09 10:00-12:00 |
Sal-A | Graph Processing (GraphX, Giraph++, Pegasus) |
slides [pdf] | GraphX: Graph Processing in a Distributed Dataflow Framework [pdf]
From "Think Like a Vertex" to "Think Like a Graph" [pdf] Pegasus: Mining Peta-Scale Graphs [pdf] Spark - The Definitive Guide (Ch. 30) |
|
2018-10-12 10:00-12:00 |
Sal-C | Resource Management (Mesos, YARN) |
slides [pdf] | Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center [pdf]
Apache Hadoop YARN: Yet Another Resource Negotiator [pdf] |
|