The course consists of five tasks, as listed below. All assignments should be done in groups of two or three students.


Task1 has five sets of review questions, one set of questions per week. This assignment is a pass/fail task.


Task2 consists of four programming assignments (labs), each focusing on a specific course topic. There is no deadline for this assignment, but we recommend doing each lab after its corresponding lectures. We go through the lab assignments during the lab sessions.
  • Lab 1: HDFS, HBase, and Cassandra [pdf]
  • Lab 2: Spark and Spark SQL [pdf] [src]
  • Lab 3: Spark Streaming [pdf] [src]
  • Lab 4: GraphX [pdf] [src]


In Task3, each group will choose one of the course topics and work on that topic by writing an essay and presenting it to their opponents (another group). This task has several steps:
  • Step 1: each group should select a module and study relevant material.
  • Step 2: each group member should individually work on a subtopic and prepare a presentation for the other group members.
  • Step 3: each group should write an essay explaining their findings and prepare a presentation based on it.
  • Step 4: each group should study the essay of another group and prepare questions for the opposition.
  • Step 5: each group should present their findings and answer the questions from their opponent group.
Grading of this task has the following parts:
  • E: Essay - group grade (5 points)
  • P: Presentation and interaction - group grade (2 points)
  • Q: Reviewing the essay and asking questions - group grade (2 points)
  • A: Answering questions - individual grade (1 point)
Task3 final grade: A: 10, B: 9, C: 8, D: 7, E: 6, F: <5.
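
As a rough illustration of how the four parts add up to a Task3 grade, here is a minimal Python sketch. The function name and the lookup table are hypothetical, and it assumes points are awarded as whole numbers. The scale above does not say what a total of exactly 5 maps to, so the sketch returns None in that case rather than guessing.

```python
# Letter grades by total points, as listed for Task3.
LETTER_BY_POINTS = {10: "A", 9: "B", 8: "C", 7: "D", 6: "E"}


def task3_letter(essay, presentation, questions, answers):
    """Sum the four parts (max 5 + 2 + 2 + 1 = 10) and map the total to a letter.

    Assumes whole-number points for each part.
    """
    total = essay + presentation + questions + answers
    if total < 5:
        return "F"
    # A total of exactly 5 is not covered by the listed scale, so .get() yields None.
    return LETTER_BY_POINTS.get(total)
```

For example, a group member with E = 4, P = 2, Q = 2, and A = 1 has a total of 9 points, which corresponds to a B.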


In Task4, you should define your own project by writing a one-page description of the project and getting your project proposal approved by the examiner. The project proposal should cover the following headings:
  • Problem description: what is the problem that you will be investigating?
  • Tools: what tools are you going to use? In the course, we mainly used Spark, but you are free to explore new tools and technologies.
  • Data: what data will you use, and how are you going to collect it?
  • Methodology and algorithm: what method(s) or algorithm(s) are you proposing?
You can implement your code using Jupyter Notebook or as a stand-alone application. You should submit a zip file containing your code and a short report (two to three pages) about what you have done, the dataset, your method, your results, and how to run the code. This task is an A-F graded task.


Task5 is the final exam, which includes questions from all modules covered in the course. This task is also graded A-F (5-0).

The Final Grade

The final grade of the course is computed as 0.3*Task3 + 0.3*Task4 + 0.4*Task5. To pass the course, students must pass Task1, and their grades for Tasks 3, 4, and 5 should each be greater than E.
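
The weighted sum above can be sketched in Python as follows. This is only an illustration: it assumes that all three tasks use the A-F (5-0) scale given for Task5, and the names POINTS, WEIGHTS, and weighted_score are hypothetical. How the weighted score is converted back to a final letter grade is not stated here, so the sketch returns the number only.

```python
# Assumption: every task uses the A-F (5-0) scale given for Task5.
POINTS = {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1, "F": 0}

# Weights from the final grade formula.
WEIGHTS = {"task3": 0.3, "task4": 0.3, "task5": 0.4}


def weighted_score(task3, task4, task5):
    """Return 0.3*Task3 + 0.3*Task4 + 0.4*Task5 for three letter grades."""
    letters = {"task3": task3, "task4": task4, "task5": task5}
    return sum(WEIGHTS[name] * POINTS[letter] for name, letter in letters.items())
```

For instance, under these assumptions, grades of A, B, and C in Tasks 3, 4, and 5 give 0.3*5 + 0.3*4 + 0.4*3.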