Big Data Systems HPI

Quiz 1 Map Reduce


Question

You want to sort a 100 GB (1GB = 10^9 bytes) file that consists of 50 000 blocks using Two Phase Multiway Merge Sort. Your main memory can fit 3200 blocks.

  • How many I/O-operations does it take to sort this file?

  • What is the exact maximum file size that can be sorted? Provide your answer with two decimal places.


Answer
  • 200000 --> We have 50 000 block reads and 50 000 block writes for each phase. So 4 * 50 000 = 200 000 I/O-Operations.
  • 20.47TB --> The maximum file size is determined by #blocks-in-memory * (#block-in-memory - 1). This means that in phase 2 we can process 3199 sorted partitions * 3200 blocks = 10236800 blocks. One block has a size of 2MB (100 GB / 50'000 blocks). If we multiply this with the number of blocks that we can process, we have a 2MB * 10236800 blocks = 20.47 TB file size limit.



Comments

At most (M / R) * ((M / B) - 1) ≈ M² / (RB) tupel can be sorted

tupel*tupelsize = 20,47TB