Big Data Systems HPI

Quiz 4 - Stream Processing


Question

You want to build a stream processing engine and are thinking about how to store windows efficiently and which approach to use.

To get a feeling for the storage cost, you consider a sliding window of length 1 hour with a 5 second slide. Your assumed stream has 1000 events per second.

The options you are considering are naive tuple buffers and stream slicing. Using tuple buffers, you need one list of events per window and you assign each event to all the buffers that it belongs to. The slicing approach needs only one list of events per slice and you assign the event to the corresponding slice.

An event is 120 bytes in size. If your application runs for 1 hour, how much storage does each approach require? For this calculation, you do not delete events. 1 MB = 10^6 bytes.


Answer

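Slicing: each event is stored exactly once, so storage = events_per_sec * event_size * time_in_sec = 1000 * 120 bytes * 3600 = 432 MB. Equivalently, there are total_events / (slide_time * events_per_sec) = 720 slices, each holding slide_time * events_per_sec = 5000 events of 120 bytes.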

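Naive tuple buffer: with windows starting at t = 0, 5, 10, ... seconds, an event in the k-th slice (k = 1 ... 720) belongs to k windows and is copied into k buffers. Storage = events_per_slice * event_size * (1 + 2 + ... + 720) = 5000 * 120 bytes * (720 * 721 / 2) = 155736 MB. Written in terms of the totals: total_events * (time + slide_time) / (2 * slide_time) * event_size = 3,600,000 * 3605 / 10 * 120 bytes = 155736 MB.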
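
Both results can be checked with a short script. This is a minimal sketch, not a reference solution: the constant and function names are made up for illustration, and it assumes that windows start at t = 0, 5, 10, ... seconds, with no window starting before the application does.

```python
# Parameters taken from the question text.
EVENTS_PER_SEC = 1000
EVENT_SIZE_BYTES = 120
RUNTIME_SEC = 3600
WINDOW_SEC = 3600
SLIDE_SEC = 5
MB = 10**6


def slicing_bytes():
    """Slicing: every event is stored once, in exactly one slice."""
    return EVENTS_PER_SEC * RUNTIME_SEC * EVENT_SIZE_BYTES


def tuple_buffer_bytes():
    """Naive buffers: every event is copied into each window containing it.

    Assumption: window start times are 0, SLIDE_SEC, 2 * SLIDE_SEC, ...
    """
    total_copies = 0
    for slice_start in range(0, RUNTIME_SEC, SLIDE_SEC):
        # Earliest window start that still covers this slice, clamped to 0.
        first_start = max(0, slice_start - WINDOW_SEC + SLIDE_SEC)
        open_windows = (slice_start - first_start) // SLIDE_SEC + 1
        total_copies += open_windows * EVENTS_PER_SEC * SLIDE_SEC
    return total_copies * EVENT_SIZE_BYTES


if __name__ == "__main__":
    print(f"slicing:      {slicing_bytes() / MB:.0f} MB")
    print(f"tuple buffer: {tuple_buffer_bytes() / MB:.0f} MB")
```

Under these assumptions the naive approach needs about 360 times the storage of slicing, because events are copied into up to 720 overlapping windows.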



