DoublePlay: Parallelizing Sequential Logging and Replay

DoublePlay finds a way to record and replay instructions running on multiple cores. It timeslices multiple threads on a single processor, then runs multiple time intervals (epochs) of the program concurrently on separate processors.

Record and Replay (RR) is a fundamental tool for us to achieve reverse debugging. This paper is our first glance to state-of-the-art RR.

# Record and replay of program running on multicores

Assume the target program is running in a single core. That will be nice because we have all the tools to trace and record the instructions. For example, Embedded Tracing Macrocell (ETM) in Arm chips helps trace the instructions. Anyway, no matter how many threads (OS-level) the program used, as long as it ran in a single core, we can completely replay the execution procedure. In the hardware layer, which is beneath the operating system, we don't care how threads are scheduled in one single core.

But if the target program is running on multicores? Things get weird. Multicore programs can literally run in parallel. There are several cores and each of them can execute one instruction at a time. It's even worse when you notice that they can access (read, modify, disrupt, damage) the same area of memory, called the shared memory, at the same time.

The order of the execution of instuctions are important that may change the behavior of the program. However, it's hard to trace and record the order when it comes to a multicore program. Hardware tracing component are running on each single processor.

You may argue that ETM tracing output contains time stamp that helps us firgure out the order of instuctions. Well, that's true but does not always work. What if there's only a subtle time difference between two instuctions running on different cores, say, instruction A starts at nanosecond 1 and instruction B starts at nanosecond 1.001? ETM cannot be that accurate.

Researches consider transform a multicore program into an equivalent single core procedure. DoublePlay introduces an efficient, precise way.

# Basic idea of DoublePlay

At first, the multicore program will be normally executed in several processors. The execution is called thread-parral execution. At the same time, DoublePlay timeslices the execution into several epoches.

When an epoch begins (e.g. Ep0 begins at the same time when the program begins), DoublePlay copies the status of whole system, including registers and memories, to a new processor, and run all the threads, which is also running in thread-parral execution at the same time, in this single core. This is called epoch-parallel execution.

In the figure, there were four threads called A, B, C and D running in four different processors in thread-parallel execution. The four threads were going to be run in a single, new processor "CPU4" at the beginning of Epoch 0.

If we are lucky enough, then the thread-parallel execution will have same result as the epoch-parallel execution. However, the order of instructions executed in epoch-parallel execution might be different from the order in thread-parallel execution. Thus the result may be differnet, too. Different result makes the recording meaningless. DoublePlay has to roll-back to the conflict epoch and run again.

Let's take a look at how the conflict is found. In the figure above, ⓶ is the status at the end of Ep1 running in epoch-parallel execution. ⓵ is the status at the beginning of Ep2 running in epoch-parallel execution, which is copied from thread-parallel execution, and is independent from ⓶. Actually, we can know ⓵ earlier than ⓶.

DoublePlay checks if the status of ⓶ is the same as ⓵. If it is the same, then we have a equivalent singlecore version of Ep1. This verision can be used in replay.

# Sort of Online Replay!

DoublePlay is an off-line replay. However, it needs roll-back when conflicts. To minimize the time of roll-back, DoublePlay comes up with an idea.

In thread-parallel execution, DoublePlay records order and result of system calls and low-level synchonization operations. During the epoch-parallel execution, instead of executing system calls and synchonization operations again, DoublePlay directly give the result back. It will also block some threads to match the recorded order of system calls and low-level synchonization operations.

Thus, DoublePlay claims that there will be no conflict if there's no data race in the program.