Compiler Optimizations - side effects

Controlling Compiler Optimizations

Methods to control and/or make compiler memory accesses deterministic in multithreaded applications:

  • Using locks (mutex, spinlocks, ...).
  • Using atomics
  • Using fences

Store Tearing

Store tearing occurs when the compiler uses multiple store instructions for a single access.

For example, one thread might store 0x12345678 to a four-byte integer variable at the same time as another thread stored 0xabcdef00. If the compiler used 16-bit stores for either access, the result might well be 0x1234ef00, which could come as quite a surprise to code loading from this integer. There are CPUs that feature small immediate values, and on such CPUs, the compiler can be tempted to split a 64-bit store into two 32-bit stores in order to reduce the overhead of explicitly forming the 64-bit constant in a register, even on a 64-bit CPU.

Note that this tearing can happen even on properly aligned and machine-word-sized accesses, and even for volatile stores.

Load Fusing

https://lwn.net/Articles/793253/

Load fusing occurs when the compiler uses the result of a prior load from a given variable instead of repeating the load. Not only is this sort of optimization just fine in single-threaded code, it is often just fine in multithreaded code. Unfortunately, the word "often" hides some truly annoying exceptions, including the one called out in the ACCESS_ONCE() article.

We do occasionally use READ_ONCE() to prevent load-fusing optimizations that would otherwise cause the compiler to turn while-loops into if-statements guarding infinite loops.

Load fusing can be prevented by using READ_ONCE() or by enforcing ordering between the two loads using barrier(), smp_rmb().

ACCESS_ONCE()

ACCESS_ONCE: its purpose is to ensure that the value passed as a parameter is accessed exactly once by the generated code.

E.g., this code:

for (;;) {
    struct task_struct *owner;

    owner = ACCESS_ONCE(lock->owner);
    if (owner)
        break;
    ...
}

Cannot be optimized into this:

struct task_struct *owner;
owner = ACCESS_ONCE(lock->owner);
for (;;) {
    if (owner && !mutex_spin_on_owner(lock, owner))
        break;
    ...
}

The compiler misses that a value may be changed by another thread of execution.

 #define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))

In other words, it works by turning the relevant variable, temporarily, into a volatile type.

It is only in places where shared data is accessed without locks (or explicit barriers) that a construct like ACCESS_ONCE() is required.

READ_ONCE()

Pretty much similar to ACCESS_ONCE():

#define __READ_ONCE(x)  (*(const volatile __unqual_scalar_typeof(x) *)&(x))

What the compiler can / cannot do

Can merge multiple writes to a shared variable with a single one (even with locks)

This does not violate SC.

E.g.,

lock_shared_ctr

for (i = 0; i < N; i++)
    shared_ctr++

unlock_shared_ctr

can become:

lock_shared_ctr

r1 = shared_ctr
for (i = 0; i < N; i++)
    r1++
shared_ctr = r1

unlock_shared_ctr

Invent read

Used to implement conditional write via a register.

Cannot

It must never invent a write to a variable that would not have been written to seq cst execution.