Being able to track the current state of a program as well as meta information without actually executing it; static analysis.

## Control-flow graph (CFG)

A graph where nodes are blocks and there is an edge between two blocks if there is a control-flow instruction (jump, conditional branch, return) from one block to the other.

Blocks are sequences of instructions that has a single distinct label, a possibly empty sequence of instructions, ending with a single control-flow instruction

- Very useful for dataflow analysis.
- Helpful to imagine edges leading from one instruction to the next
*within blocks*as well for analysis.

## Live variable analysis

A variable $v$ is considered **live** at a program point if $v$ is defined before the point and *might* be used by some future part of the execution path of the program.

- Key word “might,” we would rather mistake a variable as live when it isn’t.
- Variable use could depend on user input or other parts of the program that we don’t know the value of until we actually run it.

To perform **liveness analysis** is to calculate which variables are live between each statement.

To be useful, it should be more precise than scoping rules.

Consider the following program. Liveness information is computed *between* each statement.

The scopes of $x$, $a$, $b$, and $c$ are all the same, but $a$, $b$, and $c$ are *never live at the same time*. This is a good example of the difference between liveness and scope. Furthermore, this means they could theoretically be allocated to the same register.

### Liveness is associated with CFG edges

Edges connect blocks and sequential instructions, and are quite literally in between statements. We associate liveness information with each edge.

Useful for later optimizations where the same register can be used for different temporaries in the same instruction, therefore rendering it useless. See move-related edges.

### Uses and definitions

Every instruction *uses* some variables and may also *define* new variables. We can keep track of the set of used variables and set of new definitions for each statement.

- Create a map from a node/statement $s$ to the set of uses and definitions, respectively.
- For example, $use[s]$ and $def[s]$. If $s$ is
`a = b + c`

, then $use[s]={b,c}$ and $def[s]={a}$.

#### Formal liveness definition

A variable $v$ is considered live on an edge $e$ in the CFG if

- there exists some statement $s$ in the CFG such that $v∈use[s]$.
- there exists a directed path $p:e⇝s$ such that for every internal node $n∈p$, $v∈/def[n]$. In other words, every statement leading up to $s$ cannot redefine $v$.

### Computing liveness with dataflow analysis

Define two more sets for each statement $s$:

- $in[s]$ is the set of variables that are live on entry to $s$.
- $out[s]$ is the set of variables that are live on exit from $s$.

Consider two sequential statements $s_{1}→s_{2}$. Note that we *cannot* claim $in[s_{2}]=out[s_{1}]$. This is because in the CFG, there may be multiple statements that lead to $s_{2}$. That is, there could also exist a $s_{1}$ such that $s_{1}→s_{2}$. So we have to consider $out[s_{1}]$ as well when computing $in[s_{2}]$.

Other constraints that must be true for a statement $s$:

- $use[s]⊆in[s]$. Set of variables that $s$ uses must be a subset of those that are live on entry to $s$. $s$ doesn’t have to use all variables that are live, they might be used later in the program path.
- $out[s]−def[s]⊆in[s]$. If a variable is live on exist from $s$, and $s$ does not redefine it, then it must have been live on entry to $s$.

If a variable was redefined, then we remove it from the set of live variables on exit of a statement. We treat it as a completely new variable (and in LLVM, SSA means that this is true by default!).

- If $s_{1}→s_{2}$, then $in[s_{2}]⊆out[s_{1}]$. As explained above, it’s a subset relationship because multiple statements could feed into $s_{2}$ and provide liveness information.

#### Naive algorithm

For each variable defined in the program, find where it’s used in the program. Then, trace the path backwards in the CFG, marking the variable as live on edges until we reach the statement where it’s defined, or until we reach a previously-explored statement.

Extremely inefficient because we process variables one at a time, and may retrace the same path for different variables.

#### Iterative node algorithm

Instead, we can iteratively build up $in[s]$ and $out[s]$ for each statement, updating them as follows:

- $in[s]←use[s]∪(out[s]−def[s])$
- $out[s]←⋃_{s_{′}∈succ[s]}in[s_{′}]$

The algorithm stops executing when $in[s]$ and $out[s]$ stop changing for all statements $s$ in the CFG. This is guaranteed to occur because these two sets *monotonically increase in size* and there are only a *finite* number of statements in a program. Size never decreases, so we’ll reach the end eventually.

For each statement $s$, initialize $in[s]=∅$ and $out[s]=∅$. This is because for the best analysis, the liveness sets should be as small as possible, so starting out from empty makes sense.

```
for all statements s, in[s] = {} and out[s] = {}
repeat until no change in `in` and `out`:
for all statements s:
out[s] <- union(in[s']) where s' in succ[s]
in[s] <- use[s] union (out[s] - def[s])
end
end
```

#### Improving runtime by using a queue

Notice that $out[s]$ only changes if the successors of $s$ have changed. Therefore, we don’t need to process a statement $s_{′}$ if its successor $s$ has not changed.

To do this, we can instead use a queue instead of a while loop and for loop. For each statement, we check if $in[s]$ has changed after performing the union operation, and readd nodes that lead to $s$ back to the queue.

```
for all statements s, in[s] = {} and out[s] = {}
initialize a queue Q with all statements in the CFG
while Q is not empty:
s = Q.pop()
old_in = in[s]
out[s] <- union(in[s']) where s' in succ[s]
in[s] <- use[s] union (out[s] - def[s])
if (old_in != in[s]):
for all previous statements s in pred[s], execute Q.push(s)
end
end
```

The program naturally stops when the queue empties out, implying that $in$ and $out$ aren’t changing anymore.