Verifying Performance Improvements

April 02, 2023

As I keep forgetting how statistical tests are done, I’ll write the process down for future me. To learn more about practical statistics you can check Three Statistical Tests Every Game Developer Should Know (applicable not only to game developers). To learn the value of the statistical approach compared to eyeballing the data, you can check "Performance Matters" by Emery Berger. And for presenting perf changes I can re-use my own approach.

The situation is

a code change is made or under review;
performance change measurements are noisy and it’s not clear if a change is an improvement or not.

In this case

Measure performance before and after the change. Take several measurements of each, not just one before and one after.
Put measurements in Numbers.app.
Formulate a null hypothesis, such as “Performance is the same before and after the change”.
Select a p-value threshold (the significance level). Go for 0.01 to avoid fooling yourself. If you need to convince someone else, 0.05 is still acceptable.
Calculate the p-value with the TTEST function. Use “two tails” and “two-sample unequal”.
If the calculated value is less than the pre-selected p-value threshold, the null hypothesis is ~~disproved~~ likely to be wrong. Otherwise you know your “improvement” is within the noise and not statistically significant.
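
Outside a spreadsheet, the same steps can be sketched in Python. This is a minimal standard-library sketch of a two-tailed, two-sample unequal-variance (Welch) t-test, not a description of how Numbers.app computes TTEST internally. The function names `welch_t_test` and `two_tailed_p` and the timing values are made up for illustration. The p-value comes from plain numerical integration, which is good enough for the 0.01 and 0.05 thresholds but loses precision for very small p-values. If SciPy is available, `scipy.stats.ttest_ind(before, after, equal_var=False)` is the usual shortcut.

```python
import math
import statistics


def two_tailed_p(t, df, steps=2000):
    """P(|T| >= |t|) for Student's t distribution, via Simpson's rule."""
    # Log of the normalizing constant of the t density.
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central = total * h / 3  # probability mass in [0, |t|]
    # Integration error can push the result slightly below zero; clamp it.
    return max(0.0, 1.0 - 2.0 * central)


def welch_t_test(before, after):
    """Return (t statistic, degrees of freedom, two-tailed p-value)."""
    n1, n2 = len(before), len(after)
    # Sample variances (n - 1 denominator), each divided by its sample size.
    v1 = statistics.variance(before) / n1
    v2 = statistics.variance(after) / n2
    t = (statistics.mean(before) - statistics.mean(after)) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df, two_tailed_p(t, df)


# Hypothetical timings in milliseconds, several runs each.
before = [102.1, 99.8, 101.5, 100.9, 103.2, 100.4]
after = [98.7, 99.9, 97.8, 99.1, 98.2, 100.3]

threshold = 0.01  # selected before looking at the result
t, df, p = welch_t_test(before, after)
print(f"t = {t:.3f}, df = {df:.2f}, p = {p:.4f}")
print("null hypothesis likely wrong" if p < threshold else "within noise")
```

The sketch needs at least two measurements per group and fails with a division by zero if both groups have zero variance, which is unlikely for real timings.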

It can be worth trying “two-sample equal” in the TTEST function. I don’t think it is the right approach, as the variances are not guaranteed to be the same. But I would be curious to find a situation where it matters (I haven’t encountered one so far).
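
One place to look for such a situation: when both groups have the same number of measurements, the pooled ("equal") and Welch ("unequal") t statistics are algebraically identical and only the degrees of freedom differ, so the two variants tend to agree. They can diverge when the sample sizes differ and the smaller sample is the noisier one. The sketch below compares the two statistics on made-up data. The function name `t_statistics` and all numbers are hypothetical, and it computes only the statistics and degrees of freedom, not p-values.

```python
import math
import statistics


def t_statistics(a, b):
    """Return (pooled_t, pooled_df, welch_t, welch_df) for two samples."""
    n1, n2 = len(a), len(b)
    s1, s2 = statistics.variance(a), statistics.variance(b)
    diff = statistics.mean(a) - statistics.mean(b)

    # "Two-sample equal": one pooled variance, n1 + n2 - 2 degrees of freedom.
    pooled_df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1 + (n2 - 1) * s2) / pooled_df
    pooled_t = diff / math.sqrt(pooled_var * (1 / n1 + 1 / n2))

    # "Two-sample unequal" (Welch): per-sample variances,
    # Welch-Satterthwaite degrees of freedom.
    se2 = s1 / n1 + s2 / n2
    welch_t = diff / math.sqrt(se2)
    welch_df = se2 ** 2 / ((s1 / n1) ** 2 / (n1 - 1) + (s2 / n2) ** 2 / (n2 - 1))
    return pooled_t, pooled_df, welch_t, welch_df


# Equal sample sizes: the t statistics coincide even with unequal variances.
same_n = t_statistics([1.0, 2.0, 3.0, 4.0], [2.0, 6.0, 4.0, 8.0])

# Few noisy runs versus many quiet runs: the variants disagree.
noisy_few = [110.0, 120.0, 100.0]
quiet_many = [99.0, 100.0, 101.0, 100.0, 99.0, 101.0, 100.0, 100.0, 99.0, 101.0]
diff_n = t_statistics(noisy_few, quiet_many)

for label, (pt, pdf_, wt, wdf) in (("equal n", same_n), ("unequal n", diff_n)):
    print(f"{label}: pooled t = {pt:.3f} (df = {pdf_}), "
          f"Welch t = {wt:.3f} (df = {wdf:.2f})")
```

In the second data set the pooled variant reports a much larger t statistic with more degrees of freedom than Welch does, because the quiet large sample dominates the pooled variance estimate. That is the kind of case where “two-sample equal” could declare an improvement that “two-sample unequal” treats as noise.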