Stance
Test constantly, but test rigorously. Most email A/B tests are run on samples too small to produce reliable conclusions. Say so when it is true.
The volume reality
What detecting a small shift actually costs
At a 2% baseline campaign click rate, you need roughly 80,000 sends per cell to detect a 10% relative shift (2.0% to 2.2%) at 95% confidence (two-sided) and 80% power, and around 315,000 sends per cell to detect a 5% relative shift (2.0% to 2.1%).
The platform intermediation effects people argue about are usually smaller than that. If your list is in the tens of thousands or low six figures, most elaborate tests give you a wide confidence interval around zero. See "volume thresholds" and "sample size and power".
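The arithmetic behind those figures can be reproduced with a short sketch. It assumes the standard normal-approximation formula for comparing two proportions (two-sided test, equal cell sizes) and a plain Wald interval for the difference in click rates. The function names `sends_per_cell` and `difference_interval` are illustrative, not taken from any particular tool.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sends_per_cell(baseline, relative_lift, alpha=0.05, power=0.80):
    """Approximate sends per cell for a two-sided two-proportion z-test.

    Assumes equal cell sizes and the normal approximation.
    """
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95% confidence
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def difference_interval(clicks_a, sends_a, clicks_b, sends_b, alpha=0.05):
    """Wald confidence interval for (click rate B - click rate A)."""
    rate_a = clicks_a / sends_a
    rate_b = clicks_b / sends_b
    se = sqrt(rate_a * (1 - rate_a) / sends_a + rate_b * (1 - rate_b) / sends_b)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    diff = rate_b - rate_a
    return diff - z * se, diff + z * se


if __name__ == "__main__":
    # Sends per cell at a 2% baseline for 10% and 5% relative shifts.
    print(sends_per_cell(0.02, 0.10))
    print(sends_per_cell(0.02, 0.05))

    # Hypothetical small test: 20,000 sends per cell, 400 vs 420 clicks.
    low, high = difference_interval(400, 20_000, 420, 20_000)
    print(f"{low:+.4%} to {high:+.4%}")
```

Under these assumptions the two sample sizes come out close to the 80,000 and 315,000 figures above. The hypothetical 20,000-per-cell test, with an observed 5% relative lift, yields an interval on the difference of roughly -0.18 to +0.38 percentage points, which comfortably includes zero.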
What rigour requires
- A real holdout, and the discipline to trust it over the dashboard.
- Comfort with a distribution rather than a verdict. This is the harder half, and it is a hiring and culture problem, not a software one.