Sample ratio mismatch
In the design of experiments, a sample ratio mismatch (SRM) is a statistically significant difference between the expected and actual ratios of the sizes of treatment and control groups in an experiment. Sample ratio mismatches, also known as unbalanced sampling,<ref>Esteller-Cucala, Maria; Fernandez, Vicenc; Villuendas, Diego (2019-06-06). "Experimentation Pitfalls to Avoid in A/B Testing for Online Personalization". Adjunct Publication of the 27th Conference on User Modeling, Adaptation and Personalization. ACM. pp. 153–159. doi:10.1145/3314183.3323853. ISBN 978-1-4503-6711-0. S2CID 190007129.</ref> often occur in online controlled experiments due to failures in randomization and instrumentation.<ref>Fabijan, Aleksander; Gupchup, Jayant; Gupta, Somit; Omhover, Jeff; Qin, Wen; Vermeer, Lukas; Dmitriev, Pavel (2019-07-25). "Diagnosing Sample Ratio Mismatch in Online Controlled Experiments". Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM. pp. 2156–2164. doi:10.1145/3292500.3330722. ISBN 978-1-4503-6201-6. S2CID 196199621.</ref>
Sample ratio mismatches can be detected using a chi-squared test.<ref>Nie, Keyu; Zhang, Zezhong; Xu, Bingquan; Yuan, Tao (2022-10-17). "Ensure A/B Test Quality at Scale with Automated Randomization Validation and Sample Ratio Mismatch Detection". Proceedings of the 31st ACM International Conference on Information & Knowledge Management. ACM. pp. 3391–3399. arXiv:2208.07766. doi:10.1145/3511808.3557087. ISBN 978-1-4503-9236-5. S2CID 251594683.</ref> Using methods to detect SRM can help non-experts avoid making decisions based on biased data.<ref>Vermeer, Lukas; Anderson, Kevin; Acebal, Mauricio (2022-06-13). "Automated Sample Ratio Mismatch (SRM) detection and analysis". The International Conference on Evaluation and Assessment in Software Engineering 2022. ACM. pp. 268–269. doi:10.1145/3530019.3534982. ISBN 978-1-4503-9613-4. S2CID 249579055.</ref> If the sample size is large enough, even a small discrepancy between the observed and expected group sizes can invalidate the results of an experiment.<ref name="KDD19">Fabijan, Aleksander; Gupchup, Jayant; Gupta, Somit; Omhover, Jeff; Qin, Wen; Vermeer, Lukas; Dmitriev, Pavel (2019). "Diagnosing Sample Ratio Mismatch in Online Controlled Experiments: A Taxonomy and Rules of Thumb for Practitioners" (PDF). Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. pp. 2156–2164. doi:10.1145/3292500.3330722. ISBN 9781450362016. S2CID 196199621.</ref><ref>Kohavi, Ron; Thomke, Stefan (2017-09-01). "The Surprising Power of Online Experiments". Harvard Business Review. ISSN 0017-8012. Retrieved 2023-05-19.</ref>
Example
Suppose we run an A/B test in which we randomly assign 1000 users to equally sized treatment and control groups (a 50–50 split). The expected size of each group is 500. However, the actual sizes of the treatment and control groups are 600 and 400, respectively.
Using Pearson's chi-squared goodness-of-fit test, we find a sample ratio mismatch with a p-value of 2.54 × 10⁻¹⁰. In other words, if the assignment of users were truly random, the probability of observing a split at least as imbalanced as 600–400 is only 2.54 × 10⁻¹⁰.<ref name="srm-checker">Vermeer, Lukas. "Frequently Asked Questions". SRM Checker. Retrieved 2022-09-15.</ref>
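The calculation above can be reproduced with a few lines of Python. The following is a minimal sketch using SciPy's chisquare function; the choice of SciPy is illustrative, and any chi-squared goodness-of-fit implementation would serve equally well.
<syntaxhighlight lang="python">
# Minimal sketch of the SRM check above using a chi-squared
# goodness-of-fit test (SciPy is an illustrative library choice,
# not one prescribed by the sources cited in this article).
from scipy.stats import chisquare

observed = [600, 400]  # actual treatment and control group sizes
expected = [500, 500]  # expected sizes under a 50-50 split of 1000 users

statistic, p_value = chisquare(f_obs=observed, f_exp=expected)
print(f"chi-squared = {statistic:.1f}, p-value = {p_value:.3g}")
# Output: chi-squared = 40.0, p-value = 2.54e-10
# A p-value this small signals a sample ratio mismatch: the observed
# split is extremely unlikely under truly random 50-50 assignment.
</syntaxhighlight>
In practice, such a check is typically run automatically over all active experiments, flagging any whose p-value falls below a preset threshold for further investigation.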
References