Sample ratio mismatch


In the design of experiments, a sample ratio mismatch (SRM) is a statistically significant difference between the expected and actual ratios of the sizes of treatment and control groups in an experiment. Sample ratio mismatches, also known as unbalanced sampling,<ref>Esteller-Cucala, Maria; Fernandez, Vicenc; Villuendas, Diego (2019-06-06). "Experimentation Pitfalls to Avoid in A/B Testing for Online Personalization". Adjunct Publication of the 27th Conference on User Modeling, Adaptation and Personalization. ACM. pp. 153–159. doi:10.1145/3314183.3323853. ISBN 978-1-4503-6711-0. S2CID 190007129.</ref> often occur in online controlled experiments due to failures in randomization and instrumentation.<ref>Fabijan, Aleksander; Gupchup, Jayant; Gupta, Somit; Omhover, Jeff; Qin, Wen; Vermeer, Lukas; Dmitriev, Pavel (2019-07-25). "Diagnosing Sample Ratio Mismatch in Online Controlled Experiments". Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM. pp. 2156–2164. doi:10.1145/3292500.3330722. ISBN 978-1-4503-6201-6. S2CID 196199621.</ref>

Sample ratio mismatches can be detected using a chi-squared test.<ref>Nie, Keyu; Zhang, Zezhong; Xu, Bingquan; Yuan, Tao (2022-10-17). "Ensure A/B Test Quality at Scale with Automated Randomization Validation and Sample Ratio Mismatch Detection". Proceedings of the 31st ACM International Conference on Information & Knowledge Management. ACM. pp. 3391–3399. arXiv:2208.07766. doi:10.1145/3511808.3557087. ISBN 978-1-4503-9236-5. S2CID 251594683.</ref> Using methods to detect SRM can help non-experts avoid making decisions based on biased data.<ref>Vermeer, Lukas; Anderson, Kevin; Acebal, Mauricio (2022-06-13). "Automated Sample Ratio Mismatch (SRM) detection and analysis". The International Conference on Evaluation and Assessment in Software Engineering 2022. ACM. pp. 268–269. doi:10.1145/3530019.3534982. ISBN 978-1-4503-9613-4. S2CID 249579055.</ref> If the sample size is large enough, even a small discrepancy between the observed and expected group sizes can invalidate the results of an experiment.<ref name="KDD19">Fabijan, Aleksander; Gupchup, Jayant; Gupta, Somit; Omhover, Jeff; Qin, Wen; Vermeer, Lukas; Dmitriev, Pavel (2019). "Diagnosing Sample Ratio Mismatch in Online Controlled Experiments: A Taxonomy and Rules of Thumb for Practitioners" (PDF). Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. pp. 2156–2164. doi:10.1145/3292500.3330722. ISBN 9781450362016. S2CID 196199621.</ref><ref>Kohavi, Ron; Thomke, Stefan (2017-09-01). "The Surprising Power of Online Experiments". Harvard Business Review. ISSN 0017-8012. Retrieved 2023-05-19.</ref>
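Such a check can be sketched in a few lines of Python for the common two-group case. The function name and the 0.001 significance threshold below are illustrative choices, not values prescribed by the sources above; automated SRM checks often use a conservative threshold to limit false alarms across many experiments.

```python
import math

def srm_check(observed_counts, expected_ratios, alpha=0.001):
    """Flag a sample ratio mismatch in a two-group experiment.

    Runs Pearson's chi-squared goodness-of-fit test on the observed group
    sizes against the sizes implied by the intended split. The alpha of
    0.001 is an illustrative, deliberately conservative default.
    """
    total = sum(observed_counts)
    expected = [total * r for r in expected_ratios]
    # Pearson's chi-squared goodness-of-fit statistic
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed_counts, expected))
    # With two groups there is 1 degree of freedom, and the chi-squared
    # survival function reduces to erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return p_value < alpha, p_value
```

For example, `srm_check([600, 400], [0.5, 0.5])` flags a mismatch, while a mild imbalance such as `srm_check([505, 495], [0.5, 0.5])` does not.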

Example

Suppose we run an A/B test in which we randomly assign 1000 users to equally sized treatment and control groups (a 50–50 split). The expected size of each group is 500. However, the actual sizes of the treatment and control groups are 600 and 400.

Using Pearson's chi-squared goodness-of-fit test, we find a sample ratio mismatch with a p-value of 2.54 × 10<sup>−10</sup>. In other words, if the assignment of users were truly random, the probability of observing group sizes at least this unbalanced would be only 2.54 × 10<sup>−10</sup>.<ref name="srm-checker">Vermeer, Lukas. "Frequently Asked Questions". SRM Checker. Retrieved 2022-09-15.</ref>
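The arithmetic of this example can be reproduced directly. This is a minimal sketch using only the Python standard library; with two groups the test has one degree of freedom, so the p-value can be computed with the complementary error function.

```python
import math

# Observed group sizes, and the expected sizes under a 50-50 split of 1000 users
observed = [600, 400]
expected = [500, 500]

# Pearson's chi-squared goodness-of-fit statistic
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# With two groups there is 1 degree of freedom, so the p-value is the
# chi-squared survival function at chi2, which reduces to erfc(sqrt(chi2 / 2))
p_value = math.erfc(math.sqrt(chi2 / 2))

print(chi2)     # 40.0
print(p_value)  # ≈ 2.54e-10
```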

References

<references group="" responsive="1"></references>