Sample size and the multivariate kernel density likelihood ratio: how many speakers are enough?

Research output: Contribution to journal › Article › peer-review


The likelihood ratio (LR) is now widely accepted as the appropriate framework for evaluating expert evidence. However, an open empirical issue in forensic voice comparison is how many speakers are required to generate robust LR output and to test system performance adequately. In this study, Monte Carlo simulations were used to synthesise temporal-midpoint F1, F2 and F3 values for the hesitation marker um, based on raw data from 86 male speakers of standard southern British English. Using the multivariate kernel density (MVKD) LR approach, these data were used to investigate (1) the number of development (training) speakers required for adequate calibration, (2) the number of test speakers needed for robust validity, and (3) the effects of varying the number of reference speakers. The experiments were run over 20 replications to assess the effects of which speakers, as well as how many, are included in each set. Predictably, LR output was least precise with small samples. Comparison across the three experiments shows that the greatest variability in LR output was found as a function of the number of development speakers, where stable LR output was achieved only with more than 20 speakers. Thus, it is possible to achieve stable output (in terms of system-level metrics) with small numbers of test and reference speakers, as long as the system is adequately calibrated. Importantly, however, LRs for individual comparisons may still be substantially affected by the inclusion of additional speakers in each set, even when large samples are used.
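The abstract names three moving parts: a synthetic (Monte Carlo) speaker population, an MVKD LR computed against a set of reference speakers, and a system-level validity metric over same- and different-speaker test comparisons. The Python sketch below ties these together under stated assumptions; it is an illustration, not the authors' implementation. It assumes Gaussian within-speaker variation with a pooled covariance U, a Gaussian kernel density over reference-speaker means with covariance h²C, C simplified to the covariance of those means, and Silverman's rule-of-thumb bandwidth for h. The formant means and spreads, the 5-token split per comparison, and the helper names (`synth_speakers`, `fit_background`, `mvkd_log10_lr`, `cllr`) are all hypothetical. Cllr (log-likelihood-ratio cost) stands in as an example system-level metric, and calibration on development speakers is not shown.

```python
import numpy as np


def log_mvn(x, mean, cov):
    """Log density of a multivariate normal N(mean, cov) evaluated at x."""
    d = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    _, logdet = np.linalg.slogdet(cov)
    maha = d @ np.linalg.solve(cov, d)
    return -0.5 * (len(d) * np.log(2 * np.pi) + logdet + maha)


def log_mean_exp(values):
    """Numerically stable log((1/m) * sum(exp(values)))."""
    v = np.asarray(values, dtype=float)
    top = v.max()
    return top + np.log(np.mean(np.exp(v - top)))


def fit_background(reference):
    """Estimate within-speaker (U) and between-speaker (C) covariances.

    `reference` has shape (m speakers, n tokens, p features); a balanced
    design is assumed. C is simplified to the covariance of speaker means.
    """
    m, n, p = reference.shape
    means = reference.mean(axis=1)
    resid = (reference - means[:, None, :]).reshape(-1, p)
    U = resid.T @ resid / (m * n - m)  # pooled within-speaker covariance
    C = np.cov(means, rowvar=False)
    h = (4.0 / (m * (p + 2))) ** (1.0 / (p + 4))  # Silverman rule of thumb (assumed)
    return means, U, C, h


def mvkd_log10_lr(y1, y2, means, U, C, h):
    """log10 LR for two token sets y1 (n1, p) and y2 (n2, p)."""
    D1, D2 = U / len(y1), U / len(y2)
    y1_bar, y2_bar = y1.mean(axis=0), y2.mean(axis=0)
    K = h ** 2 * C  # kernel covariance around each reference speaker mean
    # Same-origin: both sample means share one source mean; y_star is their
    # precision-weighted combination, scored against each reference kernel.
    S = np.linalg.inv(np.linalg.inv(D1) + np.linalg.inv(D2))
    y_star = S @ (np.linalg.solve(D1, y1_bar) + np.linalg.solve(D2, y2_bar))
    log_num = log_mvn(y1_bar, y2_bar, D1 + D2) + log_mean_exp(
        [log_mvn(y_star, mu, S + K) for mu in means])
    # Different-origin: each sample mean is scored independently
    # against the reference kernel density.
    log_den = sum(log_mean_exp([log_mvn(yb, mu, D + K) for mu in means])
                  for yb, D in ((y1_bar, D1), (y2_bar, D2)))
    return (log_num - log_den) / np.log(10)


def cllr(ss_log10_lrs, ds_log10_lrs):
    """Log-likelihood-ratio cost over same- and different-speaker LRs."""
    ss = np.asarray(ss_log10_lrs, dtype=float)
    ds = np.asarray(ds_log10_lrs, dtype=float)
    return 0.5 * (np.mean(np.log2(1 + 10.0 ** -ss))
                  + np.mean(np.log2(1 + 10.0 ** ds)))


rng = np.random.default_rng(0)
p, n_tokens = 3, 10


def synth_speakers(m):
    """Hypothetical Monte Carlo speakers: Gaussian F1-F3 means plus token noise (Hz)."""
    centres = rng.multivariate_normal([500, 1500, 2500],
                                      np.diag([40, 150, 200]) ** 2, size=m)
    return centres[:, None, :] + rng.normal(0, [30, 80, 120],
                                            size=(m, n_tokens, p))


reference = synth_speakers(30)
means, U, C, h = fit_background(reference)
test = synth_speakers(10)
# Same-speaker: first 5 vs last 5 tokens of one speaker.
ss = [mvkd_log10_lr(s[:5], s[5:], means, U, C, h) for s in test]
# Different-speaker: every unordered pair of test speakers.
ds = [mvkd_log10_lr(a[:5], b[5:], means, U, C, h)
      for i, a in enumerate(test) for b in test[i + 1:]]
print(f"Cllr = {cllr(ss, ds):.3f}")
```

Rerunning the script with a different `default_rng` seed, or with fewer reference speakers passed to `fit_background`, gives a quick feel for replication-to-replication variability; the printed value is illustrative only and says nothing about the study's results.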
Original language: English
Pages (from-to): 15-29
Number of pages: 15
Journal: Speech Communication
Early online date: 18 Aug 2017
Publication status: Published, Nov 2017

Bibliographical note

© 2017 Elsevier B.V. All rights reserved. This is an author-produced version of the published paper. Uploaded in accordance with the publisher’s self-archiving policy.


Keywords

  • Calibration
  • Hesitation markers
  • Likelihood ratio
  • MVKD
  • Sample size
  • Validity
