This involved generating “nboots” bootstrap sample datasets; each was the same size as the original test set. To create these sample datasets, instances were randomly drawn from the test set with replacements. Evaluation metrics were calculated for each sample dataset, and the 95% confidence ...