For example, consider a model with the confusion matrix below (image by author). We see that although the accuracy is high, the precision is low. This makes it important to monitor not only accuracy but also precision and recall to better judge a model’s performance on an imb...
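To make the accuracy-versus-precision gap concrete, here is a minimal sketch in Python. The counts are hypothetical, chosen only for illustration; they are not the values from the author's confusion matrix. The helper name `binary_metrics` is likewise an assumption.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when there are no predicted or actual positives.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical imbalanced test set: 10 actual positives out of 1000 examples.
accuracy, precision, recall = binary_metrics(tp=5, fp=45, fn=5, tn=945)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

With these made-up counts the large number of true negatives dominates accuracy, while only 5 of the 50 positive predictions are correct, so precision stays low.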
In a multi-class classification problem, I’m interested in knowing whether my model’s performance (micro F1-score) is statistically significantly better than the baseline. Thus, I am looking for confidence intervals for the F1-score. I want to use the bootstrap, but since my model is large enough, I cann...
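One possible approach, sketched here under stated assumptions, is a paired percentile bootstrap over test examples rather than over training runs. It assumes single-label multi-class predictions from both the model and the baseline are already available on the same fixed test set, so nothing is retrained. The function names, the toy labels, and the simulated error rates are all hypothetical. Note that this captures only test-set sampling variability, not variability from training.

```python
import random


def micro_f1(y_true, y_pred):
    """Micro-averaged F1 for single-label multi-class data.

    Each wrong prediction counts as one false positive (predicted class)
    and one false negative (true class), so this equals plain accuracy.
    """
    tp = sum(t == p for t, p in zip(y_true, y_pred))
    fp = fn = len(y_true) - tp
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def paired_bootstrap_ci(y_true, pred_model, pred_base,
                        n_boot=1000, alpha=0.05, seed=0):
    """Percentile CI for micro-F1(model) - micro-F1(baseline).

    The same resampled test indices are used for both systems (paired).
    """
    rng = random.Random(seed)
    n = len(y_true)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        t = [y_true[i] for i in idx]
        m = [pred_model[i] for i in idx]
        b = [pred_base[i] for i in idx]
        diffs.append(micro_f1(t, m) - micro_f1(t, b))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical toy data: 3 classes, 200 test examples, simulated predictions.
rng = random.Random(42)
y_true = [rng.randrange(3) for _ in range(200)]
pred_model = [t if rng.random() < 0.85 else (t + 1) % 3 for t in y_true]
pred_base = [t if rng.random() < 0.70 else (t + 1) % 3 for t in y_true]

lower, upper = paired_bootstrap_ci(y_true, pred_model, pred_base)
print(f"95% CI for micro-F1 difference: [{lower:.3f}, {upper:.3f}]")
```

If the resulting interval excludes zero, that is evidence the difference is not due to test-set sampling noise alone; whether this is the right test for a given setup is a judgment call.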