Model Selection and Evaluation in Supervised Machine Learning
Authors: Westphal, Max
Supervisor: Brannath, Werner
1. Expert: Brannath, Werner
2. Expert: Zapf, Antonia

Abstract:
In this thesis, we propose new model evaluation strategies for supervised machine learning. Our main goal is to reliably and efficiently infer the generalization performance of one or multiple prediction models from limited data. So far, a strict separation of model selection and performance assessment has been recommended. While this approach is valid, it lacks flexibility, as flawed model selection usually cannot be corrected without compromising the statistical inference. We suggest evaluating multiple promising models on the test dataset, thereby taking more observations into account for the final selection process. We employ a parametric simultaneous test procedure to adjust the inferences (test decisions, point estimates) for multiple comparisons. We extend this method to enable a simultaneous evaluation of multiple binary classifiers with regard to sensitivity and specificity as co-primary endpoints. In both cases, approximate control of the family-wise error rate is ensured. Besides this established frequentist procedure, we propose a new multivariate Beta-binomial model for the analysis of multiple proportions with a general correlation structure. This Bayesian approach allows prior knowledge to be incorporated into the inference task. Finally, we derive a new decision rule for subset selection problems. Our method is developed in the framework of Bayesian decision theory by employing a novel utility function. Compared to previous approaches, this method is computationally more complex but hyperparameter-free. We show in extensive simulation studies that our framework can improve the expected final model performance and the statistical power, i.e. the probability of correctly identifying a sufficiently good model. While unbiased point estimation is no longer possible, the selection-induced bias can be corrected in a conservative manner.
The family-wise error rate is controlled under realistic parameter configurations, provided that a moderate number of observations is available. We conclude that the test data can be used for model selection when suitable adjustments for multiple comparisons are applied. This increases flexibility and statistical efficiency compared to traditional approaches. Our framework can help to prevent the deployment of flawed models in sensitive, large-scale applications while at the same time reliably identifying truly capable solutions.
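The core idea of evaluating several promising models on the test set with a multiplicity adjustment can be illustrated with a minimal sketch. The snippet below tests, for each candidate classifier, the null hypothesis that its test-set accuracy does not exceed a benchmark θ₀, using a one-sided normal-approximation z-test with a Bonferroni correction. This is a deliberately conservative stand-in for the parametric simultaneous (maxT-type) procedure the thesis actually develops; the function name, the benchmark value, and the candidate counts are all hypothetical.

```python
from statistics import NormalDist

def evaluate_candidates(successes, n_test, theta0=0.75, alpha=0.05):
    """Simultaneously evaluate several candidate models on shared test data.

    successes: dict mapping model name -> number of correct test predictions
    n_test:    size of the test set (same for all models)
    theta0:    accuracy benchmark under H0: accuracy <= theta0
    alpha:     target family-wise error rate

    Uses a Bonferroni-adjusted one-sided z-test per model; this is a
    conservative simplification of the parametric maxT procedure from
    the abstract, which additionally exploits the correlation between
    the models' test statistics.
    """
    m = len(successes)
    # Bonferroni adjustment: split alpha equally across the m comparisons.
    z_crit = NormalDist().inv_cdf(1 - alpha / m)
    results = {}
    for name, s in successes.items():
        p_hat = s / n_test
        se = (p_hat * (1 - p_hat) / n_test) ** 0.5
        z = (p_hat - theta0) / se if se > 0 else float("inf")
        results[name] = {"accuracy": p_hat, "reject_H0": z > z_crit}
    return results
```

For example, with 200 test observations and a benchmark of 0.75, a model with 170 correct predictions (accuracy 0.85) is declared sufficiently good, while one with 150 correct predictions (accuracy exactly 0.75) is not. A maxT-type procedure would be less conservative here, since the candidates are evaluated on the same test set and their statistics are positively correlated.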
Keywords: artificial intelligence; Bayesian inference; bias; classification; co-primary endpoints; decision theory; diagnosis; diagnostic accuracy; hypothesis testing; multiple comparisons; performance assessment; predictive modelling; prognosis; regulation; subset selection; uncertainty quantification
Issue Date: 16-Dec-2019
DOI: 10.26092/elib/16
URN: urn:nbn:de:gbv:46-elib42319
Institution: Universität Bremen
Faculty: FB3 Mathematik/Informatik
Appears in Collections: Dissertationen
checked on Sep 26, 2020
This item is licensed under a Creative Commons License