International Journal of Applied Mathematics and Computer Science

Paper details

Number 3 - September 2010
Volume 20 - 2010

Probabilities of discrepancy between minima of cross-validation, Vapnik bounds and true risks

Przemysław Klęsk

Abstract
Two known approaches to complexity selection are taken under consideration: n-fold cross-validation and structural risk minimization. Obviously, in either approach, a discrepancy between the indicated optimal complexity (indicated as the minimum of a generalization error estimate or a bound) and the genuine minimum of unknown true risks is possible. In the paper, this problem is posed in a novel quantitative way. We state and prove theorems demonstrating how one can calculate pessimistic probabilities of discrepancy between these minima for given conditions of an experiment. The probabilities are calculated in terms of all relevant constants: the sample size, the number of cross-validation folds, the capacity of the set of approximating functions and bounds on this set. We report experiments carried out to validate the results.

Keywords
regression estimation, model comparison, complexity selection, cross-validation, generalization, statistical learning theory, generalization bounds, structural risk minimization

DOI
10.2478/v10006-010-0039-x