11 August 2018
the training data. In this tutorial, you will discover a gentle introduction to the k-fold cross-validation procedure for estimating the skill of machine learning models. Three models are trained and evaluated, with each fold given a chance to be the held-out test set. Those methods are approximations of leave-p-out cross-validation. As such, the procedure is often called k-fold cross-validation. As another example, suppose a model is developed to predict an individual's risk of being diagnosed with a particular disease within the next year. If cross-validation is used to decide which features to use, an inner cross-validation that carries out the feature selection must be performed on every training set. Of the k subsamples, a single subsample is retained as the validation data for testing the model, and the remaining k − 1 subsamples are used as training data. This biased estimate is called the in-sample estimate of the fit, whereas the cross-validation estimate is an out-of-sample estimate.
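
To make the procedure concrete, the following is a minimal sketch of k-fold cross-validation with k = 3, so that three models are trained and each fold serves once as the held-out set. It rests on several assumptions that are not part of the text above: the helper names `k_fold_indices` and `cross_validate` are hypothetical, ordinary least squares stands in for whatever model is being evaluated, the data are synthetic, and mean squared error is used as the skill score.

```python
import numpy as np


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the indices 0..n_samples-1 and split them into k nearly equal folds."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)
    return np.array_split(indices, k)


def cross_validate(X, y, k=3, seed=0):
    """Return one out-of-sample mean squared error per fold."""
    folds = k_fold_indices(len(y), k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # The remaining k - 1 folds form the training set.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])

        # Fit ordinary least squares (with an intercept) on the training folds only.
        A_train = np.column_stack([np.ones(len(train_idx)), X[train_idx]])
        coef, *_ = np.linalg.lstsq(A_train, y[train_idx], rcond=None)

        # Score the fitted model on the held-out fold.
        A_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
        pred = A_test @ coef
        scores.append(float(np.mean((y[test_idx] - pred) ** 2)))
    return scores


# Synthetic data (an assumption for illustration): a linear signal plus small noise.
rng = np.random.default_rng(42)
X = rng.normal(size=(30, 2))
y = 1.0 + X @ np.array([2.0, -1.0]) + rng.normal(scale=0.1, size=30)

scores = cross_validate(X, y, k=3)
print("per-fold MSE:", scores)
print("mean MSE:", np.mean(scores))
```

The mean of the per-fold scores is the out-of-sample estimate. Fitting and scoring on the same 30 rows would instead give the in-sample estimate, which tends to be optimistic.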

In some cases, such as least squares and kernel regression, cross-validation can be sped up significantly by pre-computing certain values that are needed repeatedly in the training, or by using fast "updating rules" such as the Sherman–Morrison formula. To reduce variability, in most methods multiple rounds of cross-validation are performed using different partitions, and the validation results are combined (e.g.