Cross-validation is one of several methods for estimating the out-of-sample error of a predictive model. There are many flavors of cross-validation: hold-out, k-fold, leave-one-out (LOOCV), and so on. I whipped up a neat little visualization script in R to help understand LOOCV. Figure 1 shows how LOOCV works: the model is trained on the green data points and tested on the red data point, and each row in Figure 1 is a fold. LOOCV is exactly what it sounds like — of n observations, one is left out and used for testing, and the remaining n-1 observations are used to train the model. The error is then averaged across the n fitted models. The LOOCV method reduces the bias of the error estimate, but it can be computationally taxing for a large number of observations or for complex models, since n models must be trained.
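To make the procedure concrete, here is a minimal sketch of computing the LOOCV error in base R. The model formula mpg ~ wt is just an illustrative choice for the mtcars data (it is not from the figure script); any model that supports predict() would work the same way.

# Fit n models, each leaving one observation out, and average the test errors.
n <- nrow(mtcars)
sq_errors <- numeric(n)
for (i in 1:n) {
  fit <- lm(mpg ~ wt, data = mtcars[-i, ])                    # train on n - 1 rows
  pred <- predict(fit, newdata = mtcars[i, , drop = FALSE])   # test on the held-out row
  sq_errors[i] <- (mtcars$mpg[i] - pred)^2
}
mean(sq_errors)   # LOOCV estimate of the mean squared error

Note that for a plain linear model this loop can be replaced by a closed-form shortcut using the hat matrix, but the explicit loop matches the general procedure described above.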

Below is the code used to create Figure 1.

# caret provides handy data-splitting utilities
library(caret)

# with k equal to the number of observations, createFolds()
# produces one fold per observation, i.e. LOOCV folds
folds <- createFolds(mtcars$mpg, k = length(mtcars$mpg))

# create a stacked stripchart where each row is a fold
pltLoc <- 1.4
first  <- TRUE
for (fold in folds) {
  # mark every point as training (1), then flag the held-out point (0)
  foldx <- rep(1, length(mtcars$mpg))
  foldx[fold] <- 0
  if (first) {
    stripchart(1:length(mtcars$mpg), main = "LOO Cross Validation",
               xlab = "index", ylab = "fold",
               col = foldx + 2, pch = 22, bg = foldx + 2, at = pltLoc)
    first <- FALSE
  } else {
    stripchart(1:length(mtcars$mpg), col = foldx + 2, pch = 22,
               bg = foldx + 2, add = TRUE, at = pltLoc)
  }
  pltLoc <- pltLoc - 0.025
}
