H2O's deep learning algorithms are straightforward to use. In this post I demonstrate H2O's deep learning on the MNIST dataset from kaggle.com.
To begin, I started a local node on my personal computer with a maximum memory allocation of 24 GB. H2O doesn't allocate all the memory right away, but it does expect it to be available when it needs it.
# Load H2O library
library(h2o)
# Create a local node with a maximum of 24 gigabytes of RAM
h2o <- h2o.init(max_mem_size = '24g')
# Load training and test datasets
train <- read.csv("train.csv")
train$label <- as.factor(train$label)
test <- read.csv("test.csv")
h2o.train <- as.h2o(train, destination_frame = "training_data")
h2o.test <- as.h2o(test, destination_frame = "testing_data")
# Train a deep learning model: column 1 is the label, columns 2:785 are the 784 pixels
model.dl <- h2o.deeplearning(x = 2:785, y = 1, training_frame = h2o.train,
                             nfolds = 2,
                             hidden = c(750, 750))
# Get predictions on the test set from the H2O deep learning model
h2o.predictions <- h2o.predict(model.dl, h2o.test)
# Create the prediction data frame and write the Kaggle submission file
predictions <- data.frame(ImageId = 1:nrow(h2o.predictions),
                          Label = as.vector(h2o.predictions[, 1]))
write.csv(predictions, "h2odl_MNIST_submission.csv", row.names = FALSE)
All the functions in the h2o package start with the h2o. prefix. The H2O deep learning model is called with h2o.deeplearning(). As with all H2O models, nfolds sets the number of folds for k-fold cross-validation. For this exercise I used 2 folds, though 5 to 10 folds are generally recommended to reduce bias. In lieu of k-fold cross-validation, a validation frame can be supplied via the validation_frame argument.
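As a sketch of the validation-frame alternative (assuming the h2o.train frame from above; the 80/20 split ratio and seed are arbitrary choices of mine, not from the original post), h2o.splitFrame can carve off a holdout set:

```r
library(h2o)

# Hold out 20% of the training data as a validation frame
# (ratios gives the fraction kept in the first split)
splits <- h2o.splitFrame(h2o.train, ratios = 0.8, seed = 1234)

# Same model as before, but scored against the holdout
# instead of k-fold cross-validation
model.dl <- h2o.deeplearning(x = 2:785, y = 1,
                             training_frame = splits[[1]],
                             validation_frame = splits[[2]],
                             hidden = c(750, 750))
```

With a validation frame, training metrics and validation metrics are reported side by side, which makes overfitting easier to spot than a single cross-validated score.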
Using this script, the submission scored 96% accuracy, which is pretty good out of the box with default parameters. One feature of h2o I hope to explore later is grid search for optimizing the model's hyperparameters.
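For the curious, a grid search might look something like the sketch below (assuming the h2o.train frame from above; the candidate hidden-layer sizes, dropout values, and model budget are hypothetical choices for illustration):

```r
library(h2o)

# Hypothetical hyperparameter grid: hidden-layer sizes and input dropout
hyper_params <- list(hidden = list(c(500, 500), c(750, 750), c(1024, 1024)),
                     input_dropout_ratio = c(0, 0.1, 0.2))

# Random search over the grid, capped at 10 models
grid <- h2o.grid("deeplearning",
                 x = 2:785, y = 1,
                 training_frame = h2o.train,
                 nfolds = 2,
                 hyper_params = hyper_params,
                 search_criteria = list(strategy = "RandomDiscrete",
                                        max_models = 10))

# Inspect the models ranked by cross-validated performance
h2o.getGrid(grid@grid_id, sort_by = "logloss", decreasing = FALSE)
```

A random search over a modest grid is usually a cheaper starting point than an exhaustive sweep, since each deep learning model on MNIST already takes a while to train.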
I am still impressed with how easy h2o is to use out of the box and look forward to learning how to leverage all h2o has to offer.