H2O is an open source machine learning and prediction platform from www.h2o.ai. The code is hosted on GitHub. APIs are available for R, Scala, Python, and Java, and a REST API is available for web services or applications.
R H2O Tutorial
Installing the h2o package.
# Install the h2o package. This downloads the latest version.
install.packages("h2o")
# That might not be what you want when setting up a cluster:
# the client and the server need the same h2o version installed.
# The h2o download contains an R package in its R folder.
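Creating a simple gradient boosting model.
# load the package and start a local h2o node (defaults to localhost:54321)
library(h2o)
h2o.init()
# create h2o frame from R data frame
iris.h2o <- as.h2o(iris)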
# Split data into training and test sets
split <- h2o.splitFrame(iris.h2o, ratios = 0.75,
                        destination_frames = c("iris.train", "iris.test"))
# create R workspace references to the training and test frames
iris.train <- split[[1]]
iris.test <- split[[2]]
# Train a multinomial model on the training data
# x and y are vectors of column names
x <- names(iris.train)[1:4] # column 5 is Species
iris.model.gbm <- h2o.gbm(x = x, y = "Species",
                          training_frame = iris.train,
                          model_id = "iris.model.gbm",
                          distribution = "multinomial")
# peek at model training results
print(iris.model.gbm)
# compute performance metrics on the test set
perf <- h2o.performance(iris.model.gbm, iris.test)
# Calculate the mean squared error
h2o.mse(perf)
# shut down the h2o node
h2o.shutdown()
The node(s) store the h2o data frames, while the R workspace variables only hold references to them. Removing an h2o object from the R workspace will not delete it from the cluster. In the example above, destination_frames defines the names of the h2o data frames on the cluster. You can log into a running cluster or node by ip address:port; for the example above, localhost:54321 should work while the node is running. With the web interface, this tutorial could have been done with no R at all.
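To make the difference between an R reference and a frame stored on the cluster concrete, here is a minimal sketch. It assumes a node is already running on localhost:54321 (the tutorial code above shuts its node down at the end, so you would need to start one again first). The frame name "iris.demo" and the variable names are hypothetical, chosen only for illustration.

```r
library(h2o)

# Connect to a node that is already running instead of starting a new one.
# Assumption: a node is listening on localhost:54321.
h2o.init(ip = "localhost", port = 54321, startH2O = FALSE)

# Upload a frame under an explicit cluster-side name (hypothetical: "iris.demo")
iris.demo <- as.h2o(iris, destination_frame = "iris.demo")

# Drop only the R reference; per the text above, the frame itself
# should still be listed among the objects on the cluster
rm(iris.demo)
cluster.keys <- h2o.ls()
print(cluster.keys)

# Attach a new R reference to the existing frame by its cluster-side name
iris.again <- h2o.getFrame("iris.demo")
frame.dim <- dim(iris.again)
print(frame.dim)

# Delete the frame from the cluster itself
h2o.rm("iris.demo")
```

The explicit destination_frame plays the same role as destination_frames in the split above: it gives the frame a predictable name on the cluster, so it can be found again from another R session or from the web interface.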
I look forward to exploring more of h2o's functionality, especially its deep learning models. I will post in the near future on how to deploy an h2o cluster.