Two bootstrap techniques for building confidence intervals for regression coefficients are the case (or pair) bootstrap and the residual bootstrap. Case bootstrap: a case, sometimes called a pair, is a response value $y_i$ together with its predictor values $x_i$. For each of $B$ bootstrap samples, take a sample of size $n$ with replacement from the cases $(y_i, x_i)$; the corresponding bootstrap sample will be $(y_i^{*}, x_i^{*})$. […]
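A minimal sketch of the case bootstrap follows. It is written in Python with only the standard library as a stand-in for the post's own code, which this excerpt does not show. It assumes a single predictor, so the least-squares fit has a closed form, and it uses a simple percentile interval. `ols_fit`, `percentile`, `case_bootstrap_ci` and their defaults are hypothetical names chosen for illustration.

```python
import random
import statistics


def ols_fit(xs, ys):
    """Closed-form least squares for one predictor; returns (intercept, slope)."""
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def percentile(sorted_vals, q):
    """Simple index-based percentile of an already sorted list, q in [0, 1]."""
    idx = round(q * (len(sorted_vals) - 1))
    return sorted_vals[idx]


def case_bootstrap_ci(xs, ys, B=2000, alpha=0.05, seed=1):
    """Percentile confidence interval for the slope using the case (pair) bootstrap."""
    rng = random.Random(seed)
    cases = list(zip(xs, ys))  # each case keeps (x_i, y_i) together
    n = len(cases)
    slopes = []
    while len(slopes) < B:
        # draw n whole cases with replacement
        resample = [cases[rng.randrange(n)] for _ in range(n)]
        bx, by = zip(*resample)
        if len(set(bx)) < 2:  # all x identical: slope undefined, draw again
            continue
        slopes.append(ols_fit(bx, by)[1])
    slopes.sort()
    return percentile(slopes, alpha / 2), percentile(slopes, 1 - alpha / 2)
```

For example, `case_bootstrap_ci(xs, ys, B=1000)` returns a `(lower, upper)` tuple for the slope. Because each resample draws whole $(x_i, y_i)$ pairs, the predictor values change from one resample to the next; the residual bootstrap instead keeps the $x_i$ fixed and resamples the fitted residuals.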

# Overlay Density Curve Over Histogram Using R Base Graphics

It is useful to overlay the expected density function on a histogram. This gives a visual cue as to whether the data actually fit the expected distribution; however, it should not be a substitute for a goodness-of-fit test. In R base graphics there are two ways to do this. Find the min and max of […]
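The key detail when overlaying a curve on a count histogram is scaling: a density $f$ has to be multiplied by $n \times$ bin width to sit on the count axis (alternatively, plot the histogram on the density scale). The sketch below shows that scaling in Python with the standard library, assuming a normal distribution fitted by sample mean and standard deviation and equal-width bins spanning the data's min and max. It is not the post's R code, and `histogram` and `expected_counts` are hypothetical helpers.

```python
import random
from statistics import NormalDist, fmean, stdev


def histogram(data, bins):
    """Equal-width bins spanning [min, max]; returns (edges, counts)."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for x in data:
        i = min(int((x - lo) / width), bins - 1)  # the max value goes in the last bin
        counts[i] += 1
    return edges, counts


def expected_counts(data, edges):
    """Fitted normal density scaled to the count axis: n * bin_width * f(midpoint)."""
    dist = NormalDist(fmean(data), stdev(data))
    n = len(data)
    out = []
    for left, right in zip(edges, edges[1:]):
        mid = (left + right) / 2
        out.append(n * (right - left) * dist.pdf(mid))
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.gauss(10, 2) for _ in range(1000)]
    edges, counts = histogram(data, 20)
    for left, c, e in zip(edges, counts, expected_counts(data, edges)):
        # '#' bars show observed counts; the number is the curve's expected count
        print(f"{left:6.2f} | {'#' * (c // 5):<40} {e:6.1f}")
```

If the bars track the expected counts closely the normal fit looks plausible, but as noted above this is only a visual check.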

# Definition of Statistics?

It depends on who you ask. I am taking formal statistics training at the moment, and this is my observation: I see statistics defined either as the science of uncertainty or as the extraction of information from data. I have found that people from a mathematical background tend to gravitate towards the science of uncertainty. If they have an engineering […]

# Cross Correlation in R on Time Series Data

Cross-correlation measures how similar two signals are as a function of the lag applied to one of them. The discrete formula for cross-correlation is:

$$(f \star g)[t] = \sum_{n} f[n]\, g[n + t]$$

where $t$ is the lag applied to the time series. At $t = 0$ the signals are compared with no lag between them. Looking at the equation, the function output […]
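A direct, unnormalised implementation of the sum $\sum_n f[n]\, g[n+t]$ over a range of lags is sketched below in Python as an illustration only; `cross_correlation` is a hypothetical helper, and terms where $n + t$ falls outside `g` are simply skipped (treated as zero).

```python
def cross_correlation(f, g, max_lag):
    """Unnormalised cross-correlation r(t) = sum_n f[n] * g[n + t].

    Returns a dict mapping each lag t in [-max_lag, max_lag] to r(t).
    Terms where n + t falls outside g are skipped (treated as zero).
    """
    result = {}
    for t in range(-max_lag, max_lag + 1):
        total = 0.0
        for n, fn in enumerate(f):
            if 0 <= n + t < len(g):
                total += fn * g[n + t]
        result[t] = total
    return result


if __name__ == "__main__":
    f = [0, 1, 3, 1, 0, 0, 0]
    g = [0, 0, 0, 1, 3, 1, 0]  # f delayed by two samples
    r = cross_correlation(f, g, 3)
    best = max(r, key=r.get)
    print(f"peak at lag {best}")  # prints "peak at lag 2"
```

With this sign convention, a peak at a positive lag $t$ means `g` is `f` delayed by $t$ samples. R's `ccf()` reports a normalised version (means removed, values scaled to lie between -1 and 1), so its lag sign convention is worth checking against its documentation before comparing results.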

# Leave One Out Cross Validation

Cross validation is one of many techniques for estimating the out-of-sample error of predictive models. There are many flavors of cross validation: hold-out, k-fold, leave one out (LOOCV), etc. I whipped up a neat little visualization script in R to help understand LOOCV. Figure 1 shows how LOOCV works. The model is trained on the […]
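The loop that a picture like Figure 1 illustrates can be sketched in a few lines. This Python version (standard library only, not the post's R script) assumes a one-predictor least-squares line as the default model and mean squared error as the loss; `fit_line` and `loocv_mse` are hypothetical names, and `fit` can be swapped for any function that takes training data and returns a predictor.

```python
from statistics import fmean


def fit_line(xs, ys):
    """Least-squares line through (xs, ys); returns a prediction function."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


def loocv_mse(xs, ys, fit=fit_line):
    """Leave-one-out CV estimate of out-of-sample mean squared error.

    For each i, train on every point except i, predict point i, and record
    the squared error; the estimate is the average over all n folds.
    """
    errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]  # xs and ys are expected to be lists
        train_y = ys[:i] + ys[i + 1:]
        model = fit(train_x, train_y)
        errors.append((ys[i] - model(xs[i])) ** 2)
    return fmean(errors)
```

For $n$ points this fits the model $n$ times, each time on $n - 1$ points, which is why LOOCV becomes expensive for large datasets or slow models.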

# Deep Learning with H2O

H2O's deep learning algorithms are straightforward to use. In this post I demonstrate H2O's deep learning on the MNIST dataset available on kaggle.com. To begin, I started a local node on my personal computer with a maximum memory allocation of 24GB. H2O doesn't allocate all the memory right away, but it does expect it to be […]

# Standard Error of the Mean – Derivation

The standard error (SE) is an amazingly useful statistical device for defining confidence intervals. In layman's terms, the standard error is a measure of how far a sample statistic typically falls from its true value. This post goes through the process of deriving the SE of the mean. I have always wanted to dig deeper into where the […]
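A compact version of the standard derivation, assuming $X_1, \dots, X_n$ are independent and identically distributed with variance $\sigma^2$ and $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$:

$$
\begin{aligned}
\operatorname{Var}(\bar{X}) &= \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
= \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i)
= \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}, \\
\operatorname{SE}(\bar{X}) &= \sqrt{\operatorname{Var}(\bar{X})} = \frac{\sigma}{\sqrt{n}}.
\end{aligned}
$$

Independence is what lets the variance of the sum split into the sum of the variances (all covariance terms are zero). In practice $\sigma$ is unknown and is replaced by the sample standard deviation $s$, giving the estimate $s/\sqrt{n}$.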