Two bootstrap techniques for building confidence intervals for regression coefficients are the case (or pair) bootstrap and the residual bootstrap. Case Bootstrap: a case, sometimes called a pair, is a response $y_i$ together with its predictor values $x_i$. For each of $B$ bootstrap samples, take a sample of size $n$ with replacement from $\{(y_i, x_i)\}_{i=1}^{n}$; the corresponding bootstrap sample will be $\{(y_i^*, x_i^*)\}_{i=1}^{n}$. […]
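The case-bootstrap resampling step can be sketched in Python. This is a minimal sketch, not the post's own code. It assumes a single predictor fitted by ordinary least squares, and it turns the $B$ bootstrap slopes into a simple percentile interval, which is only one of several ways to form a bootstrap confidence interval. The helper names `ols_slope` and `case_bootstrap_ci` and the example data are hypothetical.

```python
import random
import statistics

def ols_slope(xs, ys):
    """Least-squares slope of y on x for simple linear regression."""
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

def case_bootstrap_ci(xs, ys, B=2000, alpha=0.05, seed=None):
    """Percentile CI for the slope using the case (pair) bootstrap (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(xs)
    pairs = list(zip(xs, ys))
    slopes = []
    for _ in range(B):
        # Resample whole (x_i, y_i) pairs with replacement, sample size n.
        sample = [pairs[rng.randrange(n)] for _ in range(n)]
        bx = [p[0] for p in sample]
        by = [p[1] for p in sample]
        # Skip degenerate resamples where every x is identical (slope undefined).
        if len(set(bx)) < 2:
            continue
        slopes.append(ols_slope(bx, by))
    slopes.sort()
    lo = slopes[int((alpha / 2) * len(slopes))]
    hi = slopes[int((1 - alpha / 2) * len(slopes)) - 1]
    return lo, hi

# Hypothetical, nearly linear example data.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1, 18.0, 20.2]
print(ols_slope(xs, ys))
print(case_bootstrap_ci(xs, ys, B=2000, seed=1))
```

Because whole pairs are resampled, the predictor values change from one bootstrap sample to the next. The residual bootstrap instead keeps the $x_i$ fixed and resamples only the fitted residuals.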

# Category: Statistics

# Overlay Density Curve Over Histogram Using R Base Graphics

It is useful to overlay the expected density function on a histogram. This gives a visual cue as to whether the data actually fit the expected distribution. However, it should not be a substitute for a goodness-of-fit test. In R base graphics there are two ways to do this. Find the min and max of […]
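The computation behind the overlay can be sketched in a few lines. The post works in R base graphics, and this Python sketch does not reproduce that plotting code. It only produces the curve's coordinates. It assumes the expected distribution is a normal fitted with the sample mean and standard deviation, evaluated on an evenly spaced grid from the min to the max of the data. The function `density_overlay_points` and its `bin_width` parameter are hypothetical names.

```python
from statistics import NormalDist, fmean, stdev

def density_overlay_points(data, num_points=100, bin_width=None):
    """Return (x, y) points of a fitted normal density over [min, max] of data.

    If bin_width is given, y is scaled by len(data) * bin_width so the curve
    matches a frequency (count) histogram rather than a density histogram.
    """
    if num_points < 2:
        raise ValueError("num_points must be at least 2")
    lo, hi = min(data), max(data)
    # Assumption: the expected distribution is normal with the sample mean and SD.
    fitted = NormalDist(mu=fmean(data), sigma=stdev(data))
    step = (hi - lo) / (num_points - 1)
    xs = [lo + i * step for i in range(num_points)]
    scale = 1.0 if bin_width is None else len(data) * bin_width
    ys = [fitted.pdf(x) * scale for x in xs]
    return xs, ys

# Hypothetical sample; pass xs, ys to any line-plotting routine over the histogram.
sample = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0]
xs, ys = density_overlay_points(sample, num_points=5)
print(list(zip(xs, ys)))
```

The scaling is the main design choice. An unscaled density curve lines up with a histogram drawn on the density scale. A count histogram needs the curve multiplied by the sample size times the bin width.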

# Definition of Statistics?

Depends on who you ask. I am taking formal statistics training at the moment, and this is my observation: I see statistics defined either as the science of uncertainty or as extracting information from data. I found that people from a mathematical background tend to gravitate toward the science of uncertainty. If they have an engineering […]

# Leave One Out Cross Validation

Cross validation is one of many methods for estimating the out-of-sample error of predictive models. There are many flavors of cross validation: hold-out, k-fold, leave-one-out (LOOCV), etc. I whipped up a neat little visualization script in R to help understand LOOCV. Figure 1 shows how LOOCV works. The model is trained on the […]
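The LOOCV loop itself can be sketched briefly. The post's visualization is in R, and this is a hypothetical Python sketch rather than that script. It assumes a simple linear regression model and squared-error loss. Each of the n observations is held out once, the model is fit on the remaining n - 1 points, and the n held-out errors are averaged. The names `fit_line` and `loocv_mse` are illustrative.

```python
import statistics

def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for y = a + b*x."""
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b

def loocv_mse(xs, ys):
    """Leave-one-out CV estimate of out-of-sample mean squared error."""
    errors = []
    for i in range(len(xs)):
        # Train on every observation except the i-th ...
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        a, b = fit_line(train_x, train_y)
        # ... then score the model on the single held-out point.
        errors.append((ys[i] - (a + b * xs[i])) ** 2)
    return statistics.fmean(errors)

print(loocv_mse([0, 1, 2], [0, 1, 0]))  # 3.0
```

In that small example, the three held-out squared errors are 4, 1 and 4, so the LOOCV estimate is their mean, 3. LOOCV needs n model fits, which is why k-fold is often preferred for larger data sets.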

# Standard Error of the Mean – Derivation

The standard error (SE) is an amazingly useful statistical device for constructing confidence intervals. In layman's terms, the standard error measures how far a sample statistic typically falls from its true value. This post goes through the process of deriving the SE of the mean. I have always wanted to dig deeper into where the […]
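For reference, here is the standard textbook derivation of the SE of the mean. It assumes $X_1, \dots, X_n$ are independent and identically distributed with variance $\sigma^2$. These symbols are introduced here, and the post's own derivation may proceed differently. Independence is what lets the variance of the sum split into the sum of the variances:

$$
\operatorname{Var}(\bar{X}) = \operatorname{Var}\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i) = \frac{1}{n^2}\, n\sigma^2 = \frac{\sigma^2}{n}
$$

$$
\operatorname{SE}(\bar{X}) = \sqrt{\operatorname{Var}(\bar{X})} = \frac{\sigma}{\sqrt{n}}
$$

In practice $\sigma$ is usually unknown, so it is estimated by the sample standard deviation $s$, giving $\widehat{\operatorname{SE}}(\bar{X}) = s/\sqrt{n}$.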

# Probability

In preparation for diving into logistic regression, I am going to review probability. What is probability? It is the ratio of the outcomes of interest to all possible outcomes. Mathematically this is represented by $P(A) = \frac{\text{number of outcomes of interest}}{\text{number of all possible outcomes}}$, where $P(A)$ is the probability of event $A$. A probability is a continuous value ranging from zero to one, e.g. 0.1, 0.99996, 0.23, 1. For example a bag […]
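The ratio can be computed exactly with Python's `fractions` module. This is a minimal sketch that assumes all outcomes are equally likely, which is the classical definition. The function name `probability` and the marble counts are hypothetical and are not the bag example from the post.

```python
from fractions import Fraction

def probability(outcomes_of_interest, all_outcomes):
    """Classical probability: outcomes of interest over all equally likely outcomes."""
    if all_outcomes <= 0:
        raise ValueError("there must be at least one possible outcome")
    if not 0 <= outcomes_of_interest <= all_outcomes:
        raise ValueError("outcomes of interest must be between 0 and all outcomes")
    return Fraction(outcomes_of_interest, all_outcomes)

# Hypothetical bag: 3 red and 7 blue marbles, one marble drawn at random.
p_red = probability(3, 10)
print(p_red, float(p_red))  # 3/10 0.3
```

The input checks enforce the zero-to-one range: 0 outcomes of interest gives a probability of 0, and all outcomes gives 1.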