About

This blog is intended to document my day-to-day experiences as a statistical consultant. Basically, if I have to spend more than a few minutes thinking through a statistical argument, I'd like to leave myself a record of my thought process.

As such, the subject matter here could really be anything: theory, applications, modeling, computation, visualization, data mining, Bayesian analysis, experimentation, big data, machine learning.

Posts

Sep 14, 2018 Probability Integral Transform, A Proof
The probability integral transform is a fundamental concept in statistics that connects the cumulative distribution function, the quantile function, and the uniform distribution. We motivate the need for a generalized inverse of the CDF and prove the result in this context.
Jul 30, 2018 [Python] Sphinx Compatible Forwarding Patterns in Python
We develop a Python example that showcases the forwarding pattern while handling docstrings in a Sphinx-compatible way. We maintain this compatibility in two ways: first, using a metaclass, and second, using decorators. Along the way, we discover a few things about binding instance and class attributes.
Dec 4, 2016 [Python] Pandas.DataFrame, PostgreSQL, and Autoincrementing Columns
Pandas.DataFrame has a to_sql() convenience method for pumping dataframes to SQL tables. However, there is no parameterization of to_sql() that will create an autoincrementing column index. This post details a simple workaround.
Oct 6, 2016 [C] Dynamic Programming 101: Change for a Dollar
We describe an efficient dynamic programming algorithm to compute the number of ways that one might make change for a dollar. The answer, assuming pennies, nickels, dimes, quarters, and half-dollars? Two hundred ninety-two.
Oct 5, 2016 [Python] Random Access Priority Queue
We describe a priority queue data structure that allows item removal via key or removal via (lowest) priority. As a bonus, there is code that shows how to implement a wrapper for a decorator class that enables per-method parameters and has access to class variables.
Sep 26, 2016 [R] Adaptive Rejection Sampling
Adaptive rejection sampling is a statistical algorithm for generating samples from a univariate, log-concave density. Because of the adaptive nature of the algorithm, rejection rates are often very low.
Sep 19, 2016 Multivariate Normal: Conditional Density Derivation
We derive the classical result: what is the density of a multivariate normal conditioned on some proper subset of its components?
Nov 18, 2015 Deploying a New Rails App On a Subdomain
This post: a step-by-step recipe for deploying a new Rails app on a subdomain. Our Ubuntu server runs Apache / Passenger / Rails / Capistrano; our domain registrar is Namecheap.
Oct 3, 2015 [R] Uber Interview Challenge
An analysis done as part of a recent Uber interview, this post showcases a regularized logistic regression model used to assess customer retention.
Oct 1, 2015 Rmarkdown, A Simple Example
This post is an example showing how to write a post in R Markdown. In particular, there is an Rmd block that generates a figure and a second Rmd block that generates a nicely formatted table.
Oct 1, 2015 Knitr & Jekyll: A Stats Blog Pipeline
Combining Knitr and Jekyll took some effort. This post describes how to get it all working as part of a statistical blogging pipeline.
Aug 16, 2015 Namecheap, Dynamic IPs, ddclient, and Hosting Multiple Sites on a Single Server
This post shows how to run a single Ubuntu server with a dynamic IP address that hosts several different sites, using the Namecheap domain name registrar.
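
As a quick check on the figure quoted in the Oct 6, 2016 entry (Change for a Dollar), here is a minimal Python sketch of the standard coin-counting dynamic program. It is not the code from that post, which is written in C; the function name count_change and its parameters are made up for illustration.

```python
def count_change(amount, coins):
    """Count the unordered ways to make `amount` cents from `coins`."""
    # ways[t] holds the number of ways to form t cents using the
    # denominations processed so far; the empty combination forms 0.
    ways = [0] * (amount + 1)
    ways[0] = 1
    # Handling one denomination at a time counts combinations,
    # not orderings, so 5+1 and 1+5 are the same way.
    for coin in coins:
        for total in range(coin, amount + 1):
            ways[total] += ways[total - coin]
    return ways[amount]


# Pennies, nickels, dimes, quarters, and half-dollars.
print(count_change(100, [1, 5, 10, 25, 50]))  # prints 292
```

The table has one entry per cent value, so the run time grows with the amount times the number of denominations.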