About

This blog is intended to document my day-to-day experiences as a statistical consultant. Basically, if I have to spend more than a few minutes thinking through a statistical argument, I'd like to leave myself a record of my thought process.

As such, the subject matter here could really be anything: theory, applications, modeling, computation, visualization, data mining, Bayesian analysis, experimentation, big data, machine learning.

Posts

  • Probability Integral Transform, A Proof

    The probability integral transform is a fundamental concept in statistics that connects the cumulative distribution function, the quantile function, and the uniform distribution. We motivate the need for a generalized inverse of the CDF and prove the result in this context.

  • [Python] Sphinx Compatible Forwarding Patterns in Python

    We develop a Python example that showcases the forwarding pattern while handling docstrings in a Sphinx-compatible way. We maintain this compatibility in two ways: first, using a metaclass, and second, using decorators. Along the way, we discover a few things about binding instance and class attributes.

  • [Python] Pandas.DataFrame, PostgreSQL, and Autoincrementing Columns

    Pandas.DataFrame has a to_sql() convenience method for pumping dataframes to SQL tables. However, there is no parameterization of to_sql() that will create an autoincrementing column index. This post details a simple workaround.

  • [C] Dynamic Programming 101: Change for a Dollar

    We describe an efficient dynamic programming algorithm to compute the number of ways that one might make change for a dollar. The answer, assuming pennies, nickels, dimes, quarters, and half-dollars? Two hundred ninety-two.

  • [Python] Random Access Priority Queue

    We describe a priority queue data structure that allows item removal via key or removal via (lowest) priority. As a bonus, there is code that shows how to implement a wrapper for a decorator class that enables per-method parameters and has access to class variables.

  • [R] Adaptive Rejection Sampling

    Adaptive rejection sampling is a statistical algorithm for generating samples from a univariate, log-concave density. Because of the adaptive nature of the algorithm, rejection rates are often very low.

  • Multivariate Normal: Conditional Density Derivation

    We derive the classical result: what is the density of a multivariate normal conditioned on some proper subset of its components?

  • Deploying a New Rails App On a Subdomain

    This post: a step-by-step recipe for deploying a new Rails app on a subdomain. Our Ubuntu server runs Apache / Passenger / Rails / Capistrano; our domain registrar is Namecheap.

  • [R] Uber Interview Challenge

    An analysis done as part of a recent Uber interview, this post showcases a regularized logistic regression model used to assess customer retention.

  • Rmarkdown, A Simple Example

    This post is an example showing how to write a post in R markdown. In particular, there is an Rmd block that generates a figure and a second Rmd block that generates a nicely formatted table.

  • Knitr & Jekyll: A Stats Blog Pipeline

    Combining Knitr and Jekyll took some effort. This post describes how to get it all working as part of a statistical blogging pipeline.

  • Namecheap, Dynamic IPs, ddclient, and Hosting Multiple Sites on a Single Server

    This post shows how to run a single Ubuntu server with a dynamic IP address that hosts several different sites, using the Namecheap domain name registrar.
