Dustin Lennon
statistics - machine learning - data science

Portfolio

About

I am a classically trained statistician with a background in consulting and computational software development. My value derives from framing business questions in a mathematical context, carrying out appropriate analyses, and delivering interpretable results based on data.

Technical Skills

I have worked on projects using R, SQL, C/C++, bash (awk/grep/sed), Rails, Python, Java, and C#/SCOPE (Microsoft's version of MapReduce).

Work History

Senior Data Scientist, ServiceNow
December 2016 - October 2017

In December 2016, I joined the Operational Intelligence team where I focused on developing and testing algorithms for event correlation and anomaly detection in an ITOM environment.

I left because, despite promises to the contrary, we never had any data.

Highlights

  • I developed a novel statistical model for detecting statistically significant co-occurring events. The main idea was to consider event pairs and maximize a likelihood derived from the difference in waiting times relative to a latent precursor event. In simulated data, the model reliably discriminated between 'real' and 'by-chance' co-occurrences. In real data with no ground truth, it surfaced several statistically significant co-occurrences from hundreds of thousands of events, and these discoveries were deemed useful by human experts.
  • We were interested in detecting anomalous behavior in time-series data associated with large IT infrastructures, e.g. metrics like CPU, memory, disk, and database query volume. To address the breadth of observed behaviors in these metrics, I suggested a Kalman filter approach and developed a versatile, component-based Python implementation that allowed for trend, seasonality, missing data, and multivariate extensions for joint analyses of time series. Once calibrated, the model runs in an online fashion and admits a real-time anomaly detection score.
  • For a simple local level model, I developed a novel approach to discriminate between one-off spikes and level shifts. The main idea was to monitor the real-time anomaly score and, when it exceeded a pre-determined threshold, start a secondary 'null hypothesis' filter which consumed only 'missing' data. When the original filter stabilized, the secondary filter gave a confidence interval against which the original filter's level estimate could be compared. A simplified sketch of the local level filter and its anomaly score follows this list.
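
The sketch below gives the flavor of the local level filter and its real-time anomaly score. It is an illustrative simplification in plain NumPy, not the production implementation; the noise variances, threshold, and toy data are placeholders.

```python
import numpy as np

def local_level_filter(y, q=0.1, r=1.0, m0=0.0, p0=1e6, z_thresh=4.0):
    """Local level (random walk + noise) Kalman filter with an online anomaly score.

    y        : observations (np.nan marks missing data)
    q, r     : state and observation noise variances (assumed known here)
    m0, p0   : prior mean and variance for the level
    z_thresh : flag observations whose standardized innovation exceeds this
    """
    m, p = m0, p0
    levels, scores, flags = [], [], []
    for yt in y:
        # predict: the level follows a random walk
        p = p + q
        if np.isnan(yt):
            # missing observation: skip the update, keep the prediction
            levels.append(m); scores.append(0.0); flags.append(False)
            continue
        # innovation and its variance
        v = yt - m
        s = p + r
        z = abs(v) / np.sqrt(s)          # standardized innovation = anomaly score
        # update
        k = p / s
        m = m + k * v
        p = (1.0 - k) * p
        levels.append(m); scores.append(z); flags.append(z > z_thresh)
    return np.array(levels), np.array(scores), np.array(flags)

# toy example: a level shift at t = 60
rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(0, 1, 60), rng.normal(5, 1, 40)])
levels, scores, flags = local_level_filter(y)
print("first flagged index:", np.argmax(flags))
```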

Statistical Consultant, Inferentialist, LLC
May 2012 - Present

In May 2012, I started Inferentialist LLC, with the belief that the tech industry was missing opportunities to leverage statistical best practices.

White Papers and Analyses

  • How to Price a House: An Interpretable Bayesian Approach. I designed and implemented a Bayesian model that decomposes home costs into land and structure components. The approach used an EM algorithm to ensure interpretability of parameter estimates.
  • Sparse Functional Models: Predicting Crop Yields. I estimated the optimal annual precipitation pattern for Iowa soybeans using 14 years of historical yield data. This analysis used statistical techniques for analyzing areal units.
  • Optimal Lending Club Portfolios. I designed a custom, non-parametric, survival model for estimating the expected net present value of a loan. The model was used to construct optimal portfolios that significantly outperform grade-based strategies.
  • Probabilistic Performance Guarantees for Oversubscribed Resources. This work explores the consequences, bounded in probability, of relaxing resource guarantees under different usage patterns.
  • I analyzed volatility trends in King County residential home sale prices. The study concluded that monthly, quantile-based estimates have been inherently unstable since late 2007 and that statistical models are therefore required to enforce temporal continuity.
  • Vision Repaired: The Inferentialist Logo. I wrote a white paper about the computational geometry tools used to create the Inferentialist logo.

Consulting Engagements

  • Consulted with a Seattle-area startup, building statistical models to discover predictors of customer churn in a small-sample context.
  • Worked with a small, Orcas Island-based company to improve data import integrity across multiple data streams. The goal was to de-duplicate, or impute, financial transaction records for consistent, near-real-time reporting.

Projects

  • I built the Carpool Project app prototype for Android.
  • Map Tiles for Data Analysis. I fused publicly available geographic and census data (TIGER/Line and ACS datasets) to create a web tool for visualizing spatial trends at county-level resolution.
  • Designed, built, and deployed the Inferentialist Blog. The site runs the Jekyll blog software and integrates R Markdown publishing tools.
  • Designed, built, and deployed the Inferentialist website. The site runs on a Rails web server.

Senior Data Scientist, Microsoft
October 2014 - October 2015

In October 2014, I was transitioned to a data reporting role when the O365 Customer Intelligence Team was re-organized under new management. By January, the CI team had effectively collapsed, and I found myself on the Bing Analysis and Experimentation Team. My new role was that of an internal consultant, providing analytics support to partner teams across the company that had expressed an interest in onboarding to Bing's existing experimentation platform.

In my 18 months at Microsoft, I had seven different managers.

Bing Team

  • Fostered a culture of data-driven decision making across Microsoft. This included working with OneNote.com to run their first A/B experiment and helping Exchange Online manage data quality issues inhibiting A/B experiments.
  • Designed and built a generic, extensible data summary library in SCOPE/C#. This library was widely adopted, and the Avocado Team internalized it as a core part of a web-facing, "deep-dive" toolset.
  • Worked with the Bing Experimentation Platform Team to generalize their existing, Bing-specific analytics toolchain for external partners.

O365 Customer Intelligence Team

  • Developed an interpretable forecast model for customer support tickets which informed budget allocations for future support staffing.
  • Internal White Paper: The Effect of Active Users on Support Tickets. We showed that increases in daily active usage correlated with increases in ticket volumes. Oddly, this was widely considered to be a surprising result, and the computed statistics were broadly circulated.
  • R Package: Simple Models for Support Ticket Volumes. An R wrapper, and corresponding documentation, to support the computation of estimates derived in the Effect of Active Users on Support Tickets white paper.

Researcher, Microsoft
May 2014 - September 2014

In May 2014, I accepted a research position at Microsoft on the O365 Customer Intelligence Team. Our mandate was to develop machine learning tools that would detect trends in customer service tickets; the goal was to identify common customer complaints and, in an automated fashion, propose relevant solutions.

Highlights

  • I implemented a change point detection algorithm for support ticket volumes. This was often used, retroactively, to discover that apparent temporal shocks in usage were due to new instrumentation in recently deployed builds.
  • I identified classes of support tickets that were strongly correlated with early tenant lifecycle, i.e. onboarding issues.
  • Internal White Paper. Support Tickets: Confidence Intervals for Population Estimates. This work introduced the idea of reporting both point estimates and confidence intervals. We illustrated two common situations: reconstructing population estimates after a data loss event, and a post-hoc assessment of a machine learning classifier.
  • Internal White Paper. Impactful Precursor Events: An Inhomogeneous Poisson Process Model. This work described an optimization framework for identifying high-likelihood, leading-indicator events related to support calls of a particular type.

Quantitative Modeler, Zillow
June 2011 - February 2012

I was hired to work on, and improve, the Zestimate algorithm. I argued against the existing, off-the-shelf machine-learning approach and in favor of building an interpretable model with spatial and temporal correlation structures.

Highlights

  • I developed a novel, cross-validated coefficient-of-variation statistic for better assessing which home predictions were likely to suffer from temporal instabilities. A minimal sketch of the idea appears after this list.
  • I built a data-cleaning imputation engine that used copula methods and a KNN framework to automatically identify comparable homes ("comps"). The method was capable of leveraging both spatial and temporal information.
  • I designed and implemented a Hedonic Pricing Model. The work generalized a linear mixed-effects model to incorporate a between-group, auto-regressive correlation structure using an REML paradigm.
  • I invented a novel alternative to the ZHVI, Zillow's median home value index, based on longitudinal, repeat sales. The new algorithm provided significantly improved predictions of recent sale price given a previous sale price, particularly for small samples.
  • My first task was a post-hoc "cleanup" of several million home Zestimate histories that exhibited aberrant, non-physical "spiky" behavior: order-of-magnitude jumps.
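
The cross-validated coefficient-of-variation idea is easy to sketch. The code below is illustrative only, not the Zillow implementation: it uses generic scikit-learn pieces, resamples across folds rather than across time, and all names and data are placeholders.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

def prediction_cv_statistic(X, y, model=None, n_splits=5, seed=0):
    """Per-observation coefficient of variation of predictions across refits.

    The model is refit on each fold's training set, every observation is
    predicted by each refit, and the statistic is (std / |mean|) of those
    predictions.  Large values flag observations with unstable predictions.
    """
    model = model or LinearRegression()
    preds = np.empty((n_splits, len(y)))
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for i, (train_idx, _) in enumerate(folds.split(X)):
        fitted = model.fit(X[train_idx], y[train_idx])
        preds[i] = fitted.predict(X)              # predict every home from this refit
    mu = preds.mean(axis=0)
    sd = preds.std(axis=0)
    return sd / np.maximum(np.abs(mu), 1e-12)     # coefficient of variation per home

# toy usage: the statistic is largest where predictions move most across refits
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.5, size=200)
print("most unstable indices:", np.argsort(prediction_cv_statistic(X, y))[-5:])
```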

Machine Learning Intern, Globys
October 2010 - June 2011

I joined a team at Globys that was tasked with improving upsell strategies for mobile add-on packages. The initial goal was to derive predictors for upsell from existing, retrospective data. While I was there, I saw the strategy shift toward controlled experiments, the gold standard for assessing the efficacy of online marketing campaigns.

Highlights

  • I constructed a novel algorithm to boost lift for an event-based, contextual marketing engine. The approach used stratified sampling in concert with mutual information and stochastic model selection to significantly improve ROC curves of acceptance rates.
  • I used an Apriori algorithm to create new attributes from aggregated historical purchases; these attributes had increased predictive power for marketing efficacy.
  • I prototyped logistic regression code for stepwise model building that was later ported into production.

Consultant, NumeriX
June 2009 - July 2010

I continued, part-time, to work for NumeriX while pursuing a PhD in Statistics at the University of Washington.

Senior Software Developer, NumeriX
June 2006 - August 2007

I continued with NumeriX following a summer internship set up by my academic advisor while on sabbatical. I focused on two sides of a multi-factor SDE derivatives pricing model: calibration to market prices and numerical pricing algorithms.

Highlights

  • I investigated a backward-lattice pricing algorithm for a general class of exotic derivatives to identify, and repair, numerical instabilities. This required reverse engineering non-trivial mathematics (multi-factor SDEs) on a relatively undocumented codebase.
  • I developed new iterative solvers for calibrating BGM LIBOR market models to market data.
  • I implemented a PDE solver to price Asian and Lookback options with discrete observation dates.

Technical Staff, MIT Lincoln Labs
September 2001 - May 2002

I was a member of the Sensor Measurement and Analysis Group. My job was to implement backscatter models and tracking algorithms for RADAR applications.

Education

PhD Student, Statistics
Department of Statistics, University of Washington
Fall 2007 - Summer 2010

My PhD-level coursework was in statistical theory, optimization, stochastic modeling, and computing.

My advisors were Doug Martin, computational finance; Paul Tseng, semi-definite programming; Vladimir Minin, statistical genetics.

I received a master's degree from the department in 2012.

Publications

Research

  • Research Collaborator. I worked with Adrian Dobra to develop code for investigating properties of large, sparse, categorical datasets.
  • Research Assistant. I worked with Vladimir Minin to develop a distributed, multicore C++ code for analyzing large phylogenetic datasets. We used this code for inferring mutation rates of evolutionary traits.
  • Research Assistant. I worked with Marina Meila on an ill-posed gravitational inverse problem which we regularized using sparsity and compressed sensing techniques.
  • Research Assistant. I worked with Caren Marzban to implement an optical flow algorithm to be used for analyzing weather forecasts.

Teaching

  • Teaching Assistant. Introductory Statistics.

Computational Finance Certificate
University of Washington

The Computational Finance Certificate is an interdisciplinary program that requires several finance courses as well as a "capstone" project. In my case, I took courses in optimization, econometric theory, stochastic calculus, modern portfolio theory, and financial derivatives.

PhD Student, Applied Mathematics
Department of Applied Mathematics, University of Washington
Fall 2003 - Fall 2005

My PhD-level coursework was in numerical analysis for ordinary and partial differential equations with applications in computational fluid dynamics.

My advisor was Randall LeVeque.

I received a master's degree from the department in 2005.

Research

  • Research Assistant. I worked with Randy LeVeque on finite volume methods for high-accuracy modeling of non-linear physical phenomena like shock waves.

Teaching

  • Grader. Numerical Linear Algebra.
  • Teaching Assistant. Business Precalculus.
  • Teaching Assistant. Calculus.

Projects

  • Black-Scholes PDE Solver. I wrote Matlab numerical code for solving the Black-Scholes PDE for American put and European call options. A minimal sketch of the European call case appears after this list.
  • Fast Voronoi Decomposition. I wrote C++ code to compute fast Voronoi diagrams via Fortune's algorithm. The visualization toolbox used the OpenGL framework.
  • Simple Fluid Flows. I wrote Matlab numerical code for solving the steady state, incompressible, irrotational flow problem.
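
For flavor, here is a compact version of the European call piece, translated to Python rather than the original Matlab. It is an explicit finite-difference scheme on a uniform grid, with grid sizes and parameters chosen purely for illustration.

```python
import numpy as np

def bs_call_explicit(K=100.0, T=1.0, r=0.05, sigma=0.2,
                     s_max=300.0, n_s=150, n_t=20000):
    """Explicit finite-difference solver for the Black-Scholes PDE, European call.

    Steps backward from the payoff at expiry to today on a uniform grid in
    (stock price, time).  n_t must be large enough for the explicit scheme
    to remain stable.
    """
    ds = s_max / n_s
    dt = T / n_t
    s = np.linspace(0.0, s_max, n_s + 1)
    v = np.maximum(s - K, 0.0)                       # payoff at expiry
    for n in range(n_t):
        tau = (n + 1) * dt                           # time remaining after this step
        i = np.arange(1, n_s)
        d2 = (v[i + 1] - 2 * v[i] + v[i - 1]) / ds**2
        d1 = (v[i + 1] - v[i - 1]) / (2 * ds)
        v_new = v.copy()
        v_new[i] = v[i] + dt * (0.5 * sigma**2 * s[i]**2 * d2 + r * s[i] * d1 - r * v[i])
        v_new[0] = 0.0                               # call is worthless at S = 0
        v_new[-1] = s_max - K * np.exp(-r * tau)     # deep in-the-money asymptote
        v = v_new
    return s, v

s, v = bs_call_explicit()
print("price at S = 100:", np.interp(100.0, s, v))   # analytic value is about 10.45
```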

Computer Science
Princeton University
Fall 1997 - Spring 2001

My undergraduate coursework focused on theory and algorithms. I also completed the undergraduate certificate program in applied and computational mathematics.

My advisors were Bernard Chazelle and Brian Kernighan.

I received my Bachelor of Science in Engineering from the university in 2001, graduating magna cum laude.
