Dustin Lennon

Applied Scientist

2648A NW 57th St
Seattle, WA 98107
(206) 291-8893

About

Implement code, analyze data, communicate insights, repeat: the job of an Applied Scientist is about as cross-functional as it gets. In my experience, this has meant delivering interpretable statistical analyses, building reliable ETL pipelines, and, on occasion, even developing new algorithms; whatever it takes to pragmatically make and support well-informed business decisions.

Highlights
Previously, I have
  • authored a Python package implementing tracking algorithms for time series,
  • designed algorithms for anomaly detection and event correlation in ITOM data,
  • increased partner team engagement of Bing's large scale A/B testing framework,
  • discovered impactful trends in O365 customer support data,
  • built and improved home valuation models in real estate markets, and
  • fixed numerically unstable financial software for pricing exotic derivatives.

Below, please enjoy my extended-form resume. If you'd prefer a more curated version tailored to your particular scenario, please reach out. My email is dustin.lennon@gmail.com.

Projects

  • Published Pages, Publishing Platform

    https://dlennon.org/pages

    Jul, 2019

    I designed the publishing platform to host this resume and my portfolio. The look and feel was inspired by the jsonresume.org 'elegant' theme. The project depends heavily on the Python Klein library, Jinja2 templating engine, and pandoc to transform Markdown into HTML. The platform also permits easy inclusion of external, templated HTML; page-specific JavaScript; and registered python plugins associated with ajax endpoints on the server.

  • Charity Search, Web Application

    Jul, 2019

    I imported non-profit IRS tax returns into Elasticsearch and built a website to search for local charities.

  • Multivariate Kalman Filter, Python Package

    Apr, 2019

    I developed a multivariate Kalman filter package for non-stationary time-series analysis.

    • Full multivariate model enabling fast, online analysis
    • Native support for level, trend, and seasonality components
    • Non-informative priors for state space initial conditions
    • Automatically handles missing data
    • Support for modeling intervention effects
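
    A minimal sketch of the filtering recursion behind such a package, reduced to a univariate local-level model with a diffuse prior and missing-data handling. Function and parameter names here are illustrative, not the package's actual API:

    ```python
    import numpy as np

    def local_level_filter(y, q=1e-2, r=1.0, m0=0.0, p0=1e6):
        """Univariate local-level Kalman filter.

        y  : observations (np.nan marks missing data)
        q  : state (level) noise variance
        r  : observation noise variance
        m0 : prior mean for the initial level
        p0 : large prior variance (a diffuse, non-informative prior)
        """
        m, p = m0, p0
        means = np.empty(len(y))
        for t, yt in enumerate(y):
            # predict: the level evolves as a random walk
            p = p + q
            if not np.isnan(yt):
                # update: scalar Kalman gain blends prediction and observation
                k = p / (p + r)
                m = m + k * (yt - m)
                p = (1.0 - k) * p
            # a missing observation simply carries the prediction forward
            means[t] = m
        return means
    ```

    The full package generalizes this recursion to vector states (level, trend, seasonality) with matrix gains; the missing-data step is the same idea.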
  • Twittalytics, Web Application

    http://twittalytics.com

    Aug, 2018

    The app monitors the Twitter stream and maintains a dynamic list of trending hashtags along with, for each hashtag, a random sample of relevant tweets.

  • Carpool Project, Android Application

    http://carpoolproject.org

    Feb, 2013

    I prototyped a location tracking app to collect daily commute data with the intention of helping people create viable carpools.

  • Interactive Spatial Heatmaps, Web Application

    Dec, 2013

    I built a website to visualize fused geographic (TIGER/Line) and census (ACS,BLS) datasets. This featured an interactive heatmap that reported spatial statistics aggregated at a metro (MSA) level.

Work Experience

  • Consultant, Inferentialist LLC

    May, 2012 - Present

    My consulting company. I provide statistical expertise to clients.

    • Promote, design, and deliver controlled experiments (A/B tests) where possible.

    • Deliver marketing analyses, including engagement ladders.

    • Build customer retention models and perform churn analysis.

    • Assess data integrity, data consistency, and data quality.

    • Founded the company: developed the branding, built the website, contributed to the blog.

  • Senior Data Scientist, ServiceNow

    Dec, 2016 - Oct, 2017

    I developed, implemented, and tested statistical algorithms for the Operational Intelligence team.

    • Developed a framework for simulating correlated events from (randomly generated) branching processes that included (randomly) censored event data.

    • Developed a novel event correlation algorithm which recovered 98 percent of correlated event pairs in the simulated datasets and generated new insights in unlabeled customer datasets.

    • Implemented a bivariate Kalman filter with level, trend, and seasonality components that allowed for missing data; combined this with a novel 'split-hypothesis' paradigm to detect anomalous jumps in the state space.

  • Researcher / Senior Data Scientist, Microsoft

    May, 2014 - Oct, 2015

    I provided statistical support for the O365 Customer Intelligence and Analysis and Bing Experimentation Teams.

    • Extended existing CLI software, and incorporated the new tooling into a web service. This allowed external partner teams to preview the ExP platform while waiting for a more formal migration.

    • Designed and built a generic, extensible data summary library in SCOPE/C#. The library was widely adopted, and the Avocado Team internalized it as a core part of a web-facing, 'deep-dive' toolset.

    • Worked cross-functionally with PMs and developers on the OneNote and Exchange Online teams to enable A/B experimentation on the ExP platform. Designed first-run experiments for these teams.

    • Implemented a change point detection algorithm for support ticket volumes which was used to identify unannounced deployments of new instrumentation. Corroborating these identifications allowed us to improve the documentation of, and communication around, the deployment process. It also allowed us to better clean the data, drastically reducing the variability of statistical estimates.

    • Identified classes of support tickets that were strongly correlated with early tenant lifecycle / onboarding issues.

    • Developed an interpretable forecast model for customer support tickets which informed budget allocations related to future staffing requirements.
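
    A change point detector of the kind used on support ticket volumes could be sketched, for illustration only, as a one-sided CUSUM scan for upward level shifts. The slack and decision thresholds below are illustrative, not the values used at Microsoft:

    ```python
    import numpy as np

    def cusum_changepoints(x, k=0.5, h=5.0):
        """One-sided CUSUM for upward level shifts.

        x : series of daily counts
        k : slack parameter in standardized units (filters small drift)
        h : decision threshold in standardized units
        Returns the indices where the cumulative statistic crosses h.
        """
        x = np.asarray(x, dtype=float)
        z = (x - x.mean()) / x.std(ddof=1)
        s, alarms = 0.0, []
        for t, zt in enumerate(z):
            # accumulate standardized excess over the slack; floor at zero
            s = max(0.0, s + zt - k)
            if s > h:
                alarms.append(t)
                s = 0.0  # reset after an alarm
        return alarms
    ```

    In practice one would standardize against a rolling baseline rather than the full series, but the accumulate-and-threshold structure is the same.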

  • Quantitative Modeler, Zillow

    Jun, 2011 - Feb, 2012

    I worked on fixes for the Zestimate algorithm.

    • Advocated for interpretable home valuation models that incorporated spatial and temporal structures (in contrast to off-the-shelf random forests).

    • Developed a cross-validated, coefficient-of-variation metric to assess the risk of temporal instability in a home's Zestimate history. This indicated that Zestimates with non-physical behavior were far more prevalent than previously thought.

    • Developed an alternative to the ZHVI (Zillow's proprietary home value index) based on estimating discount curves from longitudinal, repeat sales. This improved the estimator for small samples.

    • Identified and removed, post hoc, 'spiky' Zestimate behavior in a collection of 100 million Zestimate histories. This resulted in corrections to nearly 4 million time series.
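
    For illustration, the volatility screen described above might reduce to a coefficient of variation computed per valuation history (omitting the cross-validation step; the threshold here is hypothetical, not the production value):

    ```python
    import numpy as np

    def volatility_cv(history):
        """Coefficient of variation of a valuation history: std / mean.
        Large values flag series whose swings are implausibly large
        relative to their level."""
        h = np.asarray(history, dtype=float)
        return h.std(ddof=1) / h.mean()

    def flag_unstable(history, threshold=0.15):
        # threshold is an illustrative choice for this sketch
        return volatility_cv(history) > threshold
    ```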

  • Machine Learning Intern, Globys

    Oct, 2010 - Jun, 2011

    I built statistical models for up- and cross-sell marketing opportunities for mobile add-on packages.

    • Used an Apriori algorithm to create new features from historical purchases. These attributes had higher predictive power and produced significant lift in our models.

    • Provided statistical support for implementing a step-wise logistic regression model in production.

  • Consultant, Numerix

    Jun, 2009 - Aug, 2010

    I provided expertise on numerical stability issues arising in the multi-factor backward lattice algorithm.

  • Senior Software Developer, Numerix

    Jun, 2006 - Aug, 2007

    I worked on numerical codes for pricing exotic financial derivatives.

    • Reverse engineered a multi-factor, backward-lattice pricing algorithm to diagnose and fix numerical instabilities.

    • Wrote new solvers for calibrating BGM Libor interest-rate models to market data.

    • Implemented a PDE solver to price Asian and Lookback options with discrete observation dates.

  • Technical Staff, MIT Lincoln Laboratory

    Sep, 2001 - May, 2002

    Sensor Measurement and Analysis Team

    • Implemented backscatter models and tracking algorithms for RADAR applications.

Skills

  • Data Science (Venti)
    Interpretable Models, Time Series, Visualization
  • Data Science (Grande)
    Controlled Experiments (A/B Tests), Feature Engineering, Data Cleaning, ETL Pipelines
  • Machine Learning (Grande)
    Predictive Models
  • Mathematics (Tall)
    Linear Algebra, Optimization, Numerical Analysis
  • Software Development (Grande)
    Python
  • Software Development (Tall)
    R, C/C++, bash (awk/grep/sed), PostgreSQL
  • Software Development (Short)
    Java, MapReduce
Education

  • Statistics, M.S., University of Washington

    Jan, 2006 - Jun, 2010

    GPA 3.81/4.00
    STAT504 - Applied Regression, STAT492 - Stochastic Calculus, STAT516/517 - Stochastic Modeling, STAT581/582/583 - Advanced Theory of Statistical Inference, MATH516 - Numerical Optimization, MATH530 - Convex Analysis, MATH582 - Convex Optimization Algorithms, STAT570 - Introduction to Linear Models, STAT599 - Statistical Consulting, BIOST579 - Data Analysis
  • Applied Mathematics, M.S., University of Washington

    Sep, 2003 - Dec, 2005

    GPA 3.82/4.00
    AMATH584 - Applied Linear Algebra, AMATH515 - Fundamentals of Optimization, AMATH585/586 - Boundary Value and Time Dependent Problems, MATH554 - Linear Analysis, EE520 - Spectral Analysis of Time Series, STAT538 - Statistical Computing, STAT530 - Wavelets
  • Computer Science, B.S.E., Princeton University

    Sep, 1997 - May, 2001

    graduated magna cum laude
    COS341 - Discrete Mathematics, COS423 - Theory of Algorithms, COS451 - Computational Geometry, COS426 - Computer Graphics, COS333 - Advanced Programming Techniques, COS425 - Database Systems, COS318 - Operating Systems, COS471 - Computer Architecture and Organization, ELE301 - Circuits and Signal Processing, ELE482 - Digital Signal Processing
Publications

  • The Effect of Active Users on Support Tickets, Microsoft Internal

    Published on: Oct 01, 2014

    This work presents a simple statistical analysis characterizing the relation between the number of active / engaged users in the system and the rate at which service request tickets are created.

  • Support Tickets: Confidence Intervals for Population Estimates, Microsoft Internal

    Published on: Aug 01, 2014

    This work showcases two simple models that allow for the construction of confidence intervals for population estimates associated with customer support tickets. Why is this important? Because it allows us to separate natural variation in business metrics from abnormal behavior that would warrant further investigation. What did we do? We built a model for data loss and a model for label misclassification. These models are used to assess how these two distinct sources of variation affect population estimates such as total minutes spent in customer service.

  • Probabilistic Performance Guarantees for Oversubscribed Resources, Inferentialist

    Published on: Nov 01, 2013

    The paper examines the risk associated with resource allocation in the case of oversubscription. We formulate the problem in a mathematical context and provide a business level parameterization that bounds, in probability, the rate of resource exhaustion. To validate the procedure, we run a simulation over different resource consumption scenarios. In the traditional case, we obtain a 26.4% usage rate; hence, 73.6% of our resource pool goes unused. Using the strategy described in the paper, we can guarantee, with 95% confidence, that resources will be available 99% of the time. This relaxation provides a 2.5x increase in utilization, and the median usage rate jumps to 66.7%.
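
    The paper's parameterization isn't reproduced here, but the flavor of the computation can be sketched as a Monte Carlo quantile under a simple binomial activity model. All names and parameters below are illustrative assumptions, not the paper's model:

    ```python
    import numpy as np

    def oversubscribed_capacity(n_users, p_active, peak_demand,
                                availability=0.99, n_sims=10_000, seed=0):
        """Monte Carlo sketch: each user is active with probability
        p_active and consumes peak_demand when active. Returns the
        capacity that covers demand in `availability` of simulations."""
        rng = np.random.default_rng(seed)
        # number of simultaneously active users per simulated interval
        active = rng.binomial(n_users, p_active, size=n_sims)
        demand = active * peak_demand
        # provision to the availability quantile of aggregate demand
        return np.quantile(demand, availability)
    ```

    Traditional provisioning reserves n_users * peak_demand; the quantile rule relaxes this to a probabilistic guarantee, which is where the utilization gain comes from.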

  • Optimal Lending Club Portfolios, Inferentialist

    Published on: Oct 01, 2013

    This paper extends the concept of an actively managed Lending Club portfolio. It introduces a novel, random forest type algorithm that treats portfolio assets in a survival context. Using historical data provided by the company, we use our algorithm to construct optimal portfolios of Lending Club loans. Our results, driven by expected returns, compare favorably to investment strategies based solely on the loan grade assigned by Lending Club. Our optimal, actively managed portfolios have an expected return exceeding 12% annually. In contrast, portfolios constructed on A-grade loans return 6.68%; B-grade loans, 7.49%; and C-grade loans, 8.11%.

  • Measuring Microsatellite Conservation in Mammalian Evolution with a Phylogenetic Birth-Death Model, Genome Biology and Evolution

    Published on: May 16, 2012

    Microsatellites make up about three percent of the human genome, and there is increasing evidence that some microsatellites can have important functions and can be conserved by selection. To investigate this conservation, we performed a genome-wide analysis of human microsatellites and measured their conservation using a binary character birth-death model on a mammalian phylogeny. Using a maximum likelihood method to estimate birth and death rates for different types of microsatellites, we show that the rates at which microsatellites are gained and lost in mammals depend on their sequence composition, length, and position in the genome. Additionally, we use a mixture model to account for unequal death rates among microsatellites across the human genome. We use this model to assign a probability-based conservation score to each microsatellite. We found that microsatellites near the transcription start sites of genes are often highly conserved, and that distance from a microsatellite to the nearest transcription start site is a good predictor of the microsatellite conservation score. An analysis of gene ontology terms for genes that contain microsatellites near their transcription start site reveals that regulatory genes involved in growth and development are highly enriched with conserved microsatellites.

Interests

  • At Work

    interpretable data science, machine learning, convex optimization, randomized algorithms, statistical computing, clean and efficient, intuitive design
  • Away From Work

    family, rock climbing, mountain biking, road cycling, hiking