Personalized Point Processes: A Simple Bayesian Analysis
This post describes a Bayesian analysis of the homogeneous Poisson process with a conjugate Gamma prior, which can be used to estimate a partially pooled, per-subject intensity from a collection of realizations.
A homogeneous Poisson process is the simplest way to describe events that arrive in time. Here, we are interested in a collection of realizations. An example is user transactions in a system. Over time, we expect each user to produce a sequence of transaction events, and we would like to characterize the rate of these events on a per-user basis. In particular, users with more data should receive a more personalized characterization, while users with little data should fall back toward behavior shared across the population. Statistically, this can be accomplished using a Bayesian framework.
Let \(i \in \mathcal{I}\) denote a particular user in the set of all users, and \(X_i = (X_{i1}, \dots, X_{i n_i})\), the transaction times of the \(i\)th user.
We use \(X_i\) to denote the random variable; \(x_i\), the data, a realization of the random variable.
We employ a parenthesized superscript to denote a quantity across all \(n\) users. Thus, \(X^{(n)} = (X_1, \dots, X_n)\) is the random variable describing the transaction times of all \(n\) users.
\(f_\theta(\cdot)\) and \(F_\theta(\cdot)\) are a probability density function and cumulative distribution function parameterized by \(\theta\). We may also write \(f(\cdot \vert \theta)\) to indicate that the density is conditioned on \(\theta\).
We use \(\lambda_\theta(\cdot)\) to denote the intensity function.
\(L_i(\theta_i; X_i)\) is the likelihood of \(\theta_i\) associated with the transaction times of the \(i\)th user. In our context, where we have realizations across multiple users, the subscript \(i\) indicates the per-user model parameterization.
\(T_i\) is the data collection period associated with the \(i\)th user.
For a Poisson process, where realizations are independent across users, we have the following results:
\[ L_i(\theta_i; X_i) = \exp \left[ -\int_0^{T_i} \lambda_{\theta_i}(u) du \right] \prod_{j=1}^{n_i} \lambda_{\theta_i}(x_{ij}) \]
\[ L(\theta^{(n)}; X^{(n)}) = \prod_{i=1}^n L_i(\theta_i; X_i) \]
\[ \pi_{\gamma} \left( \theta^{(n)} \vert X^{(n)} \right) \propto \prod_{i=1}^n L_i(\theta_i; X_i) \pi_{\gamma}(\theta_i) \]
where the \(\gamma\) subscript denotes hyperparameters which are shared across all users.
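To make the per-user likelihood concrete, here is a minimal Python sketch (the function names are illustrative, not from the post) that evaluates \(\log L_i\) for an arbitrary intensity function, computing the compensator \(\int_0^{T_i} \lambda_{\theta_i}(u) \, du\) by numerical quadrature:

```python
import numpy as np
from scipy.integrate import quad

def poisson_process_loglik(intensity, event_times, T):
    """Log-likelihood of one Poisson process realization on [0, T].

    intensity   : callable, the intensity function lambda_theta(t)
    event_times : array of event times x_{i1}, ..., x_{i n_i}
    T           : length of the data collection period T_i
    """
    # Compensator: the integral of the intensity over [0, T].
    compensator, _ = quad(intensity, 0.0, T)
    # Sum of log-intensities evaluated at the observed event times.
    sum_log_intensity = np.sum(np.log([intensity(t) for t in event_times]))
    return -compensator + sum_log_intensity
```

By the independence across users, the joint log-likelihood is the sum of these per-user terms.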
Without loss of generality, suppose we are interested in the parameters associated with the first user, i.e., \(\theta_1\).
\[ \begin{align*} \pi_{\gamma} \left( \theta_1 \vert X^{(n)} \right) & = \int \pi_{\gamma} \left( \theta^{(n)} \vert X^{(n)} \right) d \theta_2 \dots d \theta_n \\ & \propto L_1(\theta_1; X_1) \pi_{\gamma}(\theta_1) \end{align*} \]
The marginal likelihood describes the relationship between the data and the hyperparameters, \(\gamma\), after integrating out the user-level parameters. This is useful for an empirical Bayes approach as well as for assessing model fit.
\[ \begin{align*} f_{\gamma}\left( X^{(n)} \right) & = \int f\left( X^{(n)} \vert \theta^{(n)} \right) \pi_{\gamma} \left( \theta^{(n)} \right) d \theta^{(n)} \\ & = \prod_{i=1}^n \int L_i(\theta_i; X_i) \pi_{\gamma}(\theta_i) d \theta_i \end{align*} \]
A homogeneous Poisson process is characterized by a constant intensity function. Hence, \(\lambda_{\theta_i}(t) = \lambda_i\) and \(\theta_i = \left\{ \lambda_i \right\}\), so the likelihood simplifies to:
\[ L_i(\theta_i; X_i) = \exp \left( -\lambda_i T_i + n_i \log \lambda_i \right) \]
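The log-likelihood is now a one-liner, and simulation is equally simple: conditional on the event count \(n_i \sim \text{Poisson}(\lambda_i T_i)\), the event times are i.i.d. uniform on \([0, T_i]\). A minimal sketch, with illustrative names:

```python
import numpy as np

def simulate_homogeneous(rate, T, rng):
    """Simulate one homogeneous Poisson process realization on [0, T]."""
    n = rng.poisson(rate * T)               # event count ~ Poisson(rate * T)
    return np.sort(rng.uniform(0.0, T, n))  # given n, times are iid Uniform(0, T)

def homogeneous_loglik(rate, n_events, T):
    """Log-likelihood: -lambda_i * T_i + n_i * log(lambda_i)."""
    return -rate * T + n_events * np.log(rate)

rng = np.random.default_rng(0)
x = simulate_homogeneous(rate=0.4, T=30.0, rng=rng)
homogeneous_loglik(0.4, len(x), 30.0)
```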
The Gamma distribution is a conjugate prior compatible with this likelihood, since its density has the same functional form in \(\lambda_i\):
\[ \begin{gather*} \lambda_i \sim \Gamma(\alpha, \beta) \\ \pi(\lambda_i) = \exp \left( -\beta \lambda_i + (\alpha-1) \log \lambda_i \right) c(\alpha,\beta) \end{gather*} \]
where \(c(\alpha,\beta) = \beta^{\alpha} / \Gamma(\alpha)\) is the normalizing constant. The posterior follows from multiplying the likelihood and the prior:
\[ \pi(\lambda_i \vert X_i) \propto \exp \left( -(\beta + T_i) \lambda_i + (n_i + \alpha-1) \log \lambda_i \right) \]
Equivalently,
\[ \left[ \lambda_i \vert X_i \right] \sim \Gamma(n_i + \alpha, T_i + \beta) \]
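The conjugate update therefore amounts to two additions: the event count updates the shape, and the observation window updates the rate. A minimal sketch using `scipy.stats.gamma` (note that SciPy parameterizes the Gamma by shape and scale, so the rate \(T_i + \beta\) enters as its reciprocal):

```python
from scipy.stats import gamma

def rate_posterior(n_events, T, alpha, beta):
    """Gamma(n_i + alpha, T_i + beta) posterior over a user's rate."""
    return gamma(a=n_events + alpha, scale=1.0 / (T + beta))

post = rate_posterior(n_events=12, T=30.0, alpha=2.0, beta=5.0)
post.mean()          # posterior mean: (12 + 2) / (30 + 5) = 0.4
post.interval(0.95)  # central 95% credible interval for lambda_i
```

The posterior mean \((n_i + \alpha) / (T_i + \beta)\) interpolates between the per-user MLE \(n_i / T_i\) and the prior mean \(\alpha / \beta\), so users with longer histories receive estimates dominated by their own data.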
The same conjugacy makes the marginal likelihood available in closed form:
\[ \begin{align*} f\left( X^{(n)} \vert \alpha, \beta \right) & = \prod_{i=1}^n \int_0^\infty \exp \left( -(\beta + T_i) \lambda_i + (n_i + \alpha-1) \log \lambda_i \right) c(\alpha,\beta) \, d \lambda_i \\ & = \prod_{i=1}^n \left[ \frac{ \beta^{\alpha} }{ \Gamma(\alpha) } \frac{ \Gamma(n_i + \alpha) }{ (\beta + T_i)^{n_i + \alpha} } \right] \end{align*} \]
where the second line uses the Gamma integral \(\int_0^\infty \lambda^{a-1} e^{-b \lambda} \, d\lambda = \Gamma(a) / b^{a}\).
The log marginal likelihood is:
\[ \log f\left( X^{(n)} \vert \alpha, \beta \right) = \sum_{i=1}^n \left[ \alpha \log \beta + \log \Gamma(n_i + \alpha) - \log \Gamma(\alpha) - (n_i + \alpha) \log (\beta + T_i) \right] \]
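This expression vectorizes directly with `scipy.special.gammaln`. A sketch, assuming `n` and `T` are NumPy arrays holding each user's event count and data collection period:

```python
import numpy as np
from scipy.special import gammaln

def log_marginal_likelihood(alpha, beta, n, T):
    """Closed-form Gamma-Poisson evidence, summed over users."""
    return np.sum(
        alpha * np.log(beta)
        + gammaln(n + alpha)
        - gammaln(alpha)
        - (n + alpha) * np.log(beta + T)
    )
```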
The derivatives with respect to the hyperparameters are:
\[ \begin{align*} \frac{ \partial \log f }{ \partial \alpha } & = \sum_{i=1}^n \left[ \log \beta + \psi(n_i + \alpha) - \psi(\alpha) - \log(\beta + T_i) \right] \\ \frac{ \partial \log f }{ \partial \beta } & = \sum_{i=1}^n \left[ \frac{ \alpha }{ \beta } - \frac{ n_i + \alpha }{ \beta + T_i } \right] \end{align*} \]
where \(\psi\) is the digamma function.
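With the closed-form gradient available, an empirical Bayes fit reduces to a small unconstrained optimization. The sketch below (illustrative; it reuses `log_marginal_likelihood` from the previous snippet) maximizes the evidence over \((\alpha, \beta)\) with `scipy.optimize.minimize`, optimizing on the log scale so both hyperparameters stay positive:

```python
import numpy as np
from scipy.special import digamma
from scipy.optimize import minimize

def grad_log_marginal(alpha, beta, n, T):
    """Gradient of the log marginal likelihood w.r.t. (alpha, beta)."""
    d_alpha = np.sum(np.log(beta) + digamma(n + alpha)
                     - digamma(alpha) - np.log(beta + T))
    d_beta = np.sum(alpha / beta - (n + alpha) / (beta + T))
    return np.array([d_alpha, d_beta])

def fit_empirical_bayes(n, T):
    """Maximize the evidence over (alpha, beta) = exp(params)."""
    def objective(params):
        alpha, beta = np.exp(params)
        value = -log_marginal_likelihood(alpha, beta, n, T)
        # Chain rule for the log reparameterization: d/d(log a) = a * d/da.
        grad = -grad_log_marginal(alpha, beta, n, T) * np.exp(params)
        return value, grad

    result = minimize(objective, x0=np.zeros(2), jac=True)
    return np.exp(result.x)  # (alpha_hat, beta_hat)
```

The fitted \((\hat\alpha, \hat\beta)\) can then be plugged into the per-user posterior \(\Gamma(n_i + \hat\alpha, T_i + \hat\beta)\) above.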