Population dynamics with TensorFlow Probability

I recently read this Stan case study by Bob Carpenter and wanted to try my hand at doing something similar with TensorFlow Probability.

A crash course in ordinary differential equations

Let's say we want to model the population x of a species over time. A very basic model might say that the population grows exponentially—that is, at any point in time t, the instantaneous rate of change of x is proportional to the current population x(t). In the language of ordinary differential equations (ODEs), we would write this model as

d x over d t equals k times x,

where k is some constant of proportionality. If k > 0, the population grows without bound; if k < 0, the population shrinks and approaches 0; and if k = 0, the population stays constant. Note that even though x is a function, it's conventional to write just x instead of x(t) when stating the ODE.

Graph of exponential growth
Exponential growth at various proportionality constants, all starting at x(0) = 10

We can attempt to characterize the set of all functions that satisfy the above differential equation. After we do a little math, it turns out that it is exactly the set of all functions x which can be written in the form

c times e to the quantity (k t),

where c is some constant. It's not always the case that we can represent the solution set of an ODE so simply as a family of elementary functions. Now, suppose at some time t0, a particular function x which solves the ODE attains the value x(t0) = x0. Then

and replacing c with this value in the general solution gives

That is, any function solving the above ODE is uniquely defined by a point it goes through. For instance, in the above figure, the topmost curve (where k = 0.05) attains the value 10 at t = 0.

ODE Example
Solutions to above ODE with k = 0.05, starting at various initial conditions. The topmost function in this figure is the same as the topmost one in the previous figure.

To reiterate: we decide to model a population as growing exponentially, i.e. growing at each instant at some rate proportional to the population at that instant; we provide a value k for the proportionality constant; we provide, for some time t0, a value x(t0) = x0 for the population at that time; we then get a complete representation of the population as a function of time.

The Lotka–Volterra equations

Graph of a Lotka-Volterra system
The solution to a Lotka–Volterra system (with parameters and initial conditions provided below)

The Lotka–Volterra equations are a system of ordinary differential equations (ODEs) which can be used to model the time-varying population sizes of two species, a predator and its prey. Given four parameters α, β, γ, δ (explained shortly), the system is

equation 1: d x over d t equals x times the quantity (alpha minus beta y); and equation 2: d y over d t equals negative y times the quantity (gamma minus delta x),

where t is time, x is the number of prey, and y is the number of predators. The population sizes x and y are functions of time, but we follow convention and write just x and y in the above equations instead of x(t) and y(t). The parameters previously mentioned are

The parameter α is the prey's "intrinsic exponential growth rate" because in the absence of any interaction between the two species (i.e., if β and δ were equal to 0), the prey population size x would increase exponentially at a rate α. Likewise, an absense of interaction would cause y to decrease exponentially at a rate δ.

Note that β is multiplied by the predator population size y, and δ by the prey population size x. To fully define any specific solution, we also need initial population sizes x(t0) and y(t0) of the two species at some time t0. The above figure shows the solution to a Lotka–Volterra system with parameters α = 1.1, β = 0.4, γ = 0.4, δ = 0.1, and initial conditions x(0) = y(0) = 10.

Bayesian inference with Lotka–Volterra

Graph of hare and lynx data
  • Upper graph: hare pelts produced vs. lynx pelts produced in North Central district;
  • Lower graph: same, but lynx pelts from James Bay district.
(Note differing scales of the hare and lynx numbers.)

Given the four parameters and two initial conditions, the Lotka–Volterra model gives predator and prey population sizes at all points in time. But we are interested in the inverse problem: given time series data of the two populations, and assuming the Lotka–Volterra model is the true data generating process, what are the values of the parameters and initial conditions? That's where Bayesian inference comes in.

The code

Some inline code: asdf. Some block code:

int main() {
  return 0;
}

Nice code!

Bayesian inference with stochastic Lotka–Volterra

We previously assumed stochasticity in the measurement of population sizes. But it seems reasonable to also assume stochasticity in the population sizes themselves. That is, instead of assuming the population dynamics follow a system of deterministic equations, what if we allowed for some randomness in the population sizes? The long-term trend would match that of the deterministic model, but in the short-term, a population may for instance jump down slightly due to a hunting party, or jump up slightly during a bout of good weather (to be honest, a random population decrease seems more likely than a similarly sized population increase, but for simplicity we can assume they are of equal probability).

For this endeavor we would use a pair of stochastic differential equations.

I'm not going to get into a deep discussion of Bayesian inference with stochastic processes.

A more complex model

Nuances of the data

MacLulich (1957) notes that in 1863, 1872, and 1883, some hare skin shipments were one year late and thus counted in the following year's shipments, leading to large jumps in the reported numbers. From what I found, many resources don't mention this feature of the data, even though it seems pretty important!

Finerty (1979) notes that "the comparison is essentially between hares from eastern Canada and lynx from western Canada." To that end, Finerty (1971) suggests that the period length of the hare cycle in James Bay may be greater than that of the lynx cycle in the Mackenzie River District, which seems present in the data.