# The Dirichlet-Multinomial in PyMC3

### Modeling Overdispersion in Compositional Count Data

*Having just spent a few too many hours
working on the Dirichlet-multinomial distribution
in PyMC3,
I thought I'd convert the demo notebook
I also contributed
into a blog post.*

This example (exported and minimally edited from a Jupyter Notebook) demonstrates the use of a Dirichlet mixture of multinomials (a.k.a Dirichlet-multinomial or DM) to model categorical count data. Models like this one are important in a variety of areas, including natural language processing, ecology, bioinformatics, and more.

The Dirichlet-multinomial can be understood as draws from a Multinomial distribution where each sample has a slightly different probability vector, which is itself drawn from a common Dirichlet distribution. This contrasts with the Multinomial distribution, which assumes that all observations arise from a single fixed probability vector. This enables the Dirichlet-multinomial to accommodate more variable (a.k.a, over-dispersed) count data than the Multinomial.

Other examples of over-dispersed count distributions are the …