Videos

Challenges posed by dependency structures in non-normal multivariate data from microbiota

Presenter
Susan Holmes, Stanford University, Statistics
January 23, 2020
Abstract
Data from sequencing bacterial communities are formalized as contingency tables whose columns correspond to different biological specimens. The row features are a random collection of Amplicon Sequence Variants (ASVs, in the case of 16S rRNA amplicon sequencing) or gene fragments (in the case of metagenomics). In both cases, these entities are defined after the data are collected, thus imposing a nonparametric framework. There are usually more feature rows than columns, which makes regularization necessary, here through the use of Bayesian priors. However, the classical Dirichlet-multinomial models are insufficient to account for the strong associations (or exclusions) between certain bacteria. Recent hierarchical models such as latent Dirichlet topic models therefore provide a more flexible framework that allows mixed membership models, which are more appropriate for these non-Gaussian data. We will show how these hierarchical topic models can enhance our understanding of both longitudinal dependencies between samples and biological dependencies between taxa, regardless of the differences in sampling depth and sources of variability. This talk contains joint work with Kris Sankaran, Pratheepa Jenganathan and David Relman's group at Stanford.
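As a rough illustration of the mixed membership idea in the abstract, the sketch below simulates a count table from a generic latent Dirichlet allocation style generative model: each sample has its own mixture over a few latent "topics" (bacterial sub-communities), and each topic is a distribution over taxa. This is not the speakers' code or their exact model. The function name `simulate_topic_counts`, the hyperparameters `alpha` and `gamma`, and the table dimensions are all assumptions chosen for illustration.

```python
import numpy as np

def simulate_topic_counts(depths, n_taxa=50, n_topics=3,
                          alpha=0.5, gamma=0.5, seed=0):
    """Simulate a taxa-by-sample count table from an LDA-style model.

    All names and default values here are hypothetical illustration choices.
    """
    rng = np.random.default_rng(seed)
    depths = np.asarray(depths, dtype=int)
    n_samples = depths.size

    # Per-sample topic proportions (mixed membership), one row per sample.
    theta = rng.dirichlet(np.full(n_topics, alpha), size=n_samples)
    # Per-topic distributions over taxa, one row per topic.
    beta = rng.dirichlet(np.full(n_taxa, gamma), size=n_topics)

    # Taxon probabilities for each sample are a mixture of topic distributions.
    probs = theta @ beta
    probs /= probs.sum(axis=1, keepdims=True)  # guard against rounding drift

    # Sampling depth only sets the multinomial total, not the proportions.
    counts = np.stack([rng.multinomial(d, p) for d, p in zip(depths, probs)])

    # Rows are taxa and columns are samples, as in the abstract's tables.
    return counts.T, theta, beta

depths = [1000, 5000, 20000, 800]
table, theta, beta = simulate_topic_counts(depths)
print(table.shape)        # taxa x samples
print(table.sum(axis=0))  # column totals equal the chosen depths
```

In this sketch, taxa that load on the same topic tend to rise and fall together across samples, which is the kind of association a single Dirichlet-multinomial over taxa does not capture. The unequal values in `depths` change only the column totals, since the mixture proportions are modeled separately from the sequencing depth.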
Supplementary Materials