[Cs-colloq] Tue. Feb 28, LWSN 3102 Sanvesh Srivastava (Purdue STAT)

Blakeslee, Erin eblakesl at purdue.edu
Fri Feb 24 11:14:37 EST 2012

Machine Learning and Applications Seminar
Sponsored by Yahoo! Research Labs

Tuesday, Feb 28th, 2012
12:00pm-1:00pm, LWSN 3102 A/B

Speaker: Sanvesh Srivastava, Department of Statistics, Purdue University

Title:  Latent Process Decomposition of High-Dimensional Count Data


We present a novel approach for probabilistically modeling high-dimensional count data in an unsupervised way using a three-level hierarchical Bayesian model. We explore its application to next-generation sequencing data, where the goal is to identify subsets of genes that have consistent expression patterns and explain a large portion of the variability. Each sample is modeled as a finite mixture of Poisson random variables over an underlying set of latent variables that are assumed to correspond to latent processes. Each latent process is further modeled as an infinite mixture over an underlying set of latent process probabilities. We call this model Latent Process Decomposition (LPD). It combines ideas from machine learning and resampling-based methods, and uses a computationally efficient variational method for parameter estimation. We investigate the performance of LPD on simulated data to demonstrate that it is a useful, modular, and extensible tool for identifying interesting genes for further exploration. LPD is implemented as an R/Bioconductor package called themes.

Jimmy John's sandwiches will be served before the seminar. We hope to see you on Tuesday!

Yao Zhu (yaozhu at purdue.edu) and Zenglin Xu (xu218 at purdue.edu)
