Event

Model-based clustering for high dimensional data: infinite mixtures of infinite factor analysers

  • Thu 6 Feb 20

    14:00 - 16:00

  • Colchester Campus

    STEM Centre 3.1

  • Event speaker

    Professor Claire Gormley

  • Event type

    Lectures, talks and seminars
    Mathematical Sciences Departmental Seminar

  • Event organiser

    Department of Mathematical Sciences

  • Contact details

    Andrew Harrison

Mathematical Sciences Departmental Seminar

These Departmental Seminars are open to everyone interested in Maths, whether in the subject in general or in the particular topic of the seminar. They are a great opportunity to meet people in the Maths Department and join in with our community.

Refreshments are shared in the Department (STEM 5.1) after every seminar.

Model-based clustering for high dimensional data: infinite mixtures of infinite factor analysers

Professor Claire Gormley

Gaussian mixture models with a factor-analytic structure are often employed as a model-based approach to clustering high-dimensional data. Typically, the numbers of clusters and latent factors must be specified in advance of model fitting, and the optimal pair selected using a model choice criterion. For computational reasons, models in which the number of latent factors is common across clusters are generally considered.
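As a rough illustration of why a factor-analytic structure suits high-dimensional data, the sketch below compares the number of free covariance parameters per cluster with and without it. It assumes the standard factor-analysis covariance (loadings times loadings transposed, plus a diagonal uniqueness matrix); the function names are illustrative only and are not taken from the talk or from any package.

```python
def full_cov_params(p):
    """Free parameters in an unconstrained p x p covariance matrix."""
    return p * (p + 1) // 2


def factor_cov_params(p, q):
    """Free parameters in a q-factor covariance (assumed form:
    loadings @ loadings.T + diagonal uniquenesses).

    p * q loadings, plus p uniquenesses, minus q * (q - 1) / 2
    for the rotational invariance of the loadings.
    """
    return p * q + p - q * (q - 1) // 2


p = 100  # hypothetical number of variables
full = full_cov_params(p)
reduced = factor_cov_params(p, q=5)
print(full, reduced)
```

With 100 variables and 5 factors, the count per cluster drops from 5050 to 590, and the saving grows with the dimension.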

Here the infinite mixture of infinite factor analysers (IMIFA) model is introduced. IMIFA employs a Poisson-Dirichlet process prior to facilitate automatic inference on the number of clusters. Further, IMIFA employs shrinkage priors to allow cluster-specific numbers of factors, automatically inferred via an adaptive Gibbs sampler. IMIFA is presented as the flagship of a family of factor-analytic mixture models, providing flexible approaches to clustering high-dimensional data.
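One common way to represent a prior of this kind is a truncated stick-breaking construction. The abstract does not say which construction or sampler the talk uses, so the following is only a minimal sketch. It assumes the two-parameter (Pitman-Yor) form of the Poisson-Dirichlet process, and the function name, parameter names and values are all hypothetical.

```python
import random


def stick_breaking_weights(alpha, discount, n_components, rng):
    """Draw truncated stick-breaking mixing weights.

    Assumed form: V_k ~ Beta(1 - discount, alpha + k * discount) and
    weight_k = V_k * prod_{j < k} (1 - V_j), with 0 <= discount < 1
    and alpha > -discount. Setting discount = 0 gives the Dirichlet
    process special case.
    """
    weights = []
    remaining = 1.0  # length of stick not yet assigned to a component
    for k in range(1, n_components + 1):
        v = rng.betavariate(1.0 - discount, alpha + k * discount)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights, remaining


rng = random.Random(0)  # fixed seed so the draw is reproducible
weights, leftover = stick_breaking_weights(
    alpha=1.0, discount=0.25, n_components=20, rng=rng
)
```

The weights and the leftover stick length sum to one. In a fitted model, only the components that actually receive observations count as clusters, which is what lets the number of clusters be inferred rather than fixed in advance.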

Applications to benchmark and real data sets illustrate the IMIFA model and its advantageous features: IMIFA obviates the need for model selection criteria, reduces model search and associated computational burden, improves clustering performance by allowing cluster-specific numbers of factors, and quantifies uncertainty in the numbers of clusters and cluster-specific factors. Implementation of the proposed methodology is facilitated through the associated open source R package IMIFA.

Joint work with Keefe Murphy and Cinzia Viroli.

Speaker

Professor Claire Gormley is an Associate Professor of Statistics at the School of Mathematics and Statistics, University College Dublin.