Transfer Topic Labeling with Domain-Specific Knowledge Base: An Analysis of UK House of Commons Speeches from 1935 to 2014

  • Tue 21 Nov 17

    12:00 - 13:00

  • Colchester Campus


  • Event speaker

    Professor Slava Mikhaylov

  • Event type

    Lectures, talks and seminars
    IADS Seminar Series

  • Event organiser

    Institute for Analytics and Data Science (IADS)

  • Contact details

    Daniel Karapetyan

Professor Slava Mikhaylov, Institute for Analytics and Data Science / Department of Computer Science and Department of Government, Essex

Topic models are among the most widely used methods in natural language processing, allowing researchers to estimate the underlying themes in a collection of documents. Most topic models are unsupervised and hence require the additional step of attaching meaningful labels to the estimated topics. This manual labeling does not scale and is often problematic, because it depends on the researcher's domain expertise and is subject to cardinality effects in human decision making. As a consequence, insights drawn from a topic model are difficult to replicate. We present an automatic transfer topic labeling method that seeks to remedy some of these problems. We take advantage of the fact that many areas of research have domain-specific codebooks that can be exploited for automated topic labeling. We demonstrate our approach with a dynamic topic model analysis of the complete corpus of UK House of Commons speeches from 1935 to 2014, using the code manual of the Comparative Agendas Project to automatically label topics. We show that our method works well for a majority of the topics we estimate, but we also find institution-specific topics, in particular on subnational governance, that require manual input. Our results indicate that fully unsupervised automatic topic labeling may not currently be feasible, and that the human-in-the-loop transfer learning approach yields additional insights. The method proposed in the paper can be easily extended to other areas with existing domain-specific knowledge bases, such as party manifestos, open-ended survey questions, social media data, and legal documents.
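
To make the idea of codebook-based labeling concrete, here is a minimal sketch under stated assumptions. It is not the method from the talk: it swaps in a plain TF-IDF cosine similarity between each topic's top words and each codebook entry, and it leaves a topic unlabeled (for manual input) when no entry scores above a threshold. The function names, the `min_score` threshold, and the toy codebook text are all hypothetical; the codebook entries are invented for illustration and are not taken from the Comparative Agendas Project manual.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict) per tokenised document."""
    n = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    # Smoothed inverse document frequency, so every term gets a positive weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in doc_freq.items()}
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def label_topics(topic_words, codebook, min_score=0.1):
    """Assign each topic the most similar codebook label.

    Returns a list of (label, score) pairs. The label is None when the best
    score is below min_score, flagging the topic for manual labeling.
    min_score is an arbitrary illustrative threshold, not a tuned value.
    """
    labels = list(codebook)
    docs = [codebook[name].lower().split() for name in labels]
    docs += [list(words) for words in topic_words]
    vectors = tfidf_vectors(docs)
    code_vecs, topic_vecs = vectors[: len(labels)], vectors[len(labels):]

    results = []
    for topic_vec in topic_vecs:
        scores = [cosine(topic_vec, code_vec) for code_vec in code_vecs]
        best = max(range(len(labels)), key=scores.__getitem__)
        label = labels[best] if scores[best] >= min_score else None
        results.append((label, scores[best]))
    return results


# Toy codebook: invented descriptions, NOT real Comparative Agendas Project text.
toy_codebook = {
    "Health": "health care hospitals doctors nurses insurance medical",
    "Defence": "defence military army navy weapons armed forces",
    "Education": "education schools teachers universities students",
}

# Toy topics: the top words a topic model might return (invented).
toy_topics = [
    ["hospitals", "nurses", "doctors", "patients"],
    ["army", "navy", "forces", "war"],
    ["scotland", "wales", "devolution", "assembly"],
]

if __name__ == "__main__":
    for words, (label, score) in zip(toy_topics, label_topics(toy_topics, toy_codebook)):
        print(words, "->", label or "needs manual label", round(score, 3))
```

In this toy run the third topic shares no vocabulary with any codebook entry, so it is returned without a label. That mirrors the abstract's point that institution-specific topics, such as subnational governance, may still need manual input.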

Slava Mikhaylov is a Professor of Public Policy and Data Science at the University of Essex, holding a joint appointment in the Department of Government and in the Institute for Analytics and Data Science / Department of Computer Science. He is a Chief Scientific Adviser to Essex County Council and a co-investigator in the UK Economic and Social Research Council Big Data infrastructure investment initiative, the Consumer Data Research Centre at University College London. His research and teaching are primarily in the fields of machine learning and natural language processing.
