Event

Transfer Topic Labeling with Domain-Specific Knowledge Base: An Analysis of UK House of Commons Speeches from 1935 to 2014

Date: Tuesday 21 November 2017
Time: 12:00 - 13:00
Location: Colchester Campus, NTC.3.04
Event speaker: Professor Slava Mikhaylov
Event type: Lectures, talks and seminars; IADS Seminar Series
Event organiser: Institute for Analytics and Data Science (IADS)
Contact details: Daniel Karapetyan, dkarap@essex.ac.uk

Professor Slava Mikhaylov, Institute for Analytics and Data Science, Department of Computer Science and Department of Government, University of Essex

Abstract
Topic models are among the most widely used methods in natural language processing, allowing researchers to estimate the underlying themes in a collection of documents. Most topic models are unsupervised and therefore require an additional step: attaching meaningful labels to the estimated topics. Manual labeling does not scale and is often problematic, because it depends on the researcher's domain expertise and on cardinality effects in human decision making. As a consequence, insights drawn from a topic model are difficult to replicate. We present an automatic transfer topic labeling method that seeks to remedy some of these problems. We take advantage of the fact that many research areas already have domain-specific codebooks that can be exploited for automated topic labeling. We demonstrate our approach with a dynamic topic model analysis of the complete corpus of UK House of Commons speeches from 1935 to 2014, using the code manual of the Comparative Agendas Project to label topics automatically. We show that our method works well for a majority of the topics we estimate, but we also find institution-specific topics, in particular on subnational governance, that require manual input. Our results indicate that fully unsupervised automatic topic labeling may not currently be feasible, and that a human-in-the-loop transfer learning approach yields additional insights. The proposed method can easily be extended to other areas with existing domain-specific knowledge bases, such as party manifestos, open-ended survey questions, social media data, and legal documents.
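To make the idea of labeling topics from a codebook concrete, here is a minimal sketch under loudly stated assumptions. It is not the method presented in the talk: it swaps in a simple bag-of-words cosine similarity between a topic's top words and short category descriptions. The category names echo Comparative Agendas Project major topics, but the descriptions in the hypothetical `CODEBOOK` are invented for illustration and are not the actual CAP code manual text. The `threshold` parameter is also a hypothetical choice: topics that match no category well enough are returned unlabeled, standing in for the human-in-the-loop step the abstract describes.

```python
from collections import Counter
import math

# Hypothetical codebook: category label -> invented keyword description.
# These are illustrative paraphrases, NOT the real CAP code manual.
CODEBOOK = {
    "Health": "health care hospitals nhs doctors patients",
    "Defence": "defence military army navy war armed forces",
    "Agriculture": "agriculture farmers farming food crops",
}


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def label_topic(top_words, codebook, threshold=0.1):
    """Return (label, score) for the best-matching codebook category.

    Returns (None, score) when no category reaches `threshold` (a
    hypothetical cut-off), i.e. the topic is flagged for manual labeling.
    """
    topic_vec = Counter(w.lower() for w in top_words)
    best_label, best_score = None, 0.0
    for label, description in codebook.items():
        score = cosine(topic_vec, Counter(description.lower().split()))
        if score > best_score:
            best_label, best_score = label, score
    if best_score < threshold:
        return None, best_score
    return best_label, best_score


if __name__ == "__main__":
    # A topic whose top words overlap a codebook description gets a label.
    print(label_topic(["hospitals", "patients", "nhs", "doctors"], CODEBOOK))
    # A topic with no overlap (e.g. subnational governance) is left for a human.
    print(label_topic(["councils", "local", "scotland", "wales"], CODEBOOK))
```

In this toy setup the first topic is labeled "Health", while the second returns no label, which loosely mirrors the abstract's observation that institution-specific topics such as subnational governance may need manual input. A realistic system would use richer representations than raw word overlap.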

Bio
Slava Mikhaylov is a Professor of Public Policy and Data Science at the University of Essex, holding a joint appointment in the Department of Government and the Department of Computer Science, and is a member of the Institute for Analytics and Data Science. He is Chief Scientific Adviser to Essex County Council and a co-investigator at the Consumer Data Research Centre at University College London, part of the UK Economic and Social Research Council's Big Data infrastructure investment initiative. His research and teaching are primarily in the fields of machine learning and natural language processing.
