1H Computer-aided Text Analysis
Kenneth Benoit, London School of Economics and Political Science
9 - 20 July (two-week course / 35 hrs)
Detailed Course Outline [PDF]
Course Content
The course surveys and characterizes methods for systematically extracting information from text for social scientific purposes, starting with classical content analysis and proceeding to state-of-the-art scaling methods that estimate quantities from text statistically. The course lays a theoretical foundation for text analysis but mainly takes a practical, applied approach, so that students learn how to apply these methods in actual research. Although it takes traditional content analysis as its starting point, it is aimed at the most recent advances in quantitative content analysis, which treat words as data to be analysed using statistical tools. The course surveys several of these methods and also applies the statistical framework to more traditional non-automated coding schemes such as those of the Comparative Manifesto Project and the Policy Agendas Project.
Course Objectives
The course also covers fundamental issues such as inter-coder agreement, reliability, validation, accuracy, and precision. Lessons will combine theoretical grounding in content analysis approaches and techniques with hands-on analysis of real texts using content analytic and statistical software.
Course Prerequisites
Ideally, students in this course will have prior knowledge in the following areas:
- A basic understanding of probability and statistics at the level of an introductory postgraduate social science course. Understanding of regression analysis is presumed.
- Familiarity with a statistical package such as Stata or (ideally) R. At a pinch, a spreadsheet could be used, but a statistical package is greatly preferred, and instructional examples will use Stata and R.
- The ability to learn to use text analysis software (on a demonstration basis) such as Wordstat or MaxQDA. Neither of these is required, and no prior knowledge is assumed, but students should be willing and able to try them out. Both are user-friendly packages available in limited demonstration versions that can be freely downloaded, so this should not be too difficult.
- The ability to manipulate text files using a text editor. (It does not matter which text editor you use.)
Representative Background Reading
The staple readings (as books) for this course will be Neuendorf (2002) and Krippendorff (2004). Where possible, all other readings will be downloadable as PDFs from the course web pages.
