3N Advanced Quantitative Data Analysis

Vera E. Troeger, University of Warwick
5-16 August (two-week course / 35 hrs)

Detailed Course Outline [PDF]

Course Content

This course covers a range of questions and specification issues in multivariate quantitative data analysis. It is designed for students who already have training in basic statistics and knowledge of bivariate linear regression analysis. The course deals with problems that arise in applied data analysis when several of the basic regression assumptions are violated at once. We will start by discussing the Gauss-Markov assumptions of OLS regression analysis, their violations, and suitable solutions to the resulting misspecifications, especially when violations occur in conjunction. Participants will thus learn how to deal with different types of heteroskedasticity, spatial correlation, serial correlation and dynamics, as well as various kinds of heterogeneity. This discussion will include working with diverse data structures such as cross-sectional, time-series, panel and pooled data.

The course gives an overview of the problems arising from complex data structures and provides techniques to control and account for specific complications. We will also look at problems arising from non-linear relationships, interaction effects and parameter instability. In addition, the course shows how to deal with specification problems such as complex error structures, different kinds of heterogeneity (e.g. unit and slope), dynamic specification issues, missing data, and spatial heterogeneity and dependence. Furthermore, we will look at different data generating processes and adequate estimation procedures for, e.g., binary choice and limited dependent variable models. Specifically, we will consider truncated and censored data as well as sample selection, instrumental variable approaches, and seemingly unrelated regression as well as simultaneous equation models.

The course combines a more theoretical introduction to the different topics with practical analysis of diverse data sets using STATA. Students are encouraged to bring their own data sets and to present their research projects and empirical analyses during the course.

Course Objectives

The course requires basic knowledge of inferential statistics, calculus and linear algebra. It is designed to deepen participants' understanding of the statistical problems that arise from complex data generating processes in applied data analysis. The course deals mostly with questions of specification and model choice and is therefore very practical: it should enable students to link their empirical models more closely to their theoretical arguments and to make model choices that are adequate for the data structure at hand. The material taught should help participants to solve their own estimation problems and to increase the reliability and efficiency of their statistical results. The course is targeted at social and political scientists as well as economists who have average statistical skills and a strong interest in applied empirical research and data analysis. The focus lies on practical problems of applied data analysis.

Course Prerequisites

The course requires average skills and knowledge in inferential statistics, including a basic understanding of maximum likelihood and generalized linear estimation methods. Participants should also have a basic understanding of matrix algebra and calculus, though the main focus of the course is applied. In addition, participants are required to be familiar with STATA and its command structure. It would be beneficial if participants could write their own do-files and had some familiarity with STATA's programming language. The course is designed to build on a good working knowledge of cross-section bivariate regression models and basic time-series models. Participants should be able to interpret regression coefficients, standard errors and significance tests.

Required Reading

Wooldridge, Jeffrey M. 2003: Econometric Analysis of Cross Section and Panel Data, MIT Press, Cambridge.

Representative Background Reading

  • Introduction to the Course: the OLS model, Gauss-Markov Assumptions and Violations
  • Heteroskedasticity, cross-sectional correlation, multicollinearity, omitted variable bias: tests and common solutions
  • Dynamics, serial correlation and dependence over time
  • Heterogeneity: How to choose the right model – how good are the available tests?
  • Slope and Parameter Heterogeneity: Seemingly Unrelated Regression and Random Coefficient Models, modelling parameter instability
  • Specification Issues I: interaction effects and non-linearities
  • Specification Issues II: Endogeneity and spatial econometrics: instrumental variable approaches and simultaneous equation models
  • Panel Data and Pooled Cross-section time series data: models and specification issues
  • Limited Dependent Variable Models I: Binary and Ordered Choice Models
  • Limited Dependent Variable Models II: Models for Truncated, Censored and Selected data
