Advanced Quantitative Data Analysis
Prof. Vera E. Troeger, University of Warwick
10 - 21 August (two-week course / 35 hrs)
Prof. Vera E. Troeger is Professor of Quantitative Political Science and Political Economy in the Department of Economics at the University of Warwick. Between 2007 and 2011 she was Director of the Essex Summer School in Social Science Data Analysis. Her academic training is mainly in international and comparative political economy, economics, econometrics, and applied statistics. She previously held positions at the University of Essex, the Max Planck Institute of Economics and the University of Exeter. She teaches and has taught courses and workshops in quantitative methods and statistics for undergraduate, master's and PhD students in Political Science and Economics. Vera is associate editor of Political Analysis, one of the most highly ranked journals in political science, and serves on the editorial board of the European Journal of Political Research. She is also a council member of the newly founded European Political Science Association. Her research interests lie at the intersection of international and comparative political economy, econometrics, and applied statistics, in particular economic policy diffusion and spillovers of monetary and tax policy; quantitative political methodology, especially pooled cross-section time-series analysis; the trade-off between bias and efficiency in finite-sample econometrics; and endogeneity issues. She has published papers on external effects of currency unions, monetary policy autonomy, international tax competition, time-invariant and rarely changing variables in pooled data analysis, budgetary party politics, and war and stock market reactions in the American Journal of Political Science, the European Journal of Political Research, the British Journal of Political Science, International Studies Quarterly, the Journal of Conflict Resolution, European Union Politics and Political Analysis.
This course will cover various questions and specification issues in multivariate quantitative data analysis and is designed for students who already have training in basic statistics and knowledge of bivariate linear regression analysis. The course deals with problems that arise in applied data analysis when several of the basic regression assumptions are violated at once. We will start by discussing the Gauss-Markov assumptions of OLS regression analysis, their violations and suitable solutions to such misspecifications, especially when they occur in conjunction. Participants will thus learn how to deal with different types of heteroskedasticity, spatial correlation, serial correlation and dynamics, as well as various kinds of heterogeneity. This discussion will include working with diverse data such as cross-sectional, time-series, panel and pooled data. The course gives an overview of the problems arising from complex data structures and provides techniques to control and account for specific complications. We will also look at problems arising from non-linear relationships, interaction effects and parameter instability. In addition, this course shows how to deal with specification problems such as complex error structures, different kinds of heterogeneity (e.g. unit and slope heterogeneity), dynamic specification issues, missing data, spatial heterogeneity and dependency. Furthermore, we will look at different data-generating processes and appropriate estimation procedures for, e.g., binary choice and limited dependent variable models. Specifically, we will consider truncated and censored data, sample selection, instrumental variable approaches, and seemingly unrelated regression and simultaneous equation models. The course combines a theoretical introduction to each topic with practical analysis of diverse data sets using STATA.
Students are encouraged to bring their own data sets and present their research projects and empirical analysis during the course.
The course requires basic knowledge of inferential statistics, calculus and linear algebra and is designed to further develop participants' understanding of the statistical problems that arise from complex data-generating processes in applied data analysis. It deals mostly with questions of specification and model choice and is therefore very practical: it should enable students to link their empirical models more closely to their theoretical arguments and to make model choices that are appropriate for the data structure at hand. The material taught should help participants solve their own estimation problems and increase the reliability and efficiency of their statistical results. The course is targeted at social and political scientists as well as economists with average statistical skills and a strong interest in applied empirical research and data analysis. The focus lies on practical problems of applied data analysis.
The course requires average skills and knowledge in inferential statistics, including a basic understanding of maximum likelihood and generalized linear estimation methods. Participants should also have a basic understanding of matrix algebra and calculus, though the main focus of the course is applied. In addition, participants are required to be familiar with STATA and its command structure. It would be beneficial if participants could write their own do-files and had some familiarity with STATA's programming language. The course builds on a good working knowledge of cross-section bivariate regression models and basic time-series models. Participants should be able to interpret regression coefficients, standard errors and significance tests.
Representative Background Reading / References
- Wooldridge, Jeffrey M. 2003: Econometric Analysis of Cross Section and Panel Data, MIT Press, Cambridge.
Course Outline
- Introduction to the Course: the OLS model, Gauss-Markov Assumptions and Violations
- Heteroskedasticity, cross-sectional correlation, multicollinearity, omitted variable bias: tests and common solutions
- Dynamics, serial correlation and dependence over time
- Heterogeneity: how to choose the right model and how good the available tests are
- Slope and Parameter Heterogeneity: Seemingly Unrelated Regression and Random Coefficient Models, modelling parameter instability
- Specification Issues I: interaction effects and non-linearities
- Specification Issues II: Endogeneity and spatial econometrics: instrumental variable approaches and simultaneous equation models
- Panel Data and Pooled Cross-Section Time-Series Data: models and specification issues
- Limited Dependent Variable Models I: Binary and Ordered Choice Models
- Limited Dependent Variable Models II: Models for Truncated, Censored and Selected Data