3L Data Analysis and Statistical Graphics with R

Martin Elff, University of Constance
5–16 August (two-week course / 35 hrs)

Detailed Course Outline [PDF]

Course Content

The module introduces the analysis of multivariate social science data from a point of view that connects theoretical principles with practical applications. Consequently, the module does not limit itself to the theoretical presentation of concepts such as probability, expectation, regression, dimension reduction, and statistical significance. Rather, it places strong emphasis on participants “road-testing” such concepts with the help of appropriate software. For this reason, the module makes use of the statistical software system R, which makes it possible both to explore fundamental principles of probability and statistics and to conduct rigorous data analyses suitable for research publications.

Topics covered in the module are:

  1. probability, random variables and expected values;
  2. random numbers and numerical data in R;
  3. managing and graphing multivariate data;
  4. linear regression – model construction and interpretation;
  5. linear algebra and the geometry of linear regression;
  6. hypothesis testing and statistical significance;
  7. generalised linear models for categorical responses, counts, and survival times;
  8. principal components and factor analysis;
  9. analysing distances: cluster analysis, multidimensional scaling and unfolding;
  10. non-linear and semi-parametric extensions of the (generalised) linear model;
  11. special and advanced topics of multivariate data analysis with R (latent variable models, multilevel models, or programming techniques – depending on participants' interests).

Course Objectives

Participants who successfully complete this module will have a solid understanding of the general principles of data analysis and of multivariate analysis in particular. They will also have an overview of the issues and main techniques of multivariate statistical analysis. While a two-week course can hardly cover all of these techniques in depth, successful participants will at least be able to identify which of them are appropriate for their research. Further, they will be able to graph their data and conduct their data analysis with the free statistical software system R.

Course Prerequisites

The module introduces a variety of statistical concepts and techniques and therefore has few prerequisites. To follow the topics as the course progresses, participants should have a solid understanding of high-school mathematics and descriptive statistics. They should also have a certain level of “computer literacy”, that is, they should not be afraid of command-line oriented (as opposed to menu-driven) software or of writing short command scripts. The ability to do so is not presupposed, but the motivation to learn such things is.

Background Reading

Bartholomew, David J., Fiona Steele, Irini Moustaki, and Jane I. Galbraith. 2008. Analysis of Multivariate Social Science Data. (2nd ed.) Boca Raton: Chapman & Hall/CRC.

Dalgaard, Peter. 2002. Introductory Statistics with R. New York: Springer.

Fox, John. 2008. Applied Regression Analysis and Generalized Linear Models. (2nd ed.) Thousand Oaks: Sage.

Fox, John. 2002. An R and S-Plus Companion to Applied Regression. Thousand Oaks: Sage.

Gill, Jeff. 2006. Essential Mathematics for Political and Social Research. Cambridge: Cambridge University Press.

Venables, W. N., and B. D. Ripley. 2002. Modern Applied Statistics with S. (4th ed.) New York: Springer.
