3F Advanced Programming in Stata

Fred Boehmke, University of Iowa
5-16 August (two-week course / 35 hrs)

Detailed Course Outline [PDF]

Course Content

This class will stress computing techniques as they relate to data management, analysis, and presentation. Rather than learning new estimators, we will focus on improving your ability to use statistical software, Stata in particular, to generate, understand, and convey statistical results. The majority of class time will therefore be devoted to interactive computer lab sessions, with prepared examples and specific tasks to implement. Occasional, brief lectures will provide the necessary background. This course will cover three broad topics.

First, we will learn how to enter, manipulate, and manage data in Stata, including using loops; collapsing, merging, and reshaping data; working with dates and string functions; and writing your own Stata programs and commands for repeated tasks.

Second, we will learn how to use Monte Carlo simulation to understand basic statistical estimators (e.g., regression, discrete outcome, and multiple outcome models). Monte Carlo simulation allows users to learn about estimators by controlling the data generating process and then observing the consequences of violating key assumptions. It is also helpful for generating and interpreting model results through simulation of quantities of interest.

Third, we will learn about various techniques for graphical interpretation, presentation, and mapping of data and results from various estimators, including predicted values, first differences, and marginal effects, as well as the presentation of associated levels of uncertainty.
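As a rough illustration of the Monte Carlo idea described above, the following minimal Stata sketch draws repeated samples from a known data generating process and checks how the OLS slope estimate behaves. It is not taken from the course materials: the program name `olssim`, the variable names, the sample size, the true coefficients, and the number of replications are all hypothetical choices made for this example.

```stata
* Minimal Monte Carlo sketch. All names and values are illustrative assumptions.
clear all
set seed 12345

* One draw from a known data generating process: y = 1 + 2*x + e
program define olssim, rclass
    drop _all
    set obs 100
    generate x = rnormal()
    generate y = 1 + 2*x + rnormal()
    regress y x
    * Return the slope estimate and its standard error
    return scalar b  = _b[x]
    return scalar se = _se[x]
end

* Repeat the draw 1,000 times and collect the results in a new data set
simulate b = r(b) se = r(se), reps(1000) nodots: olssim

* Compare the average estimate with the true slope of 2
summarize b se
```

Because the true slope is fixed in the program, the average of `b` can be compared directly with it. Changing the line that generates `y` (for example, making the error variance depend on `x`) is one way to explore what happens when an assumption is violated.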

Course Objectives

Understanding how to estimate a model is only a small part of the research process. This course will teach students how to be more efficient and effective in gathering, compiling, analyzing, and interpreting data and results. Stata commands and general techniques applicable across software will be taught to save you time and expand your capabilities. As estimators become ever more complex, interpreting and presenting results becomes more difficult. Students will learn how to create simple and effective tables and graphical interpretations of their results in order to highlight the primary finding and communicate its substantive interpretation to researchers, practitioners, and the general public. These techniques will be useful for researchers who do quantitative analysis and want to develop a deeper understanding of their models and their results.

Course Prerequisites

I will assume that you are familiar with a variety of statistical estimators, including linear regression, standard discrete choice models, duration analysis and event count models. I will also assume that you have a basic familiarity with Stata: that you know how to open a data set, write a batch (.do) file, summarize and tabulate data, generate variables and replace their values, as well as other basic commands. Familiarity with computers in general will also be helpful (e.g., directory structures).

Reading

There are no required texts for this course. Depending on your statistical background and familiarity with Stata, any or all of the recommended texts may be helpful.

Kam, Cindy D. and Robert J. Franzese, Jr. 2007. Modeling and Interpreting Interactive Hypotheses in Regression Analysis. Ann Arbor, MI: University of Michigan Press.

Long, J. Scott and Jeremy Freese. 2005. Regression Models for Categorical Dependent Variables Using Stata, 2nd Edition. College Station, TX: Stata Press.

Gould, William and William Sribney. 2005. Maximum Likelihood Estimation with Stata, 3rd Edition. College Station, TX: Stata Press.

Mitchell, Michael N. 2008. A Visual Guide to Stata Graphics, 2nd Edition. College Station, TX: Stata Press.

Mooney, Christopher. 1997. Monte Carlo Simulation. Thousand Oaks, CA: Sage Publications.
