MA335-7-SU-CO:
Modelling experimental and observational data

The details
Year: 2020/21
Department: Mathematics, Statistics and Actuarial Science (School of)
Campus: Colchester Campus
Term(s): Summer
Level: Postgraduate (Level 7)
Status: Current
Start date: Monday 26 April 2021
End date: Friday 02 July 2021
Credits: 15
Last updated: 21 April 2021

Requisites for this module
(none)

Key module for

MSC G305JS Applied Data Science

Module description

This module will introduce the principles of applying linear modelling methods to the analysis of experimental and observational data. The first strand of the module will study the assumptions of the general linear model, together with collinearity, influential data, assessment of the fitted model and model selection techniques. The second strand will introduce statistical methods for the efficient analysis of experiments when the data are normally distributed, for example one-way ANOVA; this methodology will then be extended to logistic regression and the analysis of contingency tables for cases where the variable of interest is categorical. The third strand will study a range of multivariate methods for the analysis of large and high-dimensional data sets.

Module aims

The module aims to provide:
a fundamental understanding of the underlying statistical methodologies;
the ability to apply these methodologies to real experimental and observational data;
the knowledge needed to conduct a robust statistical analysis of the data;
the ability to interpret the results effectively.

Module learning outcomes

A. Comprehensive knowledge of, and familiarity with, estimating a general linear model using real data;
B. Comprehensive knowledge of, and familiarity with, assessing fitted models and validating the assumptions of linear models;
C. Comprehensive knowledge of, and familiarity with, identifying and conducting simple designed experiments;
D. Comprehensive knowledge of, and familiarity with, constructing factorial experiments in blocks;
E. Comprehensive knowledge of, and familiarity with, applying discriminant analysis, multiple logistic regression, principal component analysis, clustering and multivariate analysis of variance to real observational data, and assessing the results;
F. Comprehensive knowledge of, and familiarity with, using R to apply linear regression and machine learning techniques to the statistical modelling of experimental and observational data.

Module information

A. Observational versus experimental data
i. Using matrices to represent data, matrix notation

B. Multiple regression and the general linear model
i. Assumptions, generalised least squares estimator
ii. Formulating the model, the estimator and other results using matrix formulation
iii. Assessment of the fitted model, coefficient of determination and the sample correlation coefficient
iv. Checking the assumptions of the linear model and other regression diagnostics: residual plots, multicollinearity, homoskedasticity
v. Hypothesis testing for variable selection, model selection criteria
vi. Model selection methods: forward, backward and stepwise.
vii. Factorisation of the regression sum of squares and the ANOVA table.
viii. Missing values and imputation

C. Designed experiments for multiple comparison tests
i. Intuition behind multiple comparisons.
ii. One- and two-way ANOVA, and ANCOVA.
iii. Completely randomised experiments (balanced and unbalanced) and Latin squares
iv. Understanding and interpreting a logistic regression model (categorical response variable)
v. Multivariate linear regression and multivariate ANOVA (MANOVA)
vi. Informal approaches to check for multivariate Normality
vii. Log-linear models and contingency tables for checking association between variables.

D. Multivariate methods
a. Classification
i. Multiple logistic regression: odds ratio, log odds
ii. Linear discriminant analysis (LDA): classification between populations, discriminant function and probability of misclassification, multiclass LDA, quadratic discriminant analysis (QDA)
iii. Test and training samples, leave-one-out and k-fold cross-validation
iv. Precision, recall, accuracy, Youden index, ROC curves, positive predictive value, negative predictive value, confusion matrix
v. Applications from epidemiological prospective and retrospective studies
vi. The relationship between MANOVA and LDA.
b. Cluster analysis: similarity measures, single-link and other hierarchical methods, k-means
c. Dimensionality reduction via principal component analysis (PCA): definition, interpretation of the calculated components, use in regression


Learning and teaching methods

The module has 35 contact hours in total. These consist of 20 lectures, 9 computer labs and 6 workshops (classes) during the summer term.

Bibliography

  • Faraway, Julian J. (2014) Linear Models with R, Philadelphia, PA: Taylor & Francis Inc.
  • James, Gareth et al. (2013) An Introduction to Statistical Learning: with Applications in R, New York, NY: Springer (Springer Texts in Statistics, vol. 103).

The above list is indicative of the essential reading for the course. The library makes provision for all reading list items, with digital provision where possible, and these resources are shared between students. Further reading can be obtained from this module's reading list.

Assessment items, weightings and deadlines

Coursework / exam   Description     Deadline   Coursework weighting
Coursework          Lab Test                   20%
Coursework          Class Test                 30%
Coursework          Final Project              50%

Exam format definitions

  • Remote, open book: Your exam will take place remotely via an online learning platform. You may refer to any physical or electronic materials during the exam.
  • In-person, open book: Your exam will take place on campus under invigilation. You may refer to any physical materials such as paper study notes or a textbook during the exam. Electronic devices may not be used in the exam.
  • In-person, open book (restricted): The exam will take place on campus under invigilation. You may refer only to specific physical materials such as a named textbook during the exam. Permitted materials will be specified by your department. Electronic devices may not be used in the exam.
  • In-person, closed book: The exam will take place on campus under invigilation. You may not refer to any physical materials or electronic devices during the exam. There may be times when a paper dictionary, for example, may be permitted in an otherwise closed book exam. Any exceptions will be specified by your department.

Your department will provide further guidance before your exams.

Overall assessment

Coursework Exam
100% 0%

Reassessment

Coursework Exam
100% 0%

Module supervisor and teaching staff
Module supervisor: Dr Stella Hadjiantoni, email: stella.hadjiantoni@essex.ac.uk
Teaching staff: Dr Stella Hadjiantoni (stella.hadjiantoni@essex.ac.uk), Dr Rafal Kulakowski (rkulaka@essex.ac.uk)

Availability
Yes
Yes
Yes

External examiner

Prof Fionn Murtagh
University of Huddersfield
Professor of Data Science

Resources
Available via Moodle
Of 617 hours, 0 (0%) hours available to students:
617 hours not recorded due to service coverage or fault;
0 hours not recorded due to opt-out by lecturer(s).

Further information

Disclaimer: The University makes every effort to ensure that this information on its Module Directory is accurate and up-to-date. Exceptionally it can be necessary to make changes, for example to programmes, modules, facilities or fees. Examples of such reasons might include a change of law or regulatory requirements, industrial action, lack of demand, departure of key personnel, change in government policy, or withdrawal/reduction of funding. Changes to modules may for example consist of variations to the content and method of delivery or assessment of modules and other services, to discontinue modules and other services and to merge or combine modules. The University will endeavour to keep such changes to a minimum, and will also keep students informed appropriately by updating our programme specifications and module directory.

The full Procedures, Rules and Regulations of the University governing how it operates are set out in the Charter, Statutes and Ordinances and in the University Regulations, Policy and Procedures.