However, when the elements of interest are classes or groups of some kind (for example, the citizens living in given towns, or car models rather than specific vehicles), there is variability inherent to the data.
Symbolic data goes beyond the usual data representation model, considering variables whose observed values for each element are no longer necessarily single real values or categories, but may assume the form of sets, intervals, or, more generally, distributions. In this talk we focus on the analysis of interval data, i.e., when the variables’ values are intervals of ℝ.
Parametric probabilistic models for interval-valued variables have been proposed and studied by Brito & Duarte Silva (2012). These models are based on the representation of each observed interval by its MidPoint and LogRange, and Multivariate Normal and Skew-Normal distributions are assumed for the whole set of 2p MidPoints and LogRanges of the original p interval-valued variables.
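The MidPoint and LogRange representation can be illustrated with a minimal sketch. It assumes that each interval [l, u] with u > l is mapped to its midpoint (l + u)/2 and to the logarithm of its range, log(u - l), and it fits only the unrestricted Multivariate Normal case. The function names and the toy data are hypothetical and are not taken from the MAINT.Data package.

```python
import numpy as np

def midpoint_logrange(lower, upper):
    """Map n x p interval bounds to an n x 2p matrix [MidPoints | LogRanges]."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    if np.any(upper <= lower):
        raise ValueError("each interval needs upper > lower so the log-range is defined")
    mid = (lower + upper) / 2.0
    log_range = np.log(upper - lower)
    return np.hstack([mid, log_range])

def normal_mle(z):
    """Maximum likelihood estimates (mean, covariance) of an unrestricted
    Multivariate Normal; the covariance divides by n, not n - 1."""
    n = z.shape[0]
    mean = z.mean(axis=0)
    centered = z - mean
    cov = centered.T @ centered / n
    return mean, cov

# Toy data (made up for illustration): n = 4 units, p = 2 interval-valued variables.
lower = np.array([[1.0, 10.0], [2.0, 12.0], [0.5, 9.0], [1.5, 11.0]])
upper = np.array([[3.0, 14.0], [5.0, 13.0], [2.5, 15.0], [2.0, 16.0]])

z = midpoint_logrange(lower, upper)   # shape (4, 4): 2p = 4 columns
mean, cov = normal_mle(z)             # 2p-vector and 2p x 2p matrix
```

Working with the logarithm of the range keeps the transformed variable unbounded, which is what makes a Normal (or Skew-Normal) assumption on the 2p-dimensional vector plausible.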
The intrinsic nature of the interval-valued variables leads to different structures of the variance-covariance matrix, represented by different possible configurations. For all cases, maximum likelihood estimators of the corresponding parameters have been derived.
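As an illustration of what a restricted covariance structure can look like, the sketch below takes one possible restriction, assumed here purely for the example: all MidPoints are uncorrelated with all LogRanges, so the 2p x 2p matrix is block diagonal. Whether this coincides with one of the configurations considered by the authors is not stated in this abstract. Under a Normal model with this restriction the likelihood factorizes over the two blocks, so the maximum likelihood estimate keeps the sample covariance within each block and sets the cross block to zero. All names and data are hypothetical.

```python
import numpy as np

def block_restricted_cov(z, p):
    """Return (unrestricted MLE, block-diagonal MLE) of the covariance of z,
    where columns 0..p-1 are MidPoints and columns p..2p-1 are LogRanges."""
    n = z.shape[0]
    centered = z - z.mean(axis=0)
    full = centered.T @ centered / n
    restricted = np.zeros_like(full)
    restricted[:p, :p] = full[:p, :p]   # MidPoint block
    restricted[p:, p:] = full[p:, p:]   # LogRange block
    return full, restricted

def n_cov_params(p, block_diagonal):
    """Number of free covariance parameters for 2p variables."""
    if block_diagonal:
        return 2 * (p * (p + 1) // 2)
    return (2 * p) * (2 * p + 1) // 2

# Made-up 5 x 4 matrix of [MidPoints | LogRanges] for p = 2 variables.
z_toy = np.array([
    [2.0, 12.0, 0.7, 1.4],
    [3.5, 12.5, 1.1, 0.0],
    [1.5, 12.0, 0.7, 1.8],
    [1.8, 13.5, -0.7, 1.6],
    [2.6, 11.0, 0.3, 0.9],
])
full_cov, restricted_cov = block_restricted_cov(z_toy, p=2)
```

For p = 2 the unrestricted matrix has 10 free parameters and the block-diagonal one has 6, which shows how a restricted configuration reduces the number of parameters to be estimated.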
This framework may be applied to different multivariate statistical methodologies, thereby allowing for inference approaches for symbolic data; in particular, (M)ANOVA, discriminant analysis, model-based clustering, robust estimation and outlier detection are addressed. The modelling and methods referred to above are implemented in the R package MAINT.Data, available on CRAN.