Cluster Analysis for Correlated Multivariate Normal and Binary Data

Cluster analysis classifies a sample of multivariate measurements into distinct categories. In this dissertation we study how the correlation structure of the data affects the performance of a clustering method. We begin with two-component normal mixture models and then proceed to cluster analysis of binary mixture models; clustering of binary data is the main focus of the dissertation. The normal mixture model part gives a comparative study of the K-means algorithm and the mixture model (MM) method. Analytic comparisons of the two methods are conducted for the univariate case under both homoscedasticity and heteroscedasticity assumptions, and for the bivariate case under the homoscedasticity assumption. Simulation results compare the two methods in both the univariate and bivariate cases over a range of sample sizes. Latent class analysis (LCA) is a classical approach to clustering binary data and is based on the local independence assumption. We extend the LCA model to allow for correlations between binary variables conditional on the cluster identity. Simulation results show significant gains in correct classification rates when using the correlated Bernoulli model rather than the independent Bernoulli model in the presence of strong correlations between the binary variables conditional on the cluster identity. The method is illustrated by applying it to two real data sets.
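As a rough illustration of the univariate, homoscedastic comparison described in the abstract, the following Python sketch simulates a two-component normal mixture and classifies the sample once with two-means and once with an EM-fitted mixture model. It is not the dissertation's code: the function names, the parameter values (means 0 and 3, common standard deviation 1, mixing proportion 0.3, sample size 2000), and the min/max initialization are all assumptions chosen for illustration.

```python
import math
import random


def simulate(n, mu0, mu1, sigma, p1, rng):
    """Draw n points from a two-component normal mixture with a common sigma."""
    xs, zs = [], []
    for _ in range(n):
        z = 1 if rng.random() < p1 else 0
        xs.append(rng.gauss(mu1 if z == 1 else mu0, sigma))
        zs.append(z)
    return xs, zs


def kmeans_1d(xs, iters=100):
    """Two-means in one dimension; label 1 marks the cluster with the larger center."""
    c0, c1 = min(xs), max(xs)  # assumed initialization: the two extreme points
    labels = []
    for _ in range(iters):
        labels = [1 if abs(x - c1) < abs(x - c0) else 0 for x in xs]
        g0 = [x for x, lab in zip(xs, labels) if lab == 0]
        g1 = [x for x, lab in zip(xs, labels) if lab == 1]
        new0, new1 = sum(g0) / len(g0), sum(g1) / len(g1)
        if (new0, new1) == (c0, c1):
            break
        c0, c1 = new0, new1
    return labels


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def mixture_em_1d(xs, iters=200):
    """EM for a two-component normal mixture with a shared variance."""
    n = len(xs)
    mean = sum(xs) / n
    m0, m1, p1 = min(xs), max(xs), 0.5
    var = sum((x - mean) ** 2 for x in xs) / n
    r = []
    for _ in range(iters):
        # E-step: posterior probability of component 1 via the log posterior odds.
        r = [sigmoid(math.log(p1 / (1 - p1))
                     + ((x - m0) ** 2 - (x - m1) ** 2) / (2 * var)) for x in xs]
        # M-step: update mixing proportion, means, and the common variance.
        n1 = sum(r)
        p1 = n1 / n
        m0 = sum((1 - ri) * x for ri, x in zip(r, xs)) / (n - n1)
        m1 = sum(ri * x for ri, x in zip(r, xs)) / n1
        var = sum(ri * (x - m1) ** 2 + (1 - ri) * (x - m0) ** 2
                  for ri, x in zip(r, xs)) / n
    # Label 1 is the component with the larger mean, matching kmeans_1d.
    return [1 if (ri > 0.5) == (m1 > m0) else 0 for ri in r]


def accuracy(labels, truth):
    """Fraction of points whose assigned label equals the true component."""
    return sum(a == b for a, b in zip(labels, truth)) / len(truth)


rng = random.Random(0)
xs, zs = simulate(2000, 0.0, 3.0, 1.0, 0.3, rng)
acc_km = accuracy(kmeans_1d(xs), zs)
acc_mm = accuracy(mixture_em_1d(xs), zs)
print("K-means correct classification rate:", acc_km)
print("Mixture model correct classification rate:", acc_mm)
```

Varying the mixing proportion, the separation between the means, or the sample size in `simulate` gives a simple way to explore when the two methods diverge; a single run like this one says nothing general about which method performs better.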

Last modified: 08/13/2018