64th ISI World Statistics Congress - Ottawa, Canada

Multivariate Analysis following Multiple Imputation of HIV Risk Behaviours among Youth in the Kingdom of eSwatini (formerly Swaziland)


Format: CPS Abstract

Keywords: classification, multivariate

Session: CPS 20 - Multivariate analysis

Monday 17 July 4 p.m. - 5:25 p.m. (Canada/Eastern)


Data sets with missing values are common in practice, and data imputation is a way of preparing such data for analysis. Multiple imputation, in which each missing value is replaced with several plausible values, is the preferred approach for working with missing values in survey data. The advantage of multiple imputation over other imputation methods is that it accounts for the uncertainty due to missing values.
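As a rough illustration of replacing missing values with several plausible values, the sketch below builds m completed copies of a single variable by drawing each missing entry at random from the observed entries (a random hot-deck draw). This is a deliberately simplified stand-in, not the imputation model used in the study: the function name `impute_m_times`, the use of `None` to mark missing values, and the defaults `m=5` and `seed=0` are all assumptions made for the example.

```python
import random


def impute_m_times(values, m=5, seed=0):
    """Return m completed copies of `values`, filling each None with a
    random draw from the observed (non-missing) entries.

    Hypothetical helper for illustration only: a proper multiple
    imputation model would also reflect uncertainty in the model
    parameters, which this simple hot-deck draw does not.
    """
    rng = random.Random(seed)
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to draw from")

    completed = []
    for _ in range(m):
        # Observed entries are kept; only missing entries are replaced.
        completed.append(
            [v if v is not None else rng.choice(observed) for v in values]
        )
    return completed


# Example: one variable with two missing responses, imputed three times.
data = [1, None, 3, None]
for dataset in impute_m_times(data, m=3):
    print(dataset)
```

Each completed copy can then be analysed separately, and the variation between copies reflects the uncertainty introduced by the missing values.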
Cluster analysis is an approach for discovering groupings and patterns in a data set. Standard cluster analysis approaches require complete data; hence, data imputation before cluster analysis is important. Similarly, other exploratory techniques also require complete data sets.
Risk behaviours are those behaviours that are said to elevate the risk of HIV infection. Early sexual debut, multiple sexual partners, transactional sex, low condom use, and low male circumcision are identified risk behaviours. In eSwatini, 23.7% of those who reported early sexual debut (before the age of 15) were HIV positive. Of those with more than one sexual partner, 28.7% were HIV positive, while 20.9% of adults who did not use a condom at last sexual intercourse in the prior 12 months were HIV positive. Identifying the pattern of these risk behaviours will assist eSwatini in the fight against HIV, which will translate into achieving the 2030 United Nations Agenda on zero infections.
The analysis of multiply imputed data proceeds in three stages: imputation, analysis, and combining results. For the combining stage, Rubin proposed a methodology for calculating means and variances, but cluster analysis is concerned with hidden patterns in the data, and no set of parameters defines a cluster analysis. This study proposed a technique that enables the researcher to decide on the number of clusters, interpret the clusters, and make overall observations about patterns in the data set following multiple imputation. The study also discussed how specific clusters could be identified as outliers and optionally excluded in the subsequent analysis stage.
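For context on the combining stage, the sketch below shows the standard form of Rubin's rules for a scalar estimate such as a mean: the pooled estimate is the average of the per-imputation estimates, and the total variance is the average within-imputation variance plus (1 + 1/m) times the between-imputation variance. It illustrates why the rules suit parameters but do not carry over directly to cluster solutions. It is not the technique proposed in the study, and the function name `pool_rubin` and the example numbers are assumptions made for illustration.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m per-imputation estimates and their variances with Rubin's rules.

    Hypothetical helper for illustration. Returns (pooled_estimate,
    total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching lists from at least two imputations")

    q_bar = mean(estimates)    # pooled point estimate
    u_bar = mean(variances)    # average within-imputation variance
    b = variance(estimates)    # between-imputation variance (m - 1 denominator)
    total = u_bar + (1 + 1 / m) * b
    return q_bar, total


# Example with made-up numbers: three imputed data sets, one estimate each.
pooled, total_var = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(pooled, total_var)
```

A cluster solution has no single scalar of this kind to average across imputations, which is the gap the abstract describes.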