64th ISI World Statistics Congress - Ottawa, Canada

Abstracts


"The double burden of Moroccan women in rural area: domestic work and precarious family agricultural employment "

Format: CPS Abstract

Author: Mrs YATTOU AIT KHELLOU

Co-Authors:

  • BAHIJA NALI

The domestic and professional spheres of rural women are so closely intertwined that the distinction between professional and domestic work is not relevant, especially since these two forms of work have similar characteristics: they are unpaid, unrewarding and vectors of inequality. Indeed, rural women are stuck between domestic work and unpaid agricultural work, often in livestock activities, which represent a continuation of the domestic work of women in rural areas. This paper has three main objectives: (i) to answer the question of why there is a gap between women's contribution to unpaid domestic work in rural and urban areas; (ii) to propose a valuation of the unpaid work carried out by women on family farms, based on the working hours of self-employed rural women from the national employment survey and using an average of the relevant wages; and (iii) to produce an exhaustive estimate of the unpaid work carried out by women in the Moroccan countryside.

A Coevolution Model of Network Formation and Content Generation on Social Reading Platform

Format: CPS Abstract

Author: Dr Mirai Igarashi

Co-Authors:

  • Nobuhiko Terui

In coevolution modeling, the formation of social networks and the behaviors of individuals are modeled jointly because they influence each other. However, too little attention has been paid to qualitative aspects of behavior, such as the sentiment and topic of content. This study proposes a Bayesian coevolution model that incorporates a dynamic network model and a topic model to describe the content generation process. The proposed model is applied empirically to data from a Japanese storytelling platform.

A Comparison of Machine Learning methods for survival prediction

Format: CPS Abstract

Author: Durjoy Dey

Co-Authors:

  • Dr. Tamanna Howlader
  • Srizan Chowdhury

Survival analysis is a branch of statistics. The most popular statistical model for analyzing survival data is the Cox proportional hazards (PH) model. In addition, several machine learning (ML) methods have been developed for modelling such data. As the volume, variety and velocity of data increase, it is important to understand which model should be adopted in which situation. The study aims to compare the Cox PH model with ML methods under various scenarios to determine which methods outperform the others in each scenario.

A Computational Analysis of Snowball Sampling for the Estimation of Means

Format: CPS Abstract

Author: Mr João Gabriel Malaguti

Co-Authors:

  • Alinne de Carvalho Veiga
  • Letícia de Carvalho Giannella

Snowball sampling is a non-probabilistic sampling method, widely used in the social sciences for its ability to reach hard-to-reach populations, such as drug users, victims of domestic violence and queer people. However, because it is a non-probabilistic method, formal equations for the standard error of the mean do not exist, making analyses more complex. This study bypasses this issue by making use of computational statistics, in particular Monte Carlo simulations, which allow for approximate estimations based on different controlled scenarios in order to improve the understanding of the sampling method. The effects of connection density, i.e. the number of connections one person has inside a population, and of different indication probabilities, i.e. the probability that one person indicates another to join the sample, were investigated on the estimation of the mean and its standard error for snowball samples.
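
The kind of Monte Carlo experiment described above can be sketched as follows. This is a minimal illustration, not the authors' simulation design: the random-graph population, the normally distributed trait, and every function name and default value (`make_population`, `snowball_sample`, `monte_carlo`, 300 people, 5 seeds, 2 waves) are assumptions made for the example. Connection density is controlled through `mean_degree` and the indication probability through `p_indicate`.

```python
import random
import statistics


def make_population(n, mean_degree, rng):
    """Build a random undirected contact graph and one numeric trait per person.

    Each pair of people is connected independently, so the expected number of
    connections per person is roughly `mean_degree` (an assumed graph model).
    """
    p_edge = mean_degree / (n - 1)
    neighbours = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p_edge:
                neighbours[i].add(j)
                neighbours[j].add(i)
    values = [rng.gauss(50.0, 10.0) for _ in range(n)]  # hypothetical trait
    return neighbours, values


def snowball_sample(neighbours, n_seeds, n_waves, p_indicate, rng):
    """Draw one snowball sample: random seeds, then waves of indicated contacts."""
    sampled = set(rng.sample(sorted(neighbours), n_seeds))
    frontier = set(sampled)
    for _ in range(n_waves):
        recruited = set()
        for person in frontier:
            for contact in neighbours[person]:
                # Each sampled person indicates each contact with probability p_indicate.
                if contact not in sampled and rng.random() < p_indicate:
                    recruited.add(contact)
        sampled |= recruited
        frontier = recruited
    return sampled


def monte_carlo(n_reps, n=300, mean_degree=4, n_seeds=5, n_waves=2,
                p_indicate=0.5, seed=1):
    """Repeat the sampling on one fixed population.

    Returns the true population mean, the average of the sample means, and the
    empirical standard error (standard deviation of the sample means).
    """
    rng = random.Random(seed)
    neighbours, values = make_population(n, mean_degree, rng)
    true_mean = statistics.fmean(values)
    estimates = []
    for _ in range(n_reps):
        sample = snowball_sample(neighbours, n_seeds, n_waves, p_indicate, rng)
        estimates.append(statistics.fmean([values[i] for i in sample]))
    return true_mean, statistics.fmean(estimates), statistics.stdev(estimates)


if __name__ == "__main__":
    for density in (2, 4, 8):
        print(density, monte_carlo(n_reps=200, mean_degree=density))
```

In this sketch the trait is independent of the network, so the sample mean stays close to the population mean; making the trait depend on a person's number of connections would be one way to study bias.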

A Cross-city Analysis of Pro-poor Growth in Sumatra Island

Format: CPS Abstract

Author: Jesica Ringo

Co-Authors:

  • Najia Helmiah

Owing to the outbreak of the COVID-19 pandemic, economic growth in Sumatra Island plunged at an annual rate of 2.21 percent in 2020, but recovered sharply at a rate of 4.56 percent in 2021. While economic growth is now comfortably on a par with pre-pandemic levels, the pandemic caused a rapid increase in poverty, shown by a rise of 4 percent in the head count index in 2020. Along with the recovery of economic growth in 2021, the proportion of poor people in Sumatra Island also fell in general, but not in all cities. This means that growth was more pro-poor in some cities than in others. Using cross-city analysis, this study aims to determine under which conditions growth can be considered pro-poor in Sumatra Island. The degree of pro-poor growth is measured by the poverty equivalent growth rate (PEGR), calculated from raw Total Household Consumption data from the National Socio-Economic Survey. We use panel data on 154 cities in Sumatra Island over the period 2019-2021. This study can contribute to evaluating the pro-poorness of government policies with respect to several factors. To identify the effects of the labor market, the local government budget, expenditure inequality, and the agricultural sector on pro-poor growth, we apply panel logistic regression. The cross-city evidence suggests that poverty reduction varies for the same growth rate in Sumatra Island. The agricultural sector is the most significant factor affecting pro-poor growth, because this is the sector in which Sumatra Island contributes most to Indonesia. Surprisingly, the local government budget does not affect pro-poor growth. The use of the government budget therefore needs deeper evaluation, so that the poor can feel its benefits.

A FMKL-GLD Quantile Method for Estimating Economic Growth in Nigeria in the Presence of Multicollinearity and Outlier

Format: CPS Abstract

Author: Dr Retius Chifurira

Co-Authors:

  • Ebiwonjumi, Ayooluwade
  • Knowledge Chinhamu

Nigeria’s economic growth has been a serious concern to policymakers, economists and scholars due to the challenges of economic characteristics, consequences and contradictions. This is despite the efforts made by the Central Bank of Nigeria in the past few years to stimulate economic growth by implementing several policies, such as tightening the monetary policy rate and heavy borrowing for infrastructural development. Economic growth in Nigeria was 6.95 percent between 1999 and 2007, 7.98 percent between 2008 and 2010, 4.80 percent between 2011 and 2015, and 0.81 percent from 2016 to date. In this study, we estimate parameters to determine Nigerian economic growth in the presence of multicollinearity and outliers. We applied the FMKL-GLD quantile model to quarterly data from 1986 to 2021 from the Central Bank of Nigeria.

A Fuzzy Confidence Interval to test equality of means. Application based on the survey of Health, Ageing and Retirement in Europe (SHARE)

Format: CPS Abstract

Author: PROF. DR. Laurent Donzé

Co-Authors:

  • Dr. Rédina Berkachy

In the context of fuzzy data analysis, we recently developed the methodology to construct fuzzy confidence intervals by the so-called technique of the likelihood ratio. In particular, the distribution of the likelihood ratio is estimated by a proper bootstrap algorithm. Such intervals are suitable tools to test parameters, for example, the equality of means. We briefly describe the method and recall the relative weight of the randomness vs fuzziness appearing in the process. We intend to show the practicability of our approach with an empirical application with SHARE Data.

A Hybrid Forecasting Model for Spanish Unemployment: How COVID-19 Destroys Statistics

Format: CPS Abstract

Author: Dr Margarita Rohr

Co-Authors:

  • Vera Egorova

There are several approaches to time series forecasting. The most popular are linear models such as ARIMA and exponential smoothing. These models are simple and easy to implement; however, their main drawback is the linearity of the prediction, which does not agree with reality. This restriction is overcome by nonlinear models such as artificial neural networks (ANN). Nonlinear models perform better for long-term forecasting, while linear models are recommended for short-term predictions.

A Life Table for Uganda using Mortality Survey Data

Format: CPS Abstract

Author: Dr Leonard K. Atuhaire

Co-Authors:

  • Elizabeth Nansubuga

We shall be sharing our experience of generating a pilot life table for Uganda using age-specific death rates from a mortality survey.
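
As a rough illustration of how age-specific death rates turn into a life table, the sketch below applies the standard abridged life-table conversion. It is not the authors' procedure: the function name `life_table`, the age groups, and the death rates are hypothetical, and deaths are assumed to occur on average at the midpoint of each age interval, which is a crude assumption for the youngest ages.

```python
def life_table(ages, widths, rates, radix=100_000):
    """Abridged life table from age-specific death rates.

    ages   : starting age of each interval
    widths : interval length in years (ignored for the last, open-ended interval)
    rates  : age-specific death rate m_x for each interval
    Assumes deaths fall, on average, at the midpoint of each closed interval.
    """
    rows = []
    alive = float(radix)
    for i, (age, n, m) in enumerate(zip(ages, widths, rates)):
        last = i == len(ages) - 1
        # Probability of dying in the interval; everyone dies in the open interval.
        q = 1.0 if last else n * m / (1.0 + 0.5 * n * m)
        deaths = alive * q
        # Person-years lived in the interval.
        person_years = alive / m if last else n * (alive - 0.5 * deaths)
        rows.append({"age": age, "q": q, "l": alive, "d": deaths, "L": person_years})
        alive -= deaths
    # Accumulate person-years from the oldest age down to get life expectancy.
    total = 0.0
    for row in reversed(rows):
        total += row["L"]
        row["T"] = total
        row["e"] = total / row["l"]
    return rows


if __name__ == "__main__":
    # Illustrative rates only; these are not Ugandan survey estimates.
    table = life_table(
        ages=[0, 5, 15, 45, 65],
        widths=[5, 10, 30, 20, None],
        rates=[0.015, 0.002, 0.004, 0.015, 0.08],
    )
    for row in table:
        print(row["age"], round(row["q"], 4), round(row["l"]), round(row["e"], 1))
```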

A Little Bird Told Me: Official Statistics Jointly Using Social Networks and Surveys

Format: CPS Abstract

Author: Dr Victor Alfredo Bustos y de la Tijera

Co-Authors:

  • Silvia Fraustro
  • Noemí López
  • Ricardo Olvera

Data science in production of official statistics.

A Meta-Model for Predicting the Quality of Knowledge Elicitation Sessions

Format: CPS Abstract

Author: Mr Hussein Jouni

Co-Authors:

  • Lionel Jouffe
  • Philippe Bastien
  • Alban Ott
  • Léopold Carron
  • Mathieu Le Tertre
  • Telma Da Silva

Capitalizing on expert knowledge can be useful for a company. It can serve to transmit the know-how in a given field, to incorporate technical aspects into decision making, or to build causal models for making predictions. This knowledge can be represented through a Bayesian Network [1] to introduce uncertainty about the phenomenon, and its performance can be improved by combining it with data. Elicitation takes place in sessions where experts work together with a facilitator and a modeler to build models. The experts are asked to be available for a given amount of time, which can be large (several days), with the risk that they will not have a satisfactory tool at the end of the sessions. In the context of multi-project management, we propose a tool to assess the probability of success of Elicitation sessions on a given problem. This tool is obtained through the Elicitation of a Bayesian Network [1] (meta-model), quantified with prior distributions.

[1] D. Koller and N. Friedman. Probabilistic Graphical Models: Principles and Techniques. Adaptive Computation and Machine Learning. MIT Press (2009).

A New Family of Parametric Measures of Inequality for Income Distributions

Format: CPS Abstract

Author: PROF. DR. Victor M. Guerrero Guzman

Co-Authors:

  • Pablo Martinez

Comparisons of theoretical inequality measures of income distributions, as well as empirical applications of those measures to actual data.

A Novel Dispersion Control Chart for Monitoring the Variable Dimension Processes

Format: CPS Abstract

Author: PROF. DR. Su-Fen Yang

Co-Authors:

  • Yen-Lin Liu

Statistical process control: We first propose a novel dispersion control chart to monitor the process with variable quality variables following a multivariate normal distribution.

A Quality Assessment framework for Statistics based on data-science: Institutional Management Plan for Experimental Statistics of ‘Statistics Korea’

Format: CPS Abstract

Author: Ms BITNA KANG

Due to rapid changes in the data environment, such as big data and artificial intelligence, it is increasingly important to produce statistics in ways that differ from survey-oriented statistics. In order to secure the reliability and accuracy of statistics produced in a new way, it is necessary to manage them as official statistics at the national level. In this paper, a quality management framework was designed based on the quality evaluation dimensions of Statistics Korea.

A Review of Domestic Violence Data in Free, Harmonized International Survey Data from IPUMS

Format: CPS Abstract

Author: Ms Devon Kristiansen

Co-Authors:

  • Miriam L. King
  • Anna Bolgrien
  • Maya Luetke
  • Matthew Gunther
  • Mehr Munir

IPUMS Global Health is the world’s largest repository of free, harmonized global health survey data. IPUMS Global Health includes harmonized versions of the Demographic and Health Surveys (DHS), the Multiple Indicator Cluster Surveys (MICS), and Performance Monitoring for Action (PMA) surveys, each of which measures domestic violence (DV) for women of childbearing age. This paper documents the available information about domestic violence, demonstrates the promise and pitfalls of combining data across surveys, and maps the incidence of recent intimate partner violence (IPV) across countries. The most widely studied form of domestic violence experienced by adult women is from a current or recent spouse or sexual partner. Attitudes toward domestic violence are gauged by DHS and MICS. Information about ever and/or recently experiencing abuse from an intimate partner is collected by all three surveys. Types of intimate partner violence covered include emotional abuse, physical abuse, and sexual abuse. Survivors of IPV were also asked about whether and where they sought help. While the above broad description fits all three global health surveys, analysts need to be aware of subtle differences in question wording. PMA includes a broad question about physical abuse (“slapped, hit, or hurt in the past 12 months”) and a narrower question about threats with a weapon and attempted strangulation; DHS includes multiple questions about specific types of physical harm and about IPV both “ever” and “in the past 12 months.” The authors will provide a visual summary of the IPV-related variables in the three global health datasets and highlight the closest equivalent variables across data collections. This summary material can guide analysts in finding similar variables and avoiding inadvertent errors. Violence is also perpetrated by other family members and social connections.
IPUMS DHS, PMA, and MICS include questions about non-IPV violence toward women of childbearing age, but these variables differ in their details. PMA asks about violence from another household member (not a spouse); MICS asks about being beaten as a daughter-in-law; and DHS asks open-ended questions to identify perpetrators ranging from specific relatives and in-laws to teachers and police. Domestic violence experienced during pregnancy is probed in both PMA and DHS, but the former asks women who recently gave birth and the latter asks about violence during pregnancy any time in the past. The authors will provide a visual summary of non-IPV domestic violence across the three data collections and highlight the closest equivalent variables across sources. A striking finding is how drastically attitudes toward domestic violence and the reported incidence of IPV vary across countries and over time. In general, the acceptability of “wife-beating” is declining. Thematic maps showing the prevalence of reported recent IPV by country and decade will highlight an extraordinary range of attitudes and behavior. By cautiously combining data on recent IPV from IPUMS DHS, MICS, and PMA, we will demonstrate the variation in this threat to women’s health across multiple countries, regions, and decades.

A Shrinkage Likelihood Ratio Test for High-dimensional Subgroup Analysis with a Logistic-Normal Mixture Model

Format: CPS Abstract

Author: Mr Shota Takeishi

In clinical trials, it is not always the case that the treatment effect is homogeneous for the overall population. In particular, there might be a subgroup with certain personal attributes who benefits from the treatment more than the rest of the population. Furthermore, such attributes can be of high dimension if, for example, biomarkers or genome data are collected for each patient. With this practical application in mind, this study concerns testing the existence of a subgroup with an enhanced treatment effect under the setting where the subgroup membership is potentially characterized by high-dimensional covariates. The existing literature on testing the existence of the subgroup has the following two drawbacks. First, because the parameter characterizing subgroup membership is unidentified under the null hypothesis of no subgroup, the asymptotic null distributions of the test statistics proposed in the literature often have intractable forms. Notably, they are not easy to simulate, and hence the data analyst has to resort to computationally demanding methods, such as the bootstrap, to calculate the critical value. Second, most of the methods in the literature assume that the personal attributes characterizing subgroup membership are of low dimension. Because the complicated asymptotic null distributions of the test statistics usually depend on the dimension of the personal attributes, extending these methods to the high-dimensional case is nontrivial. To fix these problems, this research proposes a novel likelihood ratio-based test with a logistic-normal mixture model for testing the existence of the subgroup. The proposed test is built on a modified likelihood function that shrinks the possibly high-dimensional unidentified parameter towards zero when there exists no subgroup. This shrinkage simplifies the asymptotic null distribution.
Namely, we show that, under the null hypothesis, the test statistic converges weakly to a half chi-square distribution, which is easy to simulate. Furthermore, this convergence result holds even under a high-dimensional regime where the dimension of the personal attributes characterizing the classification of the subgroup exceeds the sample size.

A Study on the Effect of Interviewer's Survey Career Length on the Non-Response Rate

Format: CPS Abstract

Author: sung-ha kim

As an interviewer, I have also conducted surveys in the field to collect national statistics data. Having done so, I learned firsthand that there are many people who simply do not respond to surveys. Based on my survey experience, I became interested in the factors affecting the non-response rate. In this study, I analyzed the interviewer's career length as one of the factors affecting the non-response rate and considered how to use it in actual survey work.

A TERRITORIAL APPROACH TO THE MULTIDIMENSIONAL MEASURE OF FOOD SECURITY

Format: CPS Abstract

Author: Dr Zouhair Lahrizi

Long-term food security improves health and alters people's lives. However, it remains the result of the interaction between agriculture, food policies and socio-economic factors. Often analysed from the standpoint of food product availability, food security has taken on additional dimensions over time, taking into consideration new elements of poverty by combining other indicators linked to the ideas of access, quality, and stability. This multidimensional nature of food security is now universally recognized. As a result, there has been a surge of interest in methods used to examine food security across its many aspects. The availability of food security indicators has been one of the prime concerns of the Moroccan national statistics system, especially with the implementation of the SDGs. In fact, the evaluation of food security depends mainly on a number of indicators that are connected to the SDG indicators and encompass the four pillars of food security, namely availability, access, nutrition, and stability. The territorialisation of SDG reporting in the Casablanca-Settat region provides an opportunity to collect and measure the multifaceted aspects of regional food security. This is especially true since, according to the UNCDF, the territorial strategy might be the key to guaranteeing food security and boosting community and household resilience to food risks. In this perspective, our study presents a multidimensional approach to measuring food security at the territorial level, concentrating on the Casablanca-Settat region, based on composite index techniques. To achieve this objective, the study adopts a factorial approach based on Principal Component Analysis (PCA). It involves (i) identifying indicators of each component of food insecurity, (ii) normalizing each component’s indicators, (iii) aggregating the data per component, and (iv) computing the composite index for all components.
Key words: Food security, Territorial approach, Factor analysis, Principal Component Analysis (PCA), Casablanca-Settat region.
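
Steps (ii) to (iv) above can be sketched in a few lines. This is an illustration under stated assumptions rather than the study's actual procedure: the indicator values are invented, all indicators are assumed to be oriented so that higher means more food secure, min-max scaling stands in for the unspecified normalization, and the component weights are taken from the absolute loadings of the first principal component (computed here by power iteration), which is only one of several PCA-based weighting schemes. The names `min_max`, `first_pc_weights` and `composite_index` are hypothetical.

```python
import statistics


def min_max(column):
    """Rescale one indicator to the [0, 1] range."""
    lo, hi = min(column), max(column)
    return [(x - lo) / (hi - lo) for x in column]


def first_pc_weights(columns, n_iter=200):
    """Weights from the first principal component of the component scores.

    The leading eigenvector of the covariance matrix is found by power
    iteration; absolute loadings are rescaled to sum to one.
    """
    k, n = len(columns), len(columns[0])
    means = [statistics.fmean(c) for c in columns]
    cov = [[sum((columns[a][i] - means[a]) * (columns[b][i] - means[b])
                for i in range(n)) / (n - 1)
            for b in range(k)] for a in range(k)]
    v = [1.0] * k
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(k)) for a in range(k)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    total = sum(abs(x) for x in v)
    return [abs(x) / total for x in v]


# Hypothetical indicators for five territorial units, grouped by pillar.
indicators = {
    "availability": [[2.1, 3.4, 2.8, 4.0, 3.1], [55, 70, 62, 81, 66]],
    "access": [[0.30, 0.52, 0.41, 0.66, 0.45]],
    "nutrition": [[61, 72, 70, 85, 64], [0.80, 0.90, 0.70, 0.95, 0.85]],
    "stability": [[0.4, 0.5, 0.6, 0.9, 0.3]],
}


def composite_index(data):
    """Normalize indicators, average them per pillar, then weight the pillars."""
    scores = {}
    for pillar, columns in data.items():
        normed = [min_max(col) for col in columns]                # step (ii)
        scores[pillar] = [statistics.fmean(vals) for vals in zip(*normed)]  # step (iii)
    pillar_columns = list(scores.values())
    weights = first_pc_weights(pillar_columns)
    index = [sum(w * x for w, x in zip(weights, row))              # step (iv)
             for row in zip(*pillar_columns)]
    return weights, index


if __name__ == "__main__":
    weights, index = composite_index(indicators)
    print([round(w, 3) for w in weights])
    print([round(v, 3) for v in index])
```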

A TIME SERIES MODEL FOR THE EFFECT OF FUEL PRICES ON FOOD AND NON-ALCOHOLIC BEVERAGE PRICES CPI IN NAMIBIA 2017-2021

Format: CPS Abstract

Author: Prof. Lillian Pazvakawambwa

Co-Authors:

  • Elzahn Isaak

Fuel is an energy source that plays an important role in the economy of most countries, and its fluctuating prices affect most sectors of the economy. This study investigated the effect of diesel prices on the food and non-alcoholic beverages CPI in Namibia, using monthly time series data for the period 2017 to 2021. Descriptive statistical summaries were computed; a time series regression model was used to investigate the relationship between the food and non-alcoholic beverages CPI and average diesel prices; time series plots and decompositions were used to establish the components; an Autoregressive Distributed Lag (ARDL) model was used to test for cointegration; and an Autoregressive Integrated Moving Average (ARIMA) model was used to forecast future values. The study found that an increase in diesel prices is associated with an increase in the food and non-alcoholic beverages CPI. The results also suggested that there is a long-run relationship between diesel prices and the food and non-alcoholic beverages CPI. Furthermore, there is an upward trend in both diesel prices and the food and non-alcoholic beverages CPI, with repeating patterns in the seasonal component. Additionally, the forecasts showed an increase in both diesel prices and the food and non-alcoholic beverages CPI. There is a need to introduce new policies that protect individual consumers and provide relief when fuel prices increase exponentially. Keywords: diesel prices, CPI, Namibia, time series forecasting, ARIMA

A Total Error Framework for Digital Data

Format: CPS Abstract

Author: Lilli Japec

Co-Authors:

  • Ingegerd Jansson

A changing survey landscape (Lyberg and Heeringa 2021) with increasing nonresponse rates and survey costs has caused organizations to explore new data sources for statistics production (Japec and Lyberg 2021). There is great potential to use new types of data, hereafter called digital data, for statistics production, especially when blending them with existing survey or administrative data (National Academies of Sciences, Engineering, and Medicine 2022, Japec et al 2015). Our quality framework builds on existing frameworks for surveys (Groves and Lyberg 2010), administrative (Zhang 2012, Reid et al 2017), found (Biemer and Amaya 2021) and digital trace data (Sen et al 2021).

A Bayesian spatio-temporal approach for the improved use of malaria routine data to assess malaria interventions in Cameroon

Format: CPS Abstract

Author: Mrs Arnold FOTTSOH FOKAM

Co-Authors:

  • Arnold FOTTSOH FOKAM

Malaria is still ravaging millions of Cameroonians. Although significant efforts have been made to end it, challenges remain in assessing malaria policy and interventions, specifically due to quality issues in routine data. Using routinely collected data to assess malaria interventions remains a challenge in Cameroon. We propose suitable statistical methods that handle spatial structure and uncertainty in the relative risk and that are relevant to the National Malaria Control Program. This study offers a quick and efficient way to predict clinical malaria incidence and to monitor the malaria burden within the country, and it reinforces the importance of strengthening the surveillance, monitoring and evaluation system.

A closer look at extreme poverty in Indonesia from Education and Social Assistance

Format: CPS Abstract

Author: Mr Ariful Romadhon

Co-Authors:

  • Suryo Adi Rakhmawan

Extreme poverty is one of the worldwide challenges addressed in the Sustainable Development Goals, which are targeted for completion by 2030. In Indonesia, extreme poverty has risen over the last three years, from 3.7 per cent in 2019 to 4.0 per cent in 2021. The government has also begun to focus on reducing extreme poverty by 2024 through various programs such as social assistance and human resource capacity building through education and training. This study aims to examine the impact of education and social assistance, as well as other socioeconomic and demographic controls. This study reveals that education characteristics have a substantial impact on alleviating extreme poverty in Indonesia. In contrast, social assistance has a favourable impact on extreme poverty. This study suspects a selectivity bias, because social assistance programs are targeted at poor people. Finally, this study concludes that the variables affecting extreme poverty are: being a female living in a rural area, being in the elderly age group, having low education, having a severe disability, living in a household with five or more members, living in a household headed by a woman, having non-electric main lighting in the house, living in a household with inadequate sanitation, living in a house that is unfit for habitation, and not having social security or health insurance.

A comparative study of simulating stock prices using geometric Brownian motion model under normal and convoluted distributional assumptions

Format: CPS Abstract

Author: Dr Daniel Maposa

Co-Authors:

  • Alexander Boateng
  • Eric Teye Mensah
  • Nana Kena Frempong

This study proposes a modified Geometric Brownian motion (GBM) to simulate stock price paths under normal and convoluted distributional assumptions. The study used four selected continuous probability distributions for the convolution because of their shared properties, including normality, and because each has a standard form with location and scale parameters of zero and one, respectively. The findings revealed that the simulated price paths look identical under the normal distribution and under the normal convolved with the normal, Laplace, and Rice distributions for different sample sizes and parameter settings, but differ for the Cauchy distribution. Furthermore, the study found that all the MAPE values for the normal and convoluted distributions underlying the GBM were less than approximately 10%, indicating high forecast accuracy. However, the average simulated price paths for the GBM under the normal distribution were found to be significantly different from those under the convoluted distributions when a t-test was employed for different sample sizes and different settings of the drift and volatility values.
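
For orientation, a GBM price path and its MAPE can be simulated as in the sketch below. It is a generic illustration and not the modified model proposed in the study: the convolved shock is assumed here to be the sum of an independent standard normal draw and a standard Laplace draw, with no rescaling, and the names `gbm_path`, `normal_shock`, `normal_laplace_shock` and `mape`, as well as the drift and volatility values, are hypothetical.

```python
import math
import random


def gbm_path(s0, mu, sigma, n_steps, dt, shock):
    """Simulate one GBM path; `shock` is a function returning one random draw."""
    path = [s0]
    for _ in range(n_steps):
        z = shock()
        growth = (mu - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * z
        path.append(path[-1] * math.exp(growth))
    return path


def normal_shock(rng):
    """Standard normal shocks (the usual GBM assumption)."""
    return lambda: rng.gauss(0.0, 1.0)


def normal_laplace_shock(rng):
    """Sum of independent standard normal and standard Laplace draws.

    The distribution of a sum of independent variables is the convolution of
    their distributions. A Laplace(0, 1) draw is generated as the difference
    of two independent Exp(1) draws.
    """
    def draw():
        laplace = rng.expovariate(1.0) - rng.expovariate(1.0)
        return rng.gauss(0.0, 1.0) + laplace
    return draw


def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


if __name__ == "__main__":
    rng = random.Random(42)
    # One trading year of daily steps with illustrative drift and volatility.
    base = gbm_path(100.0, 0.05, 0.2, 252, 1 / 252, normal_shock(rng))
    conv = gbm_path(100.0, 0.05, 0.2, 252, 1 / 252, normal_laplace_shock(rng))
    print(round(base[-1], 2), round(conv[-1], 2), round(mape(base, conv), 2))
```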

A fuzzy geospatial framework for price shocks propagation: the case of gasoline in Italy

Format: CPS Abstract

Author: Mr Luigi Palumbo

The ideal session for this paper deals with one or more of the following topics: geospatial statistical analysis, spatio-temporal statistical analysis, fuzzy set theory, consumer prices statistics, price shock transmission over time and/or space, distribution network analysis, spatial clustering.

A general framework for reporting methods in regression analysis

Format: CPS Abstract

Author: Ms Hannah Kümpel

Co-Authors:

  • Hannah Kümpel
  • Sabine Hoffmann

Recently, visualization techniques and effect size measures aimed at maximizing the interpretability and comparability of statistical results are becoming increasingly important, not only in the context of interpreting results from machine learning algorithms but also in empirical research in general as the scientific community shifts its focus in reporting away from p-values. Many such methods have been proposed across disciplines, from individual conditional expectation plots for improved interpretation of black box model results to predicted changes in probability as a more instructive basis for medical decision-making compared to odds ratios. However, most of these quantities are practically motivated and narrowly applicable to a specific setting or field, leading to some inconsistencies and unclarities in the overall methodology. We have developed a formal framework for the consistent derivation of effect size measure definitions and visualization techniques aimed at maximizing the interpretability and comparability of regression results. Specifically, our framework gives a common mathematical setting for such methods and definitions of generalized quantities which may be specified to correspond to, amongst other things, partial dependence plots, marginal effects, adjusted predictions, and predictive comparisons. We achieve this generalization by utilizing probability measures to derive weighted averages over areas of interest for the regressor values. This approach also allows for various different assumptions regarding the dependence structure of the regressors, and, most importantly, for the specification of the generalized quantities to be tailored to each specific research question or meta-analysis. The framework provides a consistent method for deriving point estimates and uncertainty regions for every quantity and may be applied to the results of both frequentist and Bayesian inference. 
Notably, our generalized version of marginal effects may be specified to separately quantify main and interaction effects. Furthermore, the framework includes a method for comparing the expected distributions of the target variable before and after a change of regressor values, combining estimation and sampling uncertainty in the interest of communicability. In this paper, we first give a brief summary of this theoretical framework and subsequently provide various examples of its practical relevance using real research questions from the fields of interpretable machine learning, multi-analyst studies, and medical decision-making.

A generalized functional linear model with spatial dependence

Format: CPS Abstract

Author: Sooran Kim

We develop a regression model for spatially dependent binary response variables when the covariates take the form of functional processes over time at each location for which the response is observed. We model the functional covariates in terms of a Fourier basis truncated to a finite number of terms. Responses are taken to be a Markov random field with conditional binary distributions and isotropic spatial dependence. Estimation is approached through the use of a composite likelihood constructed from full conditional response distributions, sometimes also called Besag’s original pseudolikelihood in the spatial literature. Asymptotic properties are given for maximum composite likelihood estimators using a repeating lattice context, and use of the model is illustrated with data relating new COVID vaccination rates in June for counties to the number of weekly infections reported over the previous several months in those same counties.

A literature review of Securities Holdings Statistics research and practitioner guide

Format: CPS Abstract

Author: Martijn Boermans

Securities Holdings Statistics (SHS), compiled by the European System of Central Banks (ESCB), have spurred research over the past decade. SHS provide high-quality security-by-security data on portfolios. SHS benefit from very high coverage across euro area investors, relying on harmonized reporting and data preparation by the ECB since 2013-Q4. This paper provides a literature review of SHS research by surveying all published journal articles and working papers using granular SHS data. We demonstrate the rising popularity of SHS, with 62 studies so far, advancing three research themes: (i) the banking and finance literature, mostly on interconnectedness and contagion, (ii) the international investment literature, and (iii) monetary policy research on quantitative easing. Still, SHS research is in its infancy. We highlight an upcoming wave of studies, most prominently in sustainable finance. We discuss avenues of future research. Finally, we provide a practitioner’s guide with code, cleaning procedures and common specifications. Key words: Securities holdings statistics, portfolio investment, literature review, Eurosystem data, home bias. JEL-classifications: E52, E58, F14, F3, G11, G2, G51, Q56.

A long-term frailty quantile regression model: application with a maternal population with severe COVID-19

Format: CPS Abstract

Author: Dr Agatha Sacrament Rodrigues

Co-Authors:

  • Patrick Borges

In this work, we address the problem of assessing prognostic factors on the specific survival times of pregnant and postpartum women hospitalized with severe acute respiratory syndrome confirmed by COVID-19 when cure is a possibility. There is also interest in explaining the factors' impact on different quantiles of the survival times and in estimating the unobservable heterogeneity arising from prognostic factors that are not observed (such as smoking status). In addition, the hazard function presents a unimodal form. To this end, we propose a quantile regression model for survival data in the presence of long-term survivors based on the defective Dagum distribution model with a power variance function (PVF) frailty term introduced in the hazard function to control for unobservable heterogeneity in patient populations, which is conveniently reparametrized in terms of the q-th quantile and then linked to covariates via a logarithm link function. We consider Markov Chain Monte Carlo (MCMC) methods to develop a Bayesian analysis of the proposed model and we evaluate its performance through a Monte Carlo simulation study. This study is part of the Brazilian Obstetric Observatory, a multidisciplinary project that aims to monitor and analyze public data from Brazil in order to disseminate relevant information in the area of maternal and child health.

A new nonparametric control chart for monitoring general linear profiles based on log-linear modelling

Format: CPS Abstract

Author: Prof. Longcheen Huwang

Co-Authors:

  • Yen-Ming Liao

Profile monitoring is a technique for monitoring the stability of a functional relationship between a response variable and one or more explanatory variables over time. General linear profile monitoring is the most important case because, besides being flexible and simple, a linear form makes the relationship between the response variable and the explanatory variables easy to characterize. In addition, most general linear profile monitoring techniques assume normality of the error random variables. However, this normality is not satisfied in certain applications, which renders the existing monitoring methods for general linear profiles both inadequate and inefficient. Based on log-linear modelling, in this paper we develop a nonparametric charting scheme for Phase II monitoring of general linear profiles where normality of the error random variables is not assumed. The proposed charting method applies the cumulative sum (CUSUM) to the Pearson chi-square test of the vector of the Wilcoxon-type rank-based estimators of regression coefficients and an error variance estimator. Performance properties of the developed control chart are evaluated and compared with existing charting methods in terms of average run length (ARL). A real example is also used to illustrate the applicability and implementation of the proposed monitoring scheme.
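
As a rough illustration of the CUSUM step only, the sketch below accumulates a stream of per-profile monitoring statistics and signals when the running sum crosses a control limit. It is a generic one-sided CUSUM, not the authors' rank-based chart; the function name, the reference value `k` and the control limit `h` are assumptions made for illustration.

```python
from typing import Iterable, Optional


def cusum_first_signal(stats: Iterable[float], k: float, h: float) -> Optional[int]:
    """Return the 1-based index of the first upper-CUSUM signal, or None.

    stats: one monitoring statistic per sampled profile (hypothetical input).
    k:     reference value subtracted from each statistic.
    h:     control limit; a signal is raised when the CUSUM exceeds it.
    """
    c = 0.0
    for t, x in enumerate(stats, start=1):
        # Accumulate deviations above k; reset to zero when the sum goes negative.
        c = max(0.0, c + x - k)
        if c > h:
            return t
    return None


# Three in-control values followed by a sustained upward shift.
run_length = cusum_first_signal([1, 1, 1, 5, 5, 5], k=2.0, h=4.0)
```

The run length returned here is the quantity that an average run length (ARL) study would average over many simulated sequences.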

A non-homogeneous Poisson model and a reversible jump MCMC algorithm to estimate the probability of occurrences of air pollution exceedances

Format: CPS Abstract

Author: Dr Eliane R. Rodrigues

Co-Authors:

  • Mario H. Tarumoto
  • Juan Antonio Cruz-Juarez
  • Hortensia J. Reyes-Cervantes
  • Guadalupe Tzintzun

High levels of air pollution may have serious harmful effects on human health. Hence, in order to reduce population exposure and, consequently, the health hazard associated with it, preventive measures have been implemented in several cities around the world. In Mexico City, environmental emergency alerts are declared whenever high levels of ozone and/or PM10 occur. They are triggered when the concentrations of these pollutants exceed their corresponding assigned thresholds. When emergency alerts are declared, several measures are taken in order to prevent population exposure and reduce pollution levels. In the present talk, we study the problem of estimating the probability of occurrences of pollutant concentration exceedances using a non-homogeneous Poisson model in the presence of change-points. The exceedances considered here are based on thresholds set by the Mexican air quality standard. In order to estimate the number of possible change-points, their locations, and the parameters of the non-homogeneous Poisson rate function, we use the Bayesian point of view and a reversible jump Markov chain Monte Carlo algorithm. We apply the model and the algorithm to the daily maximum ozone and PM10 measurements provided by the Mexico City monitoring network. This is joint work with Mario H. Tarumoto, Juan A. Cruz-Juarez, Hortensia J. Reyes-Cervantes and Guadalupe Tzintzun.

A proposed Strategy for a Reliable Evidence of a society in Transition.

Format: CPS Abstract

Author: Dr Dalia Galaleldin Elabady

The ultimate goal of this proposed strategy is to produce specific effectiveness and efficiency that can help convey the right message in the right way to the right persons, and to achieve the higher interests of the state. Undoubtedly, the Government directs all tools and efforts to reach desired goals and expected future possibilities, to achieve prosperity, and to implement the goals of “Egyptian vision 2030”. These investments and national projects are expected to have a direct impact on human capital development, changing the characteristics of future generations so that they can anticipate future problems and explore more alternatives in a world of complexity and uncertainty about the future, and in a context where any achievement is often misunderstood and misinterpreted, within the complex challenges that face our entire society. Human security pillars represent the infrastructure on which the ratifications of human, social and national security are based, including the well-being efforts that the government has undertaken to provide appropriate conditions for a green environment in its comprehensive sense, taking into consideration the impact on economic and social dimensions.

A robust simulation study to compare measures for meaningful batting averages in cricket

Format: CPS Abstract

Author: Dr Paul van Staden

Co-Authors:

  • Johannes Vorster
  • Inger Fabris-Rotelli

Cricket is a bat and ball sport, where the aim is to score as many runs as possible while defending your wickets. If the wickets are struck by the ball, the batter is declared out and a new batter has to defend the wickets. The game is won when the opposing team cannot exceed the required number of runs. The term batter has superseded the word batsman to make the terminology gender neutral and more inclusive. The most common metric used by both players and spectators to assess a batter’s performance is the batting average. The traditional batting average is calculated as the total number of runs scored divided by the number of completed innings, where the runs from uncompleted innings are included in the total number of runs. Not-out scores are right-censored data, and including them would deflate a simple average taken over all innings. The traditional batting average tries to account for this deflation by dividing only by the number of completed innings, but this in turn inflates the average. Thus, finding the ideal method of altering the batting average calculation in a fair and just manner requires the examination of the various methods proposed in the literature. Although many different methods have been suggested in the literature to adjust the batting average, there has not been any published research to determine the optimal method to use. In this paper we will apply the smoothed bootstrap technique to all the adjusted measures and compare the variability of their distributions to determine the best method to replace the traditional batting average.
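
The traditional batting average described above can be sketched in a few lines. The function name and the `(runs, not_out)` input layout are assumptions made for illustration; none of the adjusted measures compared in the paper is reproduced here.

```python
from typing import List, Tuple


def traditional_batting_average(innings: List[Tuple[int, bool]]) -> float:
    """Total runs divided by the number of completed innings.

    innings: list of (runs, not_out) pairs, where not_out=True marks an
             uncompleted (right-censored) innings. Hypothetical input layout.
    """
    total_runs = sum(runs for runs, _ in innings)  # not-out runs are included
    completed = sum(1 for _, not_out in innings if not not_out)
    if completed == 0:
        raise ValueError("average is undefined without a completed innings")
    return total_runs / completed


# Two dismissals (30 and 10) and one not-out score of 50.
scores = [(30, False), (50, True), (10, False)]
average = traditional_batting_average(scores)   # 90 runs / 2 completed innings
naive_mean = sum(r for r, _ in scores) / len(scores)  # 90 runs / 3 innings
```

In this small example the traditional average (45.0) sits above the naive mean over all innings (30.0), which is the inflation effect the abstract refers to.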

A spatio-temporal statistical downscaling model for combining spatially misaligned maximum temperature data using R-INLA

Format: CPS Abstract

Author: Miss Sylvia Shawky

Co-Authors:

  • Sylvia Shawky
  • Abdelnasser Saad
  • Amira Elayouty

Our aim is to present a statistical downscaling framework for combining the monthly maximum temperature observations from in-situ observations across the Nile Basin countries from 2011 to 2020 with the gridded data simulated from a regional climate model across the same study region and period. To accurately account for the uncertainty from the different data sources and propagate it to predictions, a spatio-temporal coregionalization model that assumes a joint distribution between the covariate (simulated observations) and the response (in-situ observations) is employed.

A useful parametric specification to model epidemiological data: revival of the Richards’ curve

Format: CPS Abstract

Author: Dr Marco Mingione

Co-Authors:

  • Pierfrancesco Alaimo Di Loro
  • Antonello Maruotti

The main idea of our work is to provide a comprehensive review of the Richards' curve and illustrate its wide applicability in the modeling of several epidemic trends, while also providing new insights about the model's formulation, estimation, and interpretability. Such a task is paramount as much attention is now devoted to a prompt and thorough understanding of epidemics' spread dynamics, due to global warming and bad hygiene practices in human-animal and animal-animal interactions, which will likely increase in the next years. Here, two different estimation methods are described. The first, based on likelihood maximization, is particularly useful when the outbreak is still ongoing and the main goal is to obtain sufficiently accurate estimates in negligible computational run-time. The second is fully Bayesian, and allows for more ambitious modeling attempts such as the inclusion of spatial and temporal dependence, but it requires more data and computational resources. Regardless of the estimation approach, Richards' specification properly characterizes the main features of any growth process (e.g. growth rate, peak phase, etc.), leading to a good fit, while also providing trustworthy short- to medium-term forecasts. To demonstrate such flexibility, we show different applications using publicly available data on recent epidemics where the data collection processes and transmission patterns are extremely heterogeneous, as well as benchmark datasets widely used in the literature as illustrative.
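
For readers unfamiliar with the curve, the sketch below evaluates one common simplified parameterization of the Richards (generalised logistic) function for cumulative counts. The abstract does not state which parameterization the authors use, so the form and the parameter names (`K` for the final size, `r` for the growth rate, `t0` for the location, `nu` for the asymmetry) are assumptions.

```python
import math
from typing import List, Sequence


def richards(t: float, K: float, r: float, t0: float, nu: float) -> float:
    """Cumulative count at time t: K / (1 + exp(-r (t - t0)))^(1 / nu)."""
    return K / (1.0 + math.exp(-r * (t - t0))) ** (1.0 / nu)


def increments(times: Sequence[float], K: float, r: float, t0: float, nu: float) -> List[float]:
    """Differences of the cumulative curve, i.e. expected new cases per step."""
    values = [richards(t, K, r, t0, nu) for t in times]
    return [b - a for a, b in zip(values, values[1:])]


# With nu = 1 the curve reduces to the ordinary logistic curve.
midpoint = richards(0.0, K=100.0, r=0.5, t0=0.0, nu=1.0)
new_cases = increments(range(-10, 11), K=100.0, r=0.5, t0=0.0, nu=1.0)
```

The increments trace the rise and fall of an epidemic wave, while `K` bounds the cumulative total; values of `nu` other than 1 make the wave asymmetric.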

A wavelet regression approach for dependence calibration in conditional copula model

Format: CPS Abstract

Author: Dr Cheikh Tidiane Seck

Co-Authors:

  • Aba Diop

In the presence of covariates, dependence between random variables can be modelled using a conditional copula. Whenever the copula function belongs to a given parametric family, an important question is how to model the relationship between the copula dependence parameter and some covariate, which is described by the so-called calibration function. In this paper, we propose a wavelet regression approach to estimate this calibration function. We discuss asymptotic minimax properties of the linear and non-linear wavelet estimators and show their performance via a simulation study. An application to meteorological data reveals that the temperature linearly influences the dependence between the maximum and the minimum humidity variables.

ANALYSIS OF THE INFLUENCE OF PHYSICAL INFRASTRUCTURE, SOCIAL, ECONOMIC AND SPATIAL ON INCLUSIVE GROWTH IN INDONESIA

Format: CPS Abstract

Author: Ms Valent Gigih Saputri

Inclusive growth is economic growth that is able to reduce poverty, income inequality and unemployment. Inclusive growth in Indonesia was inconsistent, unequal between regions and varied. In addition, there are indications of spatial linkages due to socio-economic interactions between regions. The purpose of this study is to determine the effect of physical infrastructure, social, economic and spatial linkages on inclusive growth in Indonesia. With various measures of inclusive growth, this study uses a composite index of inclusive growth in 34 provinces in Indonesia during 2015-2020. The analytical method used is spatial Durbin model for panel data. The results showed that physical infrastructure (construction cost index, road length ratio, and the number of air transport passengers), social (population, school enrollment rates and internet control) and economic (regional original income ratio) had a significant effect on the inclusive growth that occurred. By calculating aspects of spatial linkages, the contribution of physical infrastructure (number of air passengers), social (gender empowerment, school participation rates and internet control), economy (inflation) and the achievement of inclusive growth in the surrounding area also have a significant effect on inclusive growth in a region. In general, there needs to be a synergy between the central government and local governments as well as local governments and other local governments. Spatial interactions that are proven to be able to achieve more inclusive growth need to be considered in formulating and implementing policies, especially from the dimensions of physical, social and economic infrastructure.

APPLICATION OF MULTIDIMENSIONAL SCALING ANALYSIS (MDS) ON REGENCY MAPPING IN CENTRAL SULAWESI PROVINCE BASED ON INFECTIOUS DISEASE INDICATORS

Format: CPS Abstract

Author: Mr Ahmad Risal

Co-Authors:

  • Wawan Saputra
  • Ahmad Risal

Extraordinary events of infectious diseases and poisoning are still a severe problem for people in Central Sulawesi. Some infectious diseases that still often cause extraordinary events are diarrhea, dengue fever, measles, and food poisoning. Health officials have made several efforts, but extraordinary events continue to occur in Central Sulawesi. Even in 2020, 8 outbreaks were reported, with 568 cases and two deaths. Based on these data, more efforts from the government to tackle the problem of infectious diseases are still very much needed.

Abstract- Admin-Based Census Development of Geographic, Buildings, and Dwellings Master files using GIS for Abu Dhabi Emirate

Format: CPS Abstract

Author: Mrs Alya Aldhaheri

The main outcome of the project is the creation of three master files: a geography master file, a building master file, and a unit master file. The geography master file aims to identify the administrative boundary of the Abu Dhabi Emirate, which consists of three regions (Abu Dhabi, Al Ain, and Al Dhafra), together with the districts and communities that belong to each region. The building master file aims to identify all built structures in the Emirate of Abu Dhabi, and the unit master file aims to identify all the units associated with these buildings. The data were collected from four different sources: the Building Data Management System (BDMS), the Base Map Footprint, the Household Frame Update (HHFU), and the Abu Dhabi and Al Ain Distribution Companies (ADDC/AADC). However, producing complete and comprehensive master files that include all requested records and cover all attributes has been a huge challenge, because each source had limitations and gaps in coverage. The four sources were therefore used to complement each other so as to cover all buildings and units of the Abu Dhabi Emirate. A priority list was then set up for each master file, based on the completeness of the records and attributes of the targeted data. Additionally, a quality check was carried out on all master files to improve the methodology.

Abstract- Construction of Admin Sources-Based Household Frame Using GIS for Abu Dhabi Emirate

Format: CPS Abstract

Author: Mrs Alya Aldhaheri

The paper aims to construct an admin sources-based household frame for the Abu Dhabi Emirate. Administrative data, gathered for purposes other than producing official statistics, provide new ways of collecting data on individuals, households, and dwellings. Many government agencies place a high value on information regarding households and families. Currently, the Statistics Center in Abu Dhabi only produces household and family information by conducting household surveys through field applications. Census data on families and households contain counts of households, data on the distribution of household sizes, and details on the relationships between people within households. Government decisions such as income support and social housing are developed and evaluated using these data. In this study, two potential data sources for creating a household frame were used: Tawtheeq data from the Department of Transport and Municipalities, and bill data from the Abu Dhabi and Al Ain Distribution Companies (ADDC/AADC). All records from the ADDC/AADC bill data and the Tawtheeq data were filtered on specific criteria in order to identify the targeted households. For example, a filter was applied to identify residential accounts, and only the accounts that were active in January 2022 were included. All non-spatial data from both sources were then converted to spatial data. Moreover, FME models were generated to process all requirements for creating household frames. To check the quality of the data, a correlation was calculated between the admin-based data and survey data. It was found that correlations are high in some areas, moderate in others, and low in the remaining areas.

Achieving Sustainable Development Goals in the Continuous Presence of the Rampaging Covid-19 Pandemic in Nigeria

Format: CPS Abstract

Author: Dr Nureni Olawale Adeboye

Co-Authors:

  • Oyenuga, Iyabode Favour
  • Aliyu Usman

The sustainable development goals (SDGs) agenda, an offshoot of the Millennium development goals (MDGs) scheme, has become imperative given the ever-increasing challenges of the dynamic world. While the MDGs focused mainly on the amelioration of poverty and hunger in the less developed countries of the world, the SDGs are wider in scope, extending to social and economic aspects of human society and, to a spatial extent, their interactions with the ecosystem, with sustainability as a focus. The SDGs are targeted towards achieving the world transformation agenda on or before the year 2030. The realization of this, however, has been seriously checkmated by the global outbreak of Covid-19 and its declaration as a pandemic that the entire world has to live with for a long time. In this session, which beams a searchlight on the developing nation of Nigeria, all seventeen (17) listed goals were critically evaluated based on their framework using reality concepts within the context of the ravaging pandemic in the country. On February 27th 2020, Nigeria became the first country in Sub-Saharan Africa to announce the discovery of COVID-19 cases, and since then the pandemic has become a global phenomenon, spreading from country to country as an invisible enemy. A critical evaluation revealed that the pandemic has jeopardized the production of data central to the achievement of the SDGs, thus creating serious data gaps in assessing country-level programs towards the SDGs. Several of the 169 SDG targets set for achievement in the year 2020 remain unachieved at the moment, which makes the 2030 agenda appear increasingly ambitious.

Adaptive outlier treatment technique for rapid recovery in time series

Format: CPS Abstract

Author: Mária Pécs

Co-Authors:

  • Gábor Lovics
  • Beáta Horváth
  • Anett Mikuláné Popovics

Due to COVID-19 and some other unexpected events, new solutions are needed to handle atypical periods in time series. Outlier treatment methods in seasonal adjustment need to be developed to manage these new situations properly.

Administrative data versus survey data for the production of financial statistics: a cross sectional analysis

Format: CPS Abstract

Author: Dr Sagaren Pillay

In South Africa the production of official financial statistics is currently based on survey data. The research highlights the challenges of using administrative data and presents a cross sectional analysis of unit data on turnover and other key financial variables obtained from an administrative source and data collected by a survey.

Advances in diseases mapping and capacity building in Biostatistics in Sub-Saharan Africa countries

Format: CPS Abstract

Author: Prof. Ngianga-Bakwin Kandala

Sub-Saharan Africa faces a high disease burden in communicable diseases and an increasing burden in non-communicable diseases with a strong spatial and temporal structure. More recently, increased funding for research from donor initiatives has generated a large volume of high-quality household data, but the high demand for biostatisticians to analyse these data locally and quickly has exposed a lack of capacity for advanced data analysis. Globally, the fields of geographical epidemiology and public health surveillance have benefited from combined advances in hierarchical model building and in geographical information systems. Exploring and characterising a variety of spatial patterns of diseases at a disaggregated, fine geographical resolution has become possible. Donor-funded initiatives exist to address the dearth in statistical capacity, but few initiatives have been led by African institutions. The Sub-Saharan African Consortium for Advanced Biostatistics (SSACAB) aims to improve biostatistical capacity in Africa according to the needs identified by African institutions, through collaborative masters and doctoral training in biostatistics. In this special contributed session, we will bring together the work resulting from our effort to build capacity in recent developments in disease mapping (both Bayesian and frequentist), especially the classes of Bayesian hierarchical space-time models that have been used to characterize the patterns of communicable and non-communicable disease burden in SSA through SSACAB and beyond.

Adversarial Outlier Detection

Format: CPS Abstract

Author: Dr Tahir Ekin

Outlier detection methods typically assume clean and legitimate data streams. However, adversaries may attempt to influence data, which in turn may impact outlier designations. This paper presents a decision theoretic approach for outlier detection in adversarial environments. The proposed framework, based on adversarial risk analysis, allows for incomplete information and adversarial perturbations of the data inputs. We solve the adversary’s poisoning decision problem, in which the adversary manipulates the batch data fed into outlier detection methods. We discuss potential defender strategies to improve the security of existing frameworks.

Agricultural Credit in Sumatera: Way Forward for National Economic Recovery

Format: CPS Abstract

Author: Mr Hilman Hanivan

Co-Authors:

  • Hilman Hanivan

Banking intermediation should be reinforced through agricultural credit distribution to support the agriculture sector in playing its major role in the national economic recovery amid the pandemic. Using aggregate and individual data, this study identifies several strategic issues of agricultural credit in Sumatera during the period 2018-2022. Specifically, this study discusses the determining factors of agricultural credit, the quality of the agricultural credit distributed, and the sources of credit for agricultural households. The results show that the lending rate and third-party funds have a stable influence on agricultural credit in both the pre-pandemic and pandemic periods, while the association between farmers’ terms of trade and agricultural credit is only significant in the pandemic period. Moreover, third-party funds and farmers’ terms of trade are also determining factors of credit quality, measured by the nonperforming loan (NPL) ratio, but only during the pandemic. Last but not least, financial inclusion is the key driver of agricultural households’ decision to borrow from banks. Based on those results, this study has several policy implications.

Alone or together? Work modality, self-efficacy, and accomplishment in problem-solving in probability theory and statistics

Format: CPS Abstract

Author: Dr Sigal Levy

Co-Authors:

  • Yelena Stukalin
  • Dr. Shulamit Geller

Collaborative learning is a field of knowledge stating that knowledge can be created and acquired collaboratively through social interaction in small groups. Research shows that learning in small groups is related to better learning outcomes and a positive impact on cognitive, meta-cognitive, emotional, motivational, and social aspects, as compared to individual learning. In this study, we aimed to test the effect of work modality (individual or in small groups) on self-efficacy and internalisation of the course material, within a framework of a basic undergraduate course in statistics for psychologists. This was tested on two separate occasions, relating to two different topics included in the course curriculum. Our study population were first-year undergraduate students of psychology in a community college in Israel. While this course is considered crucial to their academic knowledge, it is often perceived by the students as difficult and even intimidating. In each trial, the students were asked to perform a task related to the current topic they were studying. The "Descriptive Statistics" task included defining and describing variables. Some students were asked to form groups of 4-5 students and do this together, while others were asked to compose a list of 10 people they knew and perform the task with reference to this group. The "Basic Probability" task included defining events and calculating probabilities, performed in the same manner – either within a group or in reference to an individual list each student composed. Following each task, the students were asked to complete a short quiz and to rate their self-efficacy on a 5-point Likert scale. The quiz was graded to yield a total score. Results show that working in groups was related to better performance in the tasks. Furthermore, we found that self-efficacy moderated the relationship between work modality and quiz score in the Basic Probability task. 
Specifically, we found that students with lower self-efficacy benefit from group work, while the accomplishments of students with high self-efficacy are not affected by the work modality. These conclusions strengthen the need for "tailored" mode of learning among students with different perceptions of their abilities. Further research is called for regarding the conditions under which group work is beneficial, possibly among students from various fields of study.

Alternative Sample Weighting Procedures of Household surveys: a comparison of calibration approaches used in National Socio-Economic Survey

Format: CPS Abstract

Author: Miss Fenanda Dwitha Kurniasari

Co-Authors:

  • Yulia Atma Putri

Statistics Indonesia, as National Government Office, has conducted the National Socio-Economic Survey, also known as the mother of surveys, to collect multidimensional indicators depicting Indonesia's socio-economic welfare and other substantial indicators. Designed for estimating population aggregates at the province and regional levels, this large-scale survey applies probability sampling with a total sample of approximately 320,000 households. Given the sampling design used and the significant information obtained from this survey, Statistics Indonesia has performed basic procedures for weighting the sampled data, such as initial/base weight calculation, adjustment for non-response, adjustment for non-coverage of households, calibration, and weight trimming. Nevertheless, in line with advances in science and technology, improvements in the sampling field are necessary, to which NGOs must adapt. Nowadays, calibration approaches have evolved into several equation models that address problems in the distribution of sampled data. Calibration was introduced by Deville and Särndal as a statistical procedure that generates a vector of adjustment factors (g-weights) minimizing a distance function subject to the calibration constraints. In this paper, we evaluate calibration approaches performed in the weighting process, such as linear/GREG, truncated, multivariate raking ratio, and logit. This study compares the weighting results of these calibration techniques to determine which is the best approach. Resampling is performed repeatedly on National Socio-Economic Survey datasets, generating numerous sample sets that serve as the datasets in this Monte Carlo simulation study. The sample sets are then evaluated by the estimates and standard errors of several selected key variables, such as employment status, school participation, internet access, health, and access to social welfare programs.
The results show that the linear approach estimates indicators more accurately than the other approaches, although it is less precise. Furthermore, estimates from the truncated approach are the most precise and differ only slightly from the others. Based on the distribution of weights, we conclude that the truncated and logit approaches yield better-distributed weights than the others because they bound the g-weights and reduce extreme weight values. Therefore, the truncated approach is recommended in the weighting process for this survey data, with the linear approach as an alternative.
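
To make the idea of g-weights concrete, the following is a minimal sketch of linear (chi-square distance) calibration with a single auxiliary variable. It is not the survey's production procedure; the function name, the one-variable setup and the toy numbers are assumptions chosen to keep the example short.

```python
from typing import List, Sequence


def linear_calibration(design_weights: Sequence[float],
                       x: Sequence[float],
                       known_total: float) -> List[float]:
    """Calibrated weights w_i = d_i * g_i with g_i = 1 + lam * x_i.

    lam is chosen so that sum(w_i * x_i) equals the known population total
    of the auxiliary variable x (single-variable linear calibration).
    """
    estimated_total = sum(d * xi for d, xi in zip(design_weights, x))
    denominator = sum(d * xi * xi for d, xi in zip(design_weights, x))
    if denominator == 0:
        raise ValueError("auxiliary variable is identically zero")
    lam = (known_total - estimated_total) / denominator
    return [d * (1.0 + lam * xi) for d, xi in zip(design_weights, x)]


# Toy sample: three households with base weight 10 and household sizes 1, 2, 3.
d = [10.0, 10.0, 10.0]
size = [1.0, 2.0, 3.0]
w = linear_calibration(d, size, known_total=66.0)
calibrated_total = sum(wi * xi for wi, xi in zip(w, size))
```

Nothing in the linear form keeps `g_i` within a chosen range, so g-weights can become extreme or even negative; bounded variants such as the truncated and logit approaches are designed to restrict them.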

An Aggregate Test for Polynomial Frequency Modulation Using Multitaper Methods

Format: CPS Abstract

Author: Mr Benjamin Ott

Co-Authors:

  • Dr. Wesley Burr
  • Dr. Glen Takahara

We propose a semiparametric multitaper test for the detection of modulated line components where the modulation is assumed to be created by a polynomial of degree P. We derive an approximate distribution for this aggregated test, discuss a simulation study of its performance, and compare it to other known tests.

An MMAP to model a complex warm standby system with a repair facility vacation policy

Format: CPS Abstract

Author: Prof. Juan Eloy Ruiz-Castro

Co-Authors:

  • M. Dawabsha

A Markovian arrival process with marked arrivals (MMAP) is developed to model a complex warm standby multistate system subject to multiple events in an algorithmic way. Units are exposed to repairable and non-repairable failures; wear failures and shocks with multiple consequences are also introduced. Preventive maintenance is included, and thus three different tasks are possible at the repair facility. A vacation policy is included in the model. The results are obtained algorithmically and computationally.

An analysis of the dual burden of childhood stunting and wasting in Myanmar: a copula geoadditive modelling approach

Format: CPS Abstract

Author: Prof. Dhiman Bhadra

Co-Authors:

  • Dhiman Bhadra

In this study, we explore the spatial variation of childhood stunting and wasting across regions of Myanmar and quantify their association with various socio-economic and demographic risk factors while accounting for the dependence between the two measures of undernutrition. Analysis was carried out on data obtained from a nationally representative sample of households from the Myanmar Demographic and Health Survey conducted during 2015-2016. Childhood stunting and wasting are used as proxies of chronic and acute childhood undernutrition respectively. A child with a standardized height-for-age Z score (HAZ) below -2 is categorized as stunted, while a child with a weight-for-height Z score (WHZ) below -2 is categorized as wasted. It was observed that stunting and wasting had significant within- and between-region spatial variation across Myanmar and had significant non-linear associations with covariates like child's age and maternal weight-for-age and height-for-age Z scores. The study also revealed a mild positive association between stunting and wasting across regions. In general, child gender, ethnicity, maternal working status and household wealth quintile had significant associations with childhood stunting, while wasting was impacted by child gender, maternal working status, household location and wealth quintile. This is possibly the first study that provides data-driven evidence on the spatial variation of childhood stunting and wasting across regions of Myanmar as well as their association patterns with relevant risk factors. The resulting spatial maps and estimates can aid in the formulation and implementation of targeted, region-specific interventions towards improving the state of childhood undernutrition in Myanmar.
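
The two binary outcomes defined above follow directly from the Z-score cut-off of -2. The sketch below shows that classification step only; the function and field names are hypothetical, and the copula geoadditive model itself is not reproduced.

```python
from typing import Dict

CUTOFF = -2.0  # Z-score threshold stated in the abstract


def classify_undernutrition(haz: float, whz: float) -> Dict[str, bool]:
    """Flag a child as stunted (HAZ < -2) and/or wasted (WHZ < -2)."""
    return {"stunted": haz < CUTOFF, "wasted": whz < CUTOFF}


# A child who is short for age but has adequate weight for height.
example = classify_undernutrition(haz=-2.4, whz=-0.8)
```

Because both flags are computed for the same child, the pair of outcomes can be analysed jointly, which is what motivates modelling their dependence.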

An empirically adjusted weighted ordered p-values method for meta-analysis in large-scale simultaneous hypothesis testing

Format: CPS Abstract

Author: Dr Sinjini Sikdar

Co-Authors:

  • Wimarsha Jayanetti

Recent developments in high throughput genomic assays have opened up the possibility of testing thousands of genes simultaneously. With the vast number of public databases now available, one can easily access the results of multiple genomic studies, each comprising significance testing results for thousands of genes. Researchers nowadays tend to combine the genomic information from these studies in the form of a meta-analysis. However, the distributional assumptions of existing meta-analysis methods can be violated in large-scale simultaneous hypothesis testing, leading to incorrect significance testing results. Therefore, there is a pressing need to develop new meta-analysis methods that incorporate the large-scale aspect of genomic studies.

Anaemia in Preschool aged children in the DR. Congo: Finding from a Nationally Representative Survey

Format: CPS Abstract

Author: Dr Ngianga II Kandala

Co-Authors:

  • saseendrah Pallikadavath
  • Jose Jose Kandala

Anaemia, a condition in which there is a reduced haemoglobin concentration, is a serious worldwide public health problem. Recent evidence suggests that 30% of the world population is anaemic [1]. Children aged between 0 and 5 years and women of reproductive age are the most affected. The latest World Health Organisation anaemia estimates suggest a global prevalence of 40% in children aged 0 to 5 years and 30% in women, and a prevalence of 60% in children aged 0 to 5 years in Africa [1]. Anaemia in children is defined as a haemoglobin concentration below 11 g/dl at sea level [2]. Anaemia weakens a person's immune system, exposing the individual to infections. In children, anaemia is associated with long-lasting developmental effects [3]. Even at its mild stage, anaemia can impair the cognitive development of children. This retrospective cross-sectional study uses nationally representative data, the 2014 Democratic Republic of Congo Demographic and Health Survey (DRC-DHS), to study potential risk factors of anaemia in children in the DRC. Geographical disparities were explored and presented using maps. Using three-level random effects models, potential factors were grouped into individual, maternal, household, and community factors to account for observed and unobserved factors contributing to anaemia. The data suggest that anaemia in children is a severe public health concern in the DRC (60% prevalence rate), that there are significant disparities in anaemia prevalence within provinces in the DRC, and that anaemia varies significantly between households. The age of the child, malaria, whether the mother is anaemic, whether the mother was given drugs for intestinal parasites (hookworm), household wealth and the source of drinking water are factors associated with anaemia in children in the DRC. This study provides evidence based on nationally representative data.
Significant associations between anaemia and other infections, including malaria and hookworm, call for appropriate interventions to reduce the deleterious haematological impact of both infections. The results highlight the need to utilise socio-ecological approaches to reduce the burden of anaemia in children in the DRC.

Analysis of Korean regional economic resilience after COVID 19 pandemic

Format: CPS Abstract

Author: KISEOK SONG

Co-Authors:

  • KISEOK SONG

This study measured the resilience of the Korean regional economy after the COVID-19 pandemic and analyzed its determinants. The resilience index, which measures whether the local economy resisted and recovered after an external shock, was computed from employment data in terms of engineering resilience, which has an advantage in quantitative research. A regression analysis was conducted by applying a spatial analysis model, taking into account that the resilience of local economies may be correlated between adjacent regions.

Analysis of Middle East Respiratory Syndrome Corona Virus MERS-CoV Data Based on Parametric Cox Model

Format: CPS Abstract

Author: Dr Faiz Elfaki

Survival analysis is a collection of statistical approaches for data analysis when the outcome variable of interest is the time until an event occurs. In survival data, subjects are expected to experience an event over follow-up. For instance, for some subjects the event of interest will be a failure or another event. A competing risks model arises when more than one event can occur at the same time. This paper will use a parametric Cox model based on Middle East Respiratory Syndrome Corona Virus (MERS-CoV) data. The performance of the parametric model will be compared with that of the semi-parametric Cox model and will be investigated using simulated data.

Analysis of Price Transmission along the Rice Supply Chain in Indonesia

Format: CPS Abstract

Author: Chindy Saktias Pratiwi

This study aims to investigate asymmetric price transmission (APT) along the supply chain in the Indonesian rice market (farmer, producer, wholesaler, retailer), and to use the APT information to analyze indications of welfare transfer. A nonlinear autoregressive distributed lag (NARDL) model is used to examine differences in the speed and magnitude of price transmission. The empirical results show the presence of positive APT along the supply chain, meaning that the response to price increases is greater than the response to price decreases. There are indications of consumer welfare loss, with wholesalers benefiting most from their dominance by creating additional profits from price changes triggered at the farmer level.

Analysis of a Grade 5 Reading Comprehension Assessment: Examining Item Invariance and Performing Test Equating

Format: CPS Abstract

Author: Karizza Bianca Loberiza

Co-Authors:

  • Kevin Carl Santos

The Assessment, Curriculum, and Technology Research Centre (ACTRC) administered an English Reading Comprehension Test to Grade 5 students in two Philippine regions in 2018 and 2019. To ensure that the achievement scores of the students in both groups are comparable (i.e., on the same scale), ACTRC intends to analyze them using linear equating methods for nonequivalent groups with an anchor test. Thirty-two common items were administered to both groups of students, and the Rasch model is fitted to the item response data. Before proceeding to test equating, differential item functioning (DIF) analysis using the Wald test is employed to detect possible violations of item invariance between the two groups of students. Preliminary results reveal that four common items are flagged as DIF items. Careful examination of these items is conducted to determine possible item bias. Afterwards, three equating methods, namely the concurrent scaling, shift and scale, and anchoring methods, are employed and compared with each other to equate the two assessments based on the remaining common items.
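For readers unfamiliar with the Rasch model mentioned above, its item response function has a simple closed form. The sketch below shows only that function in its standard textbook form; the function name and the example values are illustrative assumptions, and the DIF analysis and equating steps are not implemented here.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of a correct response under the Rasch model.

    P(X = 1) = exp(ability - difficulty) / (1 + exp(ability - difficulty)),
    with ability and difficulty on the same logit scale.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# Hypothetical student ability and item difficulties (logits).
for difficulty in (-1.0, 0.0, 1.0):
    print(difficulty, round(rasch_probability(0.5, difficulty), 3))
```

Item invariance means that an item's difficulty is the same in both groups of students; a DIF flag indicates that this assumption looks doubtful for that item.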

Analyzing retrospective neonatal mortality data using mixture cure rate models

Format: CPS Abstract

Author: Tamanna Howlader

Co-Authors:

  • Zannatul Ferdous Asha

Studies of neonatal mortality determinants often rely on retrospective data since large-scale follow-up data are generally not available. A characteristic of neonatal data collected retrospectively from cross-sectional studies is that births and deaths are recorded within a fixed interval that is long compared to the risk period (first 28 days of life). Furthermore, a substantial proportion of subjects never die due to neonatal causes and survive past the risk period. These subjects are cured and are therefore complete observations. The Cox proportional hazards (Cox PH) regression model has been a popular tool in the study of causes of neonatal death. It assumes that, given a long enough follow-up time, all observations will eventually experience the event of interest. This assumption may not hold for retrospective neonatal mortality data. Thus, the use of the standard Cox PH model, which treats cured observations as censored, may not be appropriate. In such a case, the mixture cure rate model (MCRM), which models the cure rate (incidence part) and survival time (latency part) separately, may be a better alternative.
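The separation into incidence and latency parts can be made concrete with the textbook form of a mixture cure model. The notation below is an assumption for illustration (the abstract does not give the authors' exact specification): \pi(z) is the probability of being susceptible to neonatal death, so 1 - \pi(z) is the cure rate, and S_u is the survival function of susceptible subjects.

```latex
% Population survival as a mixture of cured and susceptible subjects
S_{\mathrm{pop}}(t \mid x, z) = \bigl(1 - \pi(z)\bigr) + \pi(z)\, S_u(t \mid x),
\qquad
\pi(z) = \frac{\exp(z^{\top}\gamma)}{1 + \exp(z^{\top}\gamma)}
```

As t grows, S_u(t | x) tends to 0, so the population survival levels off at the cure rate 1 - \pi(z) rather than at 0, which is the behaviour the standard Cox PH model rules out. The logistic link for \pi(z) is one common choice, not necessarily the one used in the study.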

Analyzing the Effects of Feeding Practices, WASH, and District-Level Spatial Covariates on Malnutrition Among Indian Children: Aiming for SDGs 6 and 3

Format: CPS Abstract

Author: Dr RICHA VATSA

Co-Authors:

  • Umesh Ghimire

The UN Sustainable Development Goals (SDGs) 6 and 3 address affordable and safe drinking water, proper water, sanitation and hygiene (WASH) practices, and support for healthy lifestyles and well-being at all ages. Previous studies have shown that inadequate WASH practices, as well as improper feeding practices, are the major causes of childhood malnutrition in low- and middle-income countries. According to the fifth National Family Health Survey (NFHS-5), the prevalence of poor nutritional status (stunting, wasting, and overweight) among Indian children was found to be quite high and geographically varied. Despite the efforts of the Indian government to improve WASH facilities across the country, the prevalence of malnutrition has increased in many Indian states since the previous NFHS. Therefore, this study focuses on incorporating feeding practices among children, along with WASH indicators, district-level spatial covariates, and other socioeconomic and demographic maternal and household characteristics, as explanatory variables. The study employs a Bayesian structured additive regression model to examine the influence of these explanatory variables on childhood nutritional status. For this study, NFHS-5 2019-21 data (KIDS file) obtained from www.DHSprogram.com are analysed using the software package BayesX via the R interface.

Anomaly Detection Using LSTM Autoencoder Network In Ensuring Reliable Operation At Offshore Utilities System

Format: CPS Abstract

Author: Dr Afidalina Tumian

Co-Authors:

  • M Hafrizal Azmi
  • M Zakwan B M Sahak

Development of unsupervised machine learning model for anomaly detection, applied at an oil and gas facility

Anonymization for integrated and georeferenced Data (AnigeD)

Format: CPS Abstract

Author: PROF. DR. Markus Zwick

Statistical Disclosure Control for integrated and georeferenced data is a new challenge for statistical institutes. New digital data in combination with traditional data offer many new analysis possibilities. Moreover, these complex datasets are usually georeferenced in a very fine-grained way. Traditional confidentiality procedures reach their limits here. Destatis, the German Statistical Institute, is working together with various universities on the further development of existing procedures in order to ensure the protection of individuals even for complex data. The lecture will present the first results of the project funded by the German Ministry of Research.

Application of Geographically Weighted Regression for studying natural gas consumption and price elasticity from microdata

Format: CPS Abstract

Author: Prof. Luigi Grossi

Co-Authors:

  • Filippo Favero

We consider quarterly billing and distribution data from 2016 to 2018 for a sample of consumers located in Veneto (Italy) and apply a two-stage technique to study price elasticity and consumption determinants. In the first stage, the natural gas price is simulated with the Geographically Weighted Regression (GWR) technique. In the second stage, conventional panel data models are applied to natural gas consumption to estimate price elasticities for the simulated prices. Our results are consistent with those of more complicated models that require external information and assumptions.

Application of PCA and K-Means clustering to detect Autistic Spectrum Disorder

Format: CPS Abstract

Author: Prof. Mohsen Farid

Co-Authors:

  • Sarwat Qureshi

Autism Spectrum Disorder (ASD) is a neurodevelopmental disorder that manifests in problems related to communication, social skills, and repetitive and stereotypical behaviors. Caregivers, psychiatrists, and clinicians carry out the screening/diagnosis process using gold-standard tools. These tools are often criticized for being too lengthy and taking a long time to analyze. Existing screening tools, such as the Quantitative Checklist for Autism in Toddlers (Q-CHAT), depend heavily on the sum score of all items as a measure to evaluate the symptoms and severity of ASD among toddlers, adolescents, and adults. These handcrafted cut-off score rules for screening ASD are subjective and, therefore, open to debate. Improving the screening process for ASD and making it accessible to users therefore becomes paramount. This paper aims to detect ASD symptoms in toddlers both robustly and quickly by applying Principal Component Analysis (PCA) and K-means to Q-CHAT. It further provides recommendations for determining the severity of ASD in toddlers. The relatively large dataset employed in the study consists of 1016 children aged 16 to 36 months. The dataset contains four groups: (1) Toddlers who are typically developing, (2) Toddlers whose parents report ASD-specific concerns, (3) Toddlers at risk for autism due to having an older sibling with ASD, and (4) Toddlers with a developmental delay. PCA has identified a reduced dimensional space of questions in Q-CHAT that performs almost as well as the original questionnaire in detecting ASD. PCA has effectively identified that there is no difference between groups (2), (3), and (4) using a reduced set of Q-CHAT items. Results also suggest no gender difference, as reported in some literature. K-means clustering was employed to detect ASD/No-ASD from the dataset. The findings reveal that K-means has effectively distinguished toddlers who are autistic from those who are not.
The findings indicate that depending on only the sum score cut-off of all the questions in the original instrument is not the best way to identify toddlers with ASD.
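To illustrate the clustering step in general terms, here is a minimal pure-Python sketch of K-means (Lloyd's algorithm) with two clusters. The points are hypothetical two-dimensional scores standing in for principal component scores; the PCA step, the Q-CHAT items and the study's actual settings are not reproduced, and the deterministic initialisation is a simplification.

```python
import math


def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm; the first k points serve as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels, centroids


# Hypothetical component scores for six toddlers (not study data).
scores = [(0.1, 0.2), (0.0, 0.1), (0.3, 0.1),
          (3.0, 3.1), (3.2, 2.9), (2.9, 3.0)]
labels, centroids = kmeans(scores, k=2)
print(labels)
```

Unlike a fixed sum-score cut-off, the cluster boundary here is determined by the data, which is the contrast the abstract draws.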

Application of Para Data analysis to remote quality improvement in tablet-based surveys

Format: CPS Abstract

Author: Mr Saeed Fayyaz

Co-Authors:

  • Arash Fazeli

Official statistics and quality assurance are the two main pillars of this paper, which would fit sessions on official statistics or quality assessment.

Application of machine-learning Techniques to analyze and estimate regional SDG indicators in Morocco

Format: CPS Abstract

Author: Miss Ghizlane BOUAYAD

Co-Authors:

  • Adil Ez-zetouni

The Sustainable Development Goals (SDGs) are a set of targets and indicators based on data that are intended to enhance national reporting and policy. Since the SDGs are linked together, progress made toward one of them may have positive (synergies) or negative (trade-offs) effects on the other goals. The intricacy of the SDGs as a system is highlighted by this kind of connection. To maximize goal achievement, it can be helpful to discover goals that have a beneficial effect on other goals. However, the availability of data on a national or regional scale is still a constraint for analyzing and monitoring the SDGs in developing countries. This study proposes a technical framework to estimate, predict and analyze regional SDG targets in Morocco. It leverages the capabilities of machine learning (ML) algorithms on available official data and alternative data to estimate proxy indicators, and thereby to identify synergistic SDGs and analyze the interactions between SDGs at the regional level in Morocco.

Application of statistics in the business industry: The perspective of small scale business enterprises in Ghana

Format: CPS Abstract

Author: Prof. Bashiru I.I. Saeed

Co-Authors:

  • Ebenezer Tawiah Arhin
  • Amidu Abdul Hamid
  • Caleb Nambyn

The business sector is a crucial component of modern society. It is a planned and clever channel for making money. It covers everyone's efforts in achieving a common economic objective. Such a business flourishes when the business environment is favorable for business organization, which lessens poverty. Businesses carefully examine data and statistics to determine what they are doing well and what is profitable for the business, as well as to identify what needs to be changed or urgently attended to if things are not going well. The objective of this study is therefore to explore how statistics are used to deliver high-quality goods and services that are profitable, and to determine how often business industry participants reference statistical media while formulating business plans.

Approximation of the Formal Bayesian Model Comparison using the Extended Conditional Predictive Ordinate Criterion

Format: CPS Abstract

Author: Mr Md Rashedul Hoque

Co-Authors:

  • Paul Gustafson

The purpose of the model comparison is to find a useful model among some possible models. There are different model selection methods available in the literature, developed by the frequentist and Bayesian schools. The main goal of this study is to examine a model selection method based on cross-validation as an alternative to the formal Bayesian model comparison.

Assessing the Accuracy of Paddy Harvested Area Prediction from the Area Sampling Frame Survey Results and the Alternative Forecasting Methods

Format: CPS Abstract

Author: Octavia Rizky Prasetyo

Co-Authors:

  • Kadir

Since 2018, Statistics Indonesia (BPS) has implemented the Area Sampling Frame (ASF) method to estimate the paddy harvested area in Indonesia. This method replaced the traditional so-called eye-estimate method, which allegedly produced overestimated figures of paddy harvested area. One of the advantages of the ASF method is its ability to yield predictions of the paddy harvested area for the upcoming three months by observing the state of the paddy growing phase in the current month, which can give valuable information for policy formulation. This study aims to evaluate the accuracy of those predictions and proposes an alternative method for forecasting. We use the Mean Absolute Percentage Error (MAPE) for the evaluation and apply a hierarchical forecasting method under various reconciliation and forecast methods for the latter.
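The accuracy measure named above has a standard definition: the mean of the absolute errors expressed as a percentage of the realised values. A minimal Python sketch follows; the function name and the example figures are hypothetical and do not come from the ASF survey.

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent.

    MAPE = 100 / n * sum(|actual_i - predicted_i| / |actual_i|)
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical harvested areas (thousand hectares): realised vs predicted.
realised = [100.0, 200.0, 150.0]
predicted = [110.0, 180.0, 150.0]
print(mape(realised, predicted))
```

A lower MAPE means the predictions were, on average, closer to the realised harvested area in relative terms, which makes results comparable across regions of very different size.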

Assessing the agreement between multi-operator measurement systems using the probability of agreement

Format: CPS Abstract

Author: Dr Adel Ahmadi Nadi

Co-Authors:

  • Nathaniel Stevens
  • Stefan H. Steiner

The probability of agreement (PoA) was recently introduced to assess the measurement systems' agreement. In this presentation we develop the required methodology for applying PoA when the influence of operators is separately accounted for in the measurement error model. The methodology is illustrated on a dataset of respiratory rates of Chronic Obstructive Pulmonary Disease (COPD) patients. The proposed PoA analysis is used to assess the agreement between a gold standard device and a chest-band.
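As a rough illustration of the quantity involved, the probability of agreement can be read as the probability that two systems' readings on the same subject differ by no more than a clinically acceptable amount. The sketch below assumes that the difference is normally distributed with known mean and standard deviation; these inputs, the function name and the example values are assumptions, and the operator effects developed in the presentation are not modelled.

```python
from statistics import NormalDist


def probability_of_agreement(mean_diff, sd_diff, delta):
    """P(|Y1 - Y2| <= delta) when Y1 - Y2 ~ Normal(mean_diff, sd_diff**2).

    delta is the largest difference regarded as practically negligible.
    """
    diff = NormalDist(mu=mean_diff, sigma=sd_diff)
    return diff.cdf(delta) - diff.cdf(-delta)


# Hypothetical values: bias of 0.5 breaths/min, SD 1.2, tolerance 2.
print(probability_of_agreement(0.5, 1.2, 2.0))
```

A value close to 1 would suggest the two devices could be used interchangeably for the chosen tolerance.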

Assessing the between-country genetic correlation in maize yield using German and Polish official variety trials

Format: CPS Abstract

Author: Waqas Ahmed Malik

Co-Authors:

  • Waqas Ahmed Malik
  • Hans-Peter Piepho

We assess the genetic gain and genetic correlation in maize yield using German and Polish official variety trials. Random coefficient models were fitted to assess the genetic correlation. Official variety testing is performed in many countries by statutory agencies in order to identify the best candidates and make decisions on additions to the national list. Neighbouring countries can have similarities in agroecological conditions, so it is worthwhile to consider a joint analysis of data from national list trials to assess the similarity in performance of those varieties tested in both countries. Here, maize yield data from official German and Polish variety trials for cultivation and use (VCU) were analysed for the period from 1987 to 2017. Several statistical models that incorporate environmental covariates were fitted. The best fitting model was used to compute estimates of genotype main effects for each country. It is demonstrated that a model with random genotype-by-country effects can be used to borrow strength across countries. The genetic correlation between cultivars from the two countries equalled 0.89. The analysis based on agroecological zones showed high correlation between zones in the two countries. The results also showed that the 22 agroecological zones in Germany can be merged into five zones, whereas the six zones in Poland had very high correlation and can be considered as a single zone for maize. The 43 common varieties tested in both countries performed equally in both countries, and their mean performances in the two countries were highly correlated.

Audit sampling inference for official survey statistics

Format: CPS Abstract

Author: Luis Sanguiao-Sande

Co-Authors:

  • Li-Chun Zhang

"Wherever the goal of survey sampling is to produce a point estimate of some target parameter of a given finite population, audit sampling aims not to estimate the target parameter itself, but some chosen accuracy measure of any given estimator of the target parameter, which may be potentially biased due to failure of the underlying assumptions or other favourable conditions that are necessary" (Zhang, 2021). We consider auditing inference of the prediction estimator, either from a sample trained model or given in advance from register data.

BOOME: A boosting method for variable selection and estimation with measurement error in binary responses and predictors

Format: CPS Abstract

Author: Dr Li-Pang Chen

In this talk, I am going to present variable selection and estimation for error-prone responses and predictors. Unlike conventional regularization methods, such as the lasso, this work adopts a boosting procedure to iteratively retain informative predictors and obtain the final estimators. A further challenge comes from measurement error. I am going to present some useful strategies to handle measurement error effects and derive an unbiased estimating function. To justify the proposed method, I will present some theoretical properties, including consistency and asymptotic normality of the estimators. Finally, I will present some simulation studies to show the validity of the proposed method as well as the impact of ignoring measurement error effects.

Basic Food Basket Price From Big Data

Format: CPS Abstract

Author: Mr Marcus André Alves Zimmermann Vieira

Co-Authors:

  • Andrea Diniz da Silva

Session about alternative sources or big data

Bayesian estimation in scale mixture of skew-normal linear mixed models using Hamiltonian Monte Carlo

Format: CPS Abstract

Author: Dr Fernanda Lang Schumacher

Co-Authors:

  • Larissa Avila Matos
  • Victor Hugo Lachos
  • Francisco Louzada Neto

Linear mixed models are commonly used to model clustered data and usually assume that both random effect and error term follow normal distributions, which could lead to invalid statistical inferences. A more flexible model is reached by considering the scale mixture of skew-normal class of distributions. However, the maximum likelihood estimate of the parameter which regulates skewness may diverge. In this section, this anomaly is illustrated, and an alternative Bayesian estimation is proposed.

Bayesian latent class model: A framework to study the prevalence and the performance of algorithms in bot detection

Format: CPS Abstract

Author: Dr bekir çetintav

In this study, a real-time bot detection application made on one of Turkey's leading ad listing websites is included. The results show that Bayesian Latent Class Model, which is used successfully in the field of epidemiology to determine the performance of methods for the detection of real viruses and prevalence in the absence of a gold standard, can also be used in the field of web bot detection.

Bayesian methodology for product ranking considering a positive and a negative reference in network meta-analysis

Format: CPS Abstract

Author: Mr Clément Laloux

Co-Authors:

  • Hussein Jouni
  • Arnaud Monseur

This presentation proposes a new Bayesian methodology for ranking products based on a single measurement. It can be applied to multiple domains involving different kinds of efficacy criteria. It is primarily intended and defined to be used in a meta-analytical framework. It aims to rank products tested in different Randomized Clinical Trials (RCTs), compared to a positive and a negative reference, and according to homogeneous protocols.

Bayesian modeling of health state preferences: can existing preference data be used to generate better estimates?

Format: CPS Abstract

Author: Dr Samer A Kharroubi

Valuations of preference-based measures such as EQ-5D or SF-6D have been conducted in different countries because social and cultural differences are likely to lead to systematically different valuations. However, there is scope to use the results from one country as informative priors in the analysis of a study in another, so that better estimates can be obtained in the new country than by analyzing its data separately.

Beauty or the Beast? Flipped Classroom for Statistics Education

Format: CPS Abstract

Author: Dr Elinor Mair Jones

Co-Authors:

  • Ayse Aysin Bilgin
  • Simon Harden
  • Huan (Jaslene) Lin
  • Peter Fitch

COVID-19 presented an opportunity in Higher Education to rethink the delivery of statistics courses. Many instructors turned to a ‘flipped learning’ approach, where students study discipline content through pre-recorded lectures before attending discussion classes which can be centered around student questions. Though lauded as superior to lecture-based instruction, we consider the reality of this method of learning and present evidence to the contrary.

Big Data Governance Framework: The Case of Mobile Positioning Data for Official Tourism Statistics In Indonesia

Format: CPS Abstract

Author: Mr Eko Rahmadian

Co-Authors:

  • Yulia Virantina
  • Daniel Feitosa
  • Andrej Zwitter
  • Titi Kanti Lestari

The National Statistics Office (NSO) of Indonesia (Statistics Indonesia) has been using Mobile Positioning Data (MPD) to measure cross-border foreign visitors since 2016. Indonesia is one of the few countries that have already used MPD as one of the Big Data sources for official statistics products, as it provides more accurate data with better coverage and timeliness on tourism arrivals compared to the traditional method. Following its success, since 2018, research on the potential use of MPD has been expanded to other purposes, such as measuring domestic tourism and people's mobility in metropolitan areas. Beyond MPD, research on the potential use of other Big Data sources for official statistics has also increased in response to demand from related stakeholders and decision-makers. With the development and application of Big Data in various sectors and for various purposes, including official statistics, the role of Big Data governance is becoming more and more important. Big Data governance is a holistic approach that allows harmonization of people, methods, tools, and technologies to deal with structured and unstructured data. Big Data governance is also a new stage in the development of data governance, especially in exploring its theory and practice to improve organizational data management and utilization. Despite the current success in the use of MPD, there are some challenges regarding Big Data governance that may threaten data sustainability and the entire data provision process. In this paper, we aim to investigate the issues and challenges of Big Data governance in the case study of MPD for official tourism statistics in Indonesia. Our research aims to identify challenges in the dimensions of a Big Data governance framework, specifically in addressing issues of the roles of and communication among stakeholders, institutions or organizations, data quality, and regulatory compliance.
To that end, we conducted a field study at Statistics Indonesia, consisting of semi-structured interviews with related stakeholders. We also conducted a comparison study with Centraal Bureau voor de Statistiek Nederland (Statistics Netherlands) to gain perspective from another NSO that we considered well established and reachable for our study. Through the findings of our qualitative research on the MPD case study, we expect both to provide more insight into and understanding of the urgency of Big Data governance, and to argue for the importance of a Big Data governance framework for official statistics. Keywords: Mobile Positioning Data, Big Data governance framework, tourism statistics, official statistics, data sustainability

Big Data Opportunities arising from the new Data Ecosystem

Format: CPS Abstract

Author: Mr Md Shariful Islam

Big Data is the byproduct of the technological revolution. This technology continuously generates data, faster and in more detail, for National Statistical Systems around the world. Experience with Big Data shows that official statistics are needed to close the data gap. In the new data ecosystem there needs to be a strong partnership between official statistics and other actors, including data science and artificial intelligence.

Big Data Opportunities arising from the new data ecosystem

Format: CPS Abstract

Author: Mr Md Shariful Islam

A data ecosystem is a collection of infrastructure, analytics, and applications used to capture and analyze data. The data ecosystem serving today's modern enterprises is a multiplatform architecture that attempts to embrace a variety of heterogeneous data sources. The emerging data ecosystem can therefore build partnerships and coordination for mobilizing the power of data around the world. Big Data ecosystems have emerged to curate, store, produce, clean, and transact data. In addition, the need to apply more advanced analytical techniques to increasingly complex business problems has driven the emergence of new roles that address these needs. A key role in the Big Data ecosystem is deep analytical talent: people who are technically savvy and have strong analytical skills. Members of this group possess a combination of skills to handle raw, unstructured data and to apply complex analytical techniques at massive scale. This group has advanced training in quantitative disciplines such as statistics and machine learning. To do their job, statisticians need access to a robust analytical sandbox or workspace, and they play an important role in navigating and sustaining public roles successfully. Other members of the Big Data ecosystem tend to have a basic knowledge of working with data, or an appreciation for some of the work being performed by data scientists and others with deep analytical talent. Data-savvy professionals, including financial analysts, market research analysts, life scientists, operations managers, and business and functional managers, are also part of the data ecosystem, and working in partnership with them makes the partnerships and coordination of the emerging data ecosystem succeed. The data ecosystem performs a structural analysis of the business model and captures market trends and customer needs, which helps in meeting business requirements. It makes wide use of machine learning for building systems and models on top of the prepared dataset. Data scientists use machine learning algorithms and techniques to build models.
Organizations (resources) use these models to fulfill their business requirements through the data ecosystem. Statisticians can play a vital role in data ethics. They assess the validity of data and methods, and data validated by statisticians may be used for better decision making. Big Data is the byproduct of the technological revolution. This technology continuously generates data, faster and in more detail, for National Statistical Systems around the world. Experience with Big Data shows that official statistics are needed to close the data gap. In the new data ecosystem there needs to be a strong partnership between official statistics and other actors, including data science and artificial intelligence.

Big Data Transformations in the Philippine Retail Industry

Format: CPS Abstract

Author: Mr Dominic Dayta

Co-Authors:

  • Jaime Syjuco

While industries in the Philippines have long been preparing for and adopting big data, recent challenges introduced by the COVID-19 pandemic have pushed companies into this transformation at breakneck speed. This paper presents a case study of consulting work provided for a major retail business in the Philippines, and the value and insights gained from bespoke models trained on high-dimensional data collected by the company.

Big Data and Data Science

Format: CPS Abstract

Author: Mr Md Shariful Islam

Data Science is the field that comprises everything related to data cleansing, data mining, data preparation, and data analysis. Big data refers to volumes of data so vast that they are difficult to store and process in real time. This data can be analyzed to derive insights that lead to better decision making.

Big-data Feature Analysis in a Cyclostationary Model Framework

Format: CPS Abstract

Author: Dr Francois Marshall

When a harmonic signal appears in noise and the residual noise is red, this may be evidence for residual periodicity, and a simple model for this is to assume the presence of normal modes in the power spectrum. As long as spectral power does not diverge too rapidly approaching zero frequency, it may also be true that the noise finite-dimensional distribution (FDD) exhibits normality. Here, two novel visualization diagnostics are introduced that make use only of the power spectrum to characterize the noise FDD. The first diagnostic reveals the quality of the harmonic periodicity in both the mean and spectral-power signals, while the second reveals the extent to which the joint distribution of the multitaper Fourier-transform processes exhibits spherical, proper, Gaussian behaviour. It is shown for a two-sample survey of epileptic-seizure, microelectrode-voltage time series how these visualization techniques explain time-evolutionary periodicity for long records in a manner more efficient than a spectrogram due to invariance of the diagnostic metrics to linear filtering.

Blended Learning Design for Teaching Data Science in Moroccan Universities

Format: CPS Abstract

Author: Miss Sofia Bourhim

Co-Authors:

  • Peter Ohue

Blended learning has recently sparked a lot of interest in Moroccan educational research and in the educational system. However, blended learning was rarely used in classrooms in Morocco, and it only started to be applied as a result of the COVID-19 situation. Initially, teaching was entirely in person. The new teaching method combines both modes, in-person and distance learning, and seeks to capitalize on the benefits of both. The purpose of this study is to present the current state and the challenges of blended learning in the Data Science major in Morocco, while showing its importance in improving both the students’ skills and the educational system.

Bootstrap inference in functional linear regression models with scalar response under heteroscedasticity

Format: CPS Abstract

Author: Hyemin Yeon

Co-Authors:

  • Daniel John Nordman
  • Xiongtao Dai

Despite its importance, inference about linear models based on regressors in function spaces has been less studied than in the finite-dimensional setting, particularly in the case of heteroscedasticity. At issue is that mean (or projection) estimates have complicated sampling distributions, due to bias and scaling problems from infinite-dimensional regressors, which are compounded by the effects of non-constant variance. In fact, central limit theorems have not yet been established in this case. We develop a paired bootstrap method to approximate sampling distributions of estimated projections, and give a central limit theorem, when the errors have different conditional variances given the regressors. When the paired bootstrap is implemented in a standard fashion, following the case of finite-dimensional regressors, the bootstrap approximation can provably fail. The reason is bias from functional regressors in this bootstrap construction. A modified paired bootstrap is applicable, however, for constructing confidence intervals for projections and for conducting hypothesis tests for the slope function. Our theoretical results on bootstrap consistency are demonstrated through numerical studies. The paired bootstrap method is illustrated with real data examples.
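
For readers unfamiliar with the idea, the following is a minimal sketch of the standard paired bootstrap in the finite-dimensional case that the abstract contrasts with: a scalar regressor, a scalar response, and errors whose variance depends on the regressor. It is not the authors' modified procedure for functional regressors. The simulated data, the function names, and the percentile interval are assumptions made for illustration only.

```python
import random
import statistics


def ols_slope(xs, ys):
    """Least-squares slope of y on a single scalar regressor x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def paired_bootstrap_ci(xs, ys, n_boot=1000, level=0.95, seed=0):
    """Percentile interval for the slope from resampled (x, y) pairs.

    Resampling whole pairs keeps each error attached to its regressor,
    so no constant-variance assumption is needed.
    """
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        slopes.append(ols_slope([xs[i] for i in idx], [ys[i] for i in idx]))
    slopes.sort()
    alpha = 1.0 - level
    low = slopes[int((alpha / 2) * n_boot)]
    high = slopes[int((1 - alpha / 2) * n_boot) - 1]
    return low, high


# Hypothetical heteroscedastic data: the error standard deviation grows with x.
rng = random.Random(1)
xs = [rng.uniform(0.0, 1.0) for _ in range(200)]
ys = [1.0 + 2.0 * x + rng.gauss(0.0, 0.2 + x) for x in xs]

slope_hat = ols_slope(xs, ys)
ci_low, ci_high = paired_bootstrap_ci(xs, ys)
print(f"slope = {slope_hat:.3f}, 95% interval = ({ci_low:.3f}, {ci_high:.3f})")
```

According to the abstract, carrying this construction over unchanged to functional regressors can fail because of bias, which is what motivates the modified version.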

Building a record linkage engine for socio-demographic data

Format: CPS Abstract

Author: Mr Lucas Malherbe

Co-Authors:

  • Lucas Malherbe

Résil is the name of the French program aimed at building statistical registers of individuals and housing. Built by combining a set of administrative data sources, they will come alongside a data enrichment service offered to French official statisticians. A record linkage engine is being developed as part of this service. It is a challenging task, as the files to be linked can be very diverse (both in terms of available variables and data quality). This paper covers these challenges as well as the adopted solutions.

Bulk Johnson-Lindenstrauss Lemmas

Format: CPS Abstract

Author: Dr Michael Casey

Instead of requiring all distances between pairs of points to be approximately preserved in the Johnson-Lindenstrauss lemma, we require only a large fraction 1-eta of such distances to be so. For N points, the target dimension is reduced from scaling like log(N) to log(e/eta)log(N)/R. Here, R is the minimum stable rank of certain log(N) sized subsets of difference vectors. For example, a target dimension scaling like log(e/eta) is sufficient for high-dimensional iid standard Gaussian data.
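
As a rough numerical illustration of the "bulk" viewpoint, the sketch below projects iid standard Gaussian points with a random Gaussian matrix and reports the fraction of pairwise squared distances preserved within a relative tolerance, which plays the role of 1-eta. The sample sizes, the tolerance, and the function names are arbitrary choices for this example and do not come from the paper.

```python
import math
import random


def gaussian_matrix(rows, cols, rng, scale=1.0):
    """rows x cols matrix with iid N(0, scale^2) entries."""
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(cols)] for _ in range(rows)]


def project(points, proj):
    """Multiply an n x d list of points by a d x k projection matrix."""
    cols = list(zip(*proj))
    return [[sum(p * c for p, c in zip(pt, col)) for col in cols] for pt in points]


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def fraction_preserved(points, k, eps, seed=0):
    """Share of pairs whose squared distance changes by at most a factor 1 +/- eps."""
    rng = random.Random(seed)
    d = len(points[0])
    proj = gaussian_matrix(d, k, rng, scale=1.0 / math.sqrt(k))
    low = project(points, proj)
    kept = total = 0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            ratio = sq_dist(low[i], low[j]) / sq_dist(points[i], points[j])
            if abs(ratio - 1.0) <= eps:
                kept += 1
            total += 1
    return kept / total


# Hypothetical data: 40 iid standard Gaussian points in dimension 200.
data = gaussian_matrix(40, 200, random.Random(42))
frac_small = fraction_preserved(data, k=16, eps=0.3)
frac_large = fraction_preserved(data, k=128, eps=0.3)
print(f"k=16: {frac_small:.2f} of pairs preserved; k=128: {frac_large:.2f}")
```

Even a small target dimension preserves a sizeable share of the pairs, while preserving nearly all of them requires a larger one. That trade-off between the target dimension and the allowed fraction of failures is what the abstract quantifies.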

CAViaR models for Value at Risk and Expected Shortfall with long range dependency features

Format: CPS Abstract

Author: Gelly MITRODIMA

Co-Authors:

  • Jaideep Oberoi

We consider alternative specifications of conditional autoregressive quantile models to estimate Value-at-Risk (VaR) and Expected Shortfall (ES). The proposed specifications include a slow moving component in the quantile process, along with aggregate returns from heterogeneous horizons as regressors. Using data for ten stock indices over a period that incorporated the global financial crisis, we evaluate the performance of the models and find that the proposed features are useful in capturing tail dynamics better.
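
For orientation, here is a minimal sketch of a baseline conditional autoregressive quantile recursion, using the symmetric absolute value form of Engle and Manganelli, together with the quantile (pinball) loss commonly used to fit and compare such models. It does not implement the slow-moving component or the heterogeneous-horizon regressors proposed in the abstract, and the coefficients and simulated returns are hypothetical.

```python
import random


def caviar_sav(returns, b0, b1, b2, q0):
    """Symmetric absolute value recursion: q_t = b0 + b1*q_{t-1} + b2*|r_{t-1}|."""
    q = [q0]
    for r_prev in returns[:-1]:
        q.append(b0 + b1 * q[-1] + b2 * abs(r_prev))
    return q


def pinball_loss(returns, quantiles, tau):
    """Average quantile loss at level tau; lower is better."""
    total = 0.0
    for r, q in zip(returns, quantiles):
        u = r - q
        total += u * (tau - (1.0 if u < 0 else 0.0))
    return total / len(returns)


def hit_rate(returns, quantiles):
    """Share of days on which the return falls below the quantile forecast."""
    return sum(1 for r, q in zip(returns, quantiles) if r < q) / len(returns)


# Hypothetical returns and coefficients, chosen only to make the sketch run.
rng = random.Random(7)
returns = [rng.gauss(0.0, 1.0) for _ in range(1000)]
tau = 0.05
var_path = caviar_sav(returns, b0=-0.3, b1=0.7, b2=-0.25, q0=-1.645)

loss = pinball_loss(returns, var_path, tau)
hits = hit_rate(returns, var_path)
print(f"pinball loss = {loss:.4f}, hit rate = {hits:.3f} (target {tau})")
```

In practice the coefficients would be estimated by minimizing the pinball loss, and a hit rate close to tau is one basic check of the resulting VaR forecasts.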

COMPARISON OF THE PERFORMANCES OF THE SUPPORT VECTOR MACHINE AND THE RANDOM FOREST METHOD IN ESTIMATING THE IMPACT OF COVID-19 PANDEMIC ON FOOD SECURI

Format: CPS Abstract

Author: Charles Aronu

Co-Authors:

  • Okafor Emeka Sixtus

The study examines the performance of the Support Vector Machine (SVM) method and the Random Forest (RF) classification method in predicting the impact of the COVID-19 pandemic on food security in Anambra State, Nigeria. The measures of food security in this study are Crop Production (CP), Livestock Production (LP), Forestry Production (FP1) and Fishery Production (FP2), while the demographic predictors considered are age interval, marital status, gender and educational qualification.

COVID-19’s Socio-economic Impact on data on Migrants and refugees - (Egypt Case Study)

Format: CPS Abstract

Author: Dr Haidy Mahmoud

Co-Authors:

  • Haidy Mahmoud

The world has changed dramatically over the past few years. Conflicts, climate change, and health crises such as COVID-19 have become major challenges of our time, and their effects are impossible to ignore. Migrants and refugees are among the social groups most affected by these crises. Remittances play a vital socioeconomic role in economies around the world, especially in developing countries, where they are the sole source of income for many households, and according to the Organization for Economic Co-operation and Development (OECD), Egypt is the fifth-largest recipient of remittances in the world. This paper aims to shed light on the repercussions of COVID-19 on international migration and remittances, based on international studies and reports. It then identifies the characteristics of migration and remittances of Egyptians abroad, which revealed the fragility of labour markets and the difficulty of sustaining remittances during the health crisis. The study uses a descriptive analytical and quantitative statistical approach to analyze data and statistics related to immigration and cash transfers in Egypt, and then to identify the determinants of the impact of COVID-19 on migrants, using statistical tools such as the multiple regression model and the coefficients of determination and correlation to identify the direction of the relationship under study. Moreover, the study uses SWOT analysis to identify the strengths, weaknesses, opportunities, and challenges facing Egyptian immigrants in light of the COVID-19 crisis. It is based on World Bank reports, World Migration Reports, IOM reports, the Migration Data Portal, the International Labour Organization, and Central Bank of Egypt indicators. 
The results of this study show that, due to a higher incidence of poverty, their housing conditions, and high concentration in jobs where physical distancing is difficult, immigrants are at a much higher risk of COVID-19 infection than the native-born. Immigrants are also potentially in a more vulnerable position in the labour market due to their generally less stable employment conditions and lower seniority on the job, so the negative impact on the labour market for immigrants is greater in the sectors most affected by the pandemic. In addition, the school closures and distance learning measures put in place to slow the spread of COVID-19 put children of immigrants at a disadvantage in several ways. The study recommends assessing the situation of migrants and their children in relation to health, employment, and education in order to better identify issues and appropriate policies; ensuring that migrants have access to testing and treatment for COVID-19; and ensuring that housing and employment conditions for migrants respect health standards in order to avoid the spread of the virus. The data should be fit for purpose, which can be challenging when working at this pace.

Cancer Indicators Reporting in Africa: A Review of Availability, Completeness of Data, and Recommendations for The Future.

Format: CPS Abstract

Author: Ms Dorcas Kareithi

Co-Authors:

  • Dorcas Kareithi

According to ongoing studies, Africa is currently facing an exponential increase in cancer incidence rates. These incidence rates are predicted to double over the next decade, with the numbers expected to grow to more than a million in the next 5 years. Research on the epidemiology of cancers, improving oncology patients’ experience, palliative care, and data capture in oncology is reaching ground-breaking levels on other continents, but lagging in Africa. As a result, the continent has missed opportunities to make innovative progress toward the prevention, management, and care of cancers. Outdated data systems, non-standardized data capture methods, and limited investment in human resources and infrastructure have been previously reported. This paper presents findings from a review of cancer-related health indicator data submitted by countries, discussing the availability and completeness of data and recommendations for the future.

Capital Flight in the Presence High Debt Profile: Insight from Nigeria

Format: CPS Abstract

Author: Dr Yaaba Baba Nmadu

Co-Authors:

  • Abolade, C. Onomeabure

Concern about capital flight from Africa, and particularly Nigeria, has been increasing in recent times. Policy makers, analysts and researchers attribute the upsurge to the re-emergence of external debt accumulation (Ndikumana & Boyce, 2021). Some analysts are concerned about the loss of resources that could have catalyzed the long-desired economic growth. Others fear that the debt service burden could worsen the already bad situation of scarce government revenue. Moreover, considering the view of Rojas-Suarez (1990) on the risks behind capital flight, namely “the risk of expropriation of domestic assets and the risk of losses in the real value of domestic assets resulting from inflation or exchange rate depreciations”, others have begun to think that the rising capital flight signals the existence of these risks in the Nigerian economy. Historically, there was the perception in the 1970s to early 1980s that “the risk of expropriation of domestic assets” explained to a large extent not only the infinitesimal flight of capital out of the developing countries but also the large inflows of capital. The story changed, however, in the mid-1980s as the debt crisis surfaced in the developing countries: capital inflows moderated as outflows spiked. The 1990s brought some relief, as capital flight from Africa and particularly Nigeria was low, perhaps due to the low level of external debt. Analysts have recently worried that this trend is rapidly changing. Ndikumana & Boyce (2018) estimated capital flight from thirty African countries from 1970 to 2018 at around US$2.0 trillion, making them net creditors to the rest of the world. According to UNCTAD (2020), over US$836 billion left the continent of Africa as capital flight from 2000 to 2015. The concurrent episodes of external debt pile-up and capital flight in Africa have been well studied in recent times (Ndikumana & Boyce, 2011; Ndikumana, 2015; Ndikumana et al., 2015), but such studies are lacking for Nigeria, hence the need for this study. To explore the new links between external debt and capital flight for Nigeria, this paper adopts an eclectic approach to estimate the incidence of capital flight, with emphasis on the post-market-reform period of 1986 to 2022. It then models the determinants of capital flight using the dynamic autoregressive distributed lag (ARDL) approach. The paper is divided into five sections including this introduction. Section 2 reviews past relevant theoretical and empirical literature, while section 3 details the methodology. Section 4 presents the empirical results, and the last section concludes the study. One of the salient contributions of this paper to the literature, besides using all the existing techniques of estimating capital flight, is the fusion of those methodologies to yield another measure of the phenomenon (i.e., the Yaaba-Onome Synthesis).

Causal rule ensemble method considering the main effect in heterogeneous treatment effect

Format: CPS Abstract

Author: Miss Mayu Hiraishi

Co-Authors:

  • Ke Wan
  • Kensuke Tanioka
  • Hiroshi Yadohisa
  • Toshio Shimokawa

We propose a novel framework of rule ensemble applied to the data to estimate interpretable treatment effects. The proposed method, based on the RuleFit method, has two features. First, the proposed model has a main effect term that estimates the impact on outcomes independent of the treatment effect. If main effects in a true model are present, estimating the treatment effect without their consideration may lead to bias. A second feature of the proposed method is that it uses group lasso to select the same rules. This allows estimation under the same rules across both treatment groups.

Causality in cosmetic research

Format: CPS Abstract

Author: Dr Philippe Bastien

Surprisingly, causality is an emerging field in science today. Although the notion sounds familiar, asking questions in terms of causality could until recently be considered unscientific, except in randomized trials. The lack of a mathematical language associated with causality, the generalized use of algebraic equations, and the central role given to correlation since Karl Pearson, which denied the need for an independent concept of causality, partly explain this problem. We must see causality as an enrichment of statistics that gives access to a part of the world that traditional statistical methods cannot approach, revealing paradoxes. In regression, the notion of causality appears in the question of regressor selection or adjustment for covariates, which continue to be decided informally and largely on intuition, and which often yield no correct solution for lack of a causal view. Despite the criticisms of their peers, causal approaches were proposed during the 20th century by geneticists (Sewall Wright, 1920), economists (Haavelmo, Wold, 1960), and statisticians (Neyman 1920, Rubin 1970), but we had to wait until the 1980s and the work of Judea Pearl (5-6) to see the emergence of a complete mathematical formulation of causality, with the do-calculus associated with a graphical approach. This booming field finds numerous applications in sociology, justice, medicine, data fusion, and transportability problems, often through the notion of the counterfactual. The next big revolution in deep networks will be the consideration of causality for more robust, generalizable, and more easily interpretable models. Based on data from an observational study of more than 200 Chinese women on the effects of pollution, covering clinical, biophysical, pollutant, microbiome, and proteomic data (1-4), we show how traditional statistical approaches can be complemented by causal ones. 
1) Misra N et al. Multi-omics analysis to decipher the molecular link between chronic exposure to pollution and human skin dysfunction. Sci Rep. 2021 Sep 15;11(1):18302. 2) Leung MHY et al. Changes of the human skin microbiota upon chronic exposure to polycyclic aromatic hydrocarbon pollutants. Microbiome. 2020 Jun 26;8(1):100. 3) Naudin G et al. Human pollution exposure correlates with accelerated ultrastructural degradation of hair fibers. Proc Natl Acad Sci U S A. 2019 Sep 10;116(37):18410-18415. 4) Palazzi P et al. Exposure to polycyclic aromatic hydrocarbons in women living in the Chinese cities of BaoDing and Dalian revealed by hair analysis. Environ Int. 2018 Dec;121(Pt 2):1341-1354. 5) Pearl J. (2009). Causality. Cambridge University Press. 6) Pearl J, Mackenzie D. (2019). The Book of Why. Penguin Books.

Cause-specific incidence or cause-specific hazard, that is the question.

Format: CPS Abstract

Author: Tomomi Yamada

Co-Authors:

  • Tsuyoshi Nakamura
  • Hiroyuki Mori
  • Todd Saunders
  • Yoshiaki Nose

Competing risk model

Challenges in creating ecosystem accounts for Canadian ocean areas and coastline

Format: CPS Abstract

Author: Dr Jessica Andrews

Co-Authors:

  • Tasha Rabinowitz

This presentation discusses the challenges of creating ecosystem accounts for coastal and ocean ecosystems.   These accounts are important in that they cover services provided by the environment that are not currently included in economic accounts.

Child Labor Statistics using MICS: Work and Risky Work Estimates for 36 Low- and Middle-Income Countries

Format: CPS Abstract

Author: Anna Bolgrien

Co-Authors:

  • Deborah Levison
  • Deborah DeGraff

In this paper, we will summarize recent global patterns of child labor and domestic work in low- and middle-income countries, including the prevalence of potentially hazardous work, using a newly available data resource, IPUMS-MICS.  In 2005, Eric Edmonds and Nina Pavcnik wrote an influential paper on child labor. Part of their research was based on an analysis of 5-14 year-old children using UNICEF’s Multiple Indicator Cluster Survey (MICS) samples from 36 low-income countries for 2000-2001. They weighted nationally representative participation rates and hours worked to derive aggregate measures which they examined by gender, urban/rural, and age group. Some of their most important MICS findings showed that data about children’s work does not reflect popular images of child labor in factories or for abusive strangers, instead showing substantial variation across and within countries but with most children’s work taking place in the context of their families. We use samples from Round 6 of the MICS between 2017 and 2020, taking advantage of a soon-to-be-released data resource. The IPUMS-MICS data collection is a harmonization of the UNICEF MICS data that facilitates comparison across countries and time (see 2022 award for IPUMS data: https://www.macfound.org/fellows/class-of-2022/steven-ruggles#searchresults ). Since the early 2000s, the MICS surveys have expanded greatly, including more detailed information on adolescents and covering up to age 17 (for most samples). This allows us to build upon the work of Edmonds and Pavcnik by exploring recent patterns for the same measures and ages considered in the earlier study and expanding to include children ages 15-17, to summarize more detailed characteristics of children’s work, and to also examine patterns by level of economic development and household relationship status. The Round 6 MICS include 41 samples from 36 low- and middle-income countries, with 332,681 children sampled in total. 
The child labor module is typically asked of one adolescent in each sampled household. Five MICS variables determine whether a child engaged in labor force work, allowing distinction between working for family/self versus working for others, and also measuring total labor force hours. We will calculate aggregate labor force participation rates for family, non-family, and any work, as well as average labor force hours, overall and by covariates. Eight additional MICS variables address whether children’s labor force work exposed them to potential hazards, such as dangerous tools or extreme temperatures. We will derive aggregate frequencies of overall exposure and by type of hazard, tabulated by family versus non-family work and by covariates. Another 12 variables report on children’s engagement in unpaid household chores (e.g., fetching water, cooking). We will calculate aggregate participation rates for nine categories of chores and average total chore time, each tabulated by covariates and by labor force status. In deriving these aggregate statistics, like Edmonds and Pavcnik, we plan to use within-country weights from the MICS data that reflect survey design to render nationally representative statistics for each sample, and age-specific population-based weights from UNICEF vital statistics for aggregating across countries. In 2020, the ILO estimated that there were 160 million children in child labor worldwide. Reporting since then suggests substantial increases in child work as families who lost income in pandemic-related shutdowns scrambled to meet basic needs. MICS patterns will also indicate children’s pre-COVID-pandemic levels of family and non-family work, providing a baseline for understanding their involvement in family livelihoods, with implications for schooling, at the cusp of the 2020s.
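
The two-stage weighting described above can be sketched as follows: survey weights give a nationally representative rate within each country, and population weights combine the country rates into one aggregate. All numbers, country labels, and function names below are made up for illustration. In the study the second stage would be applied separately by age group, using age-specific populations.

```python
def weighted_rate(records):
    """Participation rate within one country from (survey_weight, works) records."""
    total_weight = sum(w for w, _ in records)
    return sum(w for w, works in records if works) / total_weight


def aggregate_rate(country_rates, populations):
    """Population-weighted average of country-level rates."""
    total_pop = sum(populations[c] for c in country_rates)
    return sum(country_rates[c] * populations[c] for c in country_rates) / total_pop


# Hypothetical micro-data: (survey weight, whether the child works).
samples = {
    "A": [(2.0, True), (1.0, False), (1.0, False)],
    "B": [(1.0, True), (3.0, False)],
}
# Hypothetical child populations used as cross-country weights.
populations = {"A": 1_000, "B": 3_000}

country_rates = {c: weighted_rate(recs) for c, recs in samples.items()}
overall = aggregate_rate(country_rates, populations)
print(country_rates, overall)
```

Here country A has a weighted rate of 0.5 and country B a rate of 0.25. Because B has three times the population of A, the aggregate of 0.3125 sits closer to B's rate than a simple average of the two would.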

Child marriage and teenage pregnancies in Africa: A multivariate analysis

Format: CPS Abstract

Author: Prof. Sathiya Susuman Appunni

Co-Authors:

  • Sathiya Susuman Appunni

Six hundred fifty million women and girls alive today were married before their 18th birthday. That’s one of the startling figures in a UNFPA 2021 report about child marriage. Africa’s sub-Saharan region is home to nine of the ten countries with the highest rates of child marriage in the world. This study focused on selected African countries and used Demographic and Health Survey data. Multivariate analyses were applied to examine the outcomes. The answers we found to why early marriage is so common in these countries were not always clear-cut. What’s more, as was to be expected, there were many statistical variations and contradictions across the four countries. Ingrained traditions and cultural practices typically entrench such early marriages. State or customary laws in 146 countries allow girls younger than 18 to marry with the consent of their parents or other authorities. In 53 countries, girls under 15 can marry with parental consent. Early marriage among boys is also widespread, though the numbers are far lower than for girls and young women. Girls and young women pay the highest costs for early marriage. Girls who marry before 18 are more likely to be subjected to domestic violence and less likely to continue schooling than their peers. They have worse economic and health outcomes, a burden they almost inevitably pass on to their children. There is an urgent need for governments in these countries to introduce programs that promote delaying the age at which girls first have sex and to equip adolescents with knowledge about responsible and safer sex. Policymakers should also work to encourage prolonged enrolment in schools for adolescent girls. And, crucially, laws that criminalize child marriages are needed, and they must be enforced.

Climate change and its Impact on Agriculture and food security in GCC countries: A statistical Insight

Format: CPS Abstract

Author: Mrs Ibtihaj Alsiyabi

Co-Authors:

  • Ibtihaj Alsiyabi

Climate change is a global phenomenon with implications for agricultural production and food security. GCC countries are susceptible to climate change, which leads to hotter climates, less precipitation and higher sea levels. The impacts of climate change add to the existing challenges imposed by geographical location. The result is limited food production and dependence on imports. Though the countries perform well in terms of food availability due to their economic development, they are not self-sufficient, and they face issues regarding the sustainability of natural resources. The twofold solution entails finding innovative agricultural technologies and making more sustainable use of water.

Clustering for utility assessment of anonymized data

Format: CPS Abstract

Author: Dr Maria Eugenia Ferrao

Co-Authors:

  • Paulo Fazendeiro
  • Paula Prata

Following a previous study on clustering as an auxiliary tool to identify groups of interest in anonymized data, this work illustrates the methods applied to large datasets. Several clustering scenarios are compared with the original solution. Differential privacy is applied for data anonymization, utility is quantified by standard metrics, and partitional and hierarchical clustering algorithms are the gold standard. Validity indices provide evidence of the extent to which the data structure is preserved.

Competence-based approach to Building statistical capabilities

Format: CPS Abstract

Author: Dr Rabei Wazzeh

Co-Authors:

  • Raba’a Baniyas
  • Rabei Wazzeh

The paper describes the process of developing a comprehensive competency framework, job descriptions and modular learning with intermittent testing to ensure learning effectiveness and professional competence.

Competing Effects of Scale, Scope and Complexity in the Production, Dissemination and Use of Official Statistics

Format: CPS Abstract

Author: Dr John Lamont Eltinge

John L. Eltinge, United States Census Bureau, John.L.Eltinge@census.gov. Key Words: administrative record data; data quality; efficiency; granularity; network density and complexity; record linkage; sample surveys; trade-offs. Abstract: National statistical organizations (NSOs) have encountered increasing needs to improve efficiency and data quality, while they also expand their suites of statistical information products and services. Efforts to meet these goals often center on integration of data from multiple data sources, e.g., surveys, administrative records, sensors, and web scraping; and on expansion of platforms through which statistical information is disseminated and used. The practical impact of these efforts can involve numerous trade-offs among competing effects of scale, scope, and complexity. These effects may in turn require re-examination of stakeholder priorities within the space of statistical information; features of prospective data sources; the architecture and adaptability of systems for ingest and management of data, and for production and dissemination of estimates; changing requirements for personnel in high-priority technical areas; the measurability and stability of related cost structures; and multi-dimensional criteria for data quality. This paper explores these issues, with emphasis on three points. (1) Operational definitions of “scale,” “scope” and “complexity” within the broad context of current and prospective systems for statistical information production, dissemination, and use. (2) Production: Application of those definitions to current practice and literature for sample surveys, administrative record systems, sensor networks, web scraping and record linkage. 
These applications lead to exploration of trade-offs among scale, scope and complexity effects among multiple dimensions of efficiency and data quality in production of estimates for large-scale population aggregates and finer-scale small domains. (3) Dissemination and use: Evaluation of scale, scope and complexity effects related to anticipated groups of data users; their patterns of data use; and related features of data dissemination systems. User groups include segments of stakeholders who access and use statistical tables, graphs and maps provided through public-domain websites; and more specialized researchers who may integrate and analyze microdata within restricted-access environments. These groups vary in their patterns of use of published results and related quality measures, and in the substantive context within which they interpret numerical results.

Competing Risks Joint Models for Spatio-Temporal Correlated Health Outcomes: Application to Estimate Infectious-Disease-Driven Mortality in Africa

Format: CPS Abstract

Author: Dr Amina Msengwa

Co-Authors:

  • Amina Msengwa
  • Susan Rumisha
  • Ewan Cameron

The development of effective statistical models to analyze the spatial context in multivariate outcomes (i.e., “co-regionalisation” or “co-kriging”) from design-weighted health survey data is still emergent. This study aims to establish a Bayesian geostatistical modeling framework that analyzes multivariate correlated response data of varied metrics, examines the correlation, compares the competing risks across spatial clusters, and predicts at unobserved locations. Starting from the theoretical construct of categorical (binary) to trivariate outcomes linked to survival data, different forms of joint models are defined and related to each other in a conceptual framework. The established framework is applied to data extracted from Tanzania demographic and health surveys and malaria indicator surveys on illnesses (malaria, diarrhoea, anaemia and respiratory infections) and survival (time-to-death) of children under five years to jointly estimate spatial and temporal patterns of infectious-disease-driven childhood mortality. The approach is extended to data from other African children. These models are implemented in R-INLA.

Composite Indicator of the Feasibility of Big Data Approach

Format: CPS Abstract

Author: Mr Budhi Fatanza Wiratama

Co-Authors:

  • Zasya Safitri

Currently, the existence of big data has been realized more widely. There have been many studies using big data to approach the measurement of various standard statistical indicators, to cut costs and time in the data collection process. However, the use of big data itself depends on its availability which is still unequal between regions. Here we propose the formation of a composite indicator that can illustrate the feasibility of a big data approach at regional level in Indonesia. This will identify areas that are not yet worthy of being the locus of big data-based research, to avoid inaccurate and less representative results.

Constructing a Hedonic Rental Price Index for Türkiye

Format: CPS Abstract

Author: Ms Ozgul Atilgan Ayanoglu

Co-Authors:

  • Ozgul Atilgan Ayanoglu
  • Ezgi Deryol
  • Erdi Kizilkaya

Rent index, real estate market

Constructing a course on classification methods for undergraduate non-STEM students: Striving to reach knowledge discovery

Format: CPS Abstract

Author: Dr Yelena Stukalin

Co-Authors:

  • Anna Khalemsky

Classification is one of the most common data analytics tasks. It is employed in myriad disciplines including marketing, finance, sociology, education, and public health. It is therefore appropriate to extend familiarity with data classification methods to non-STEM students who will face such problems during their professional careers. To this end, the current work presents a full data classification course, which integrates theoretical and practical skills and is designed to prepare non-STEM students for comprehensive data analysis tasks. The level of difficulty of the course depends not only on the background of the students but also on the course prerequisites and requirements as set by the specific department. The suggested framework begins with data preparation, provides a comprehensive toolbox, including methodical techniques and software tools, for data classification, and eventually leads students to the discovery of new knowledge and insights. We recommend addressing the teaching of the subject as a dynamic process that involves grasping the analytical task, understanding the terms and concepts, visualizing the classification, analyzing the data, interpreting the results, and drawing conclusions. The course combines theoretical study, practical projects, open discussions, and even competitions between class participants. We assume that the practical projects, carried out in small groups, will have a significant impact, so we recommend focusing them on real problems based on real data. Our research makes an important contribution by providing non-STEM students with the ability to perform an analytical process from problem characterization through data analysis to decision-making. Our study will make the use of a variety of data-mining methods accessible as a substitute or as a support for classical statistics education.

Construction and Validation of a Social Stratification Scale: Cambridge Social Interaction and Stratification (CAMSIS) Scale for China in the 21st Cen

Format: CPS Abstract

Author: Mrs Sun Xu

Co-Authors:

  • Sun Wangshu
  • Shao Qi

In this article we use the RCII row-column correlation model to fit the occupational contingency table from the Chinese General Social Survey (CGSS) for 2010 to 2017, construct CAMSIS scales for males and females, and measure the social interaction distance among occupations as well as the social hierarchies of China in the 21st century. In addition, by analyzing the correlation between CAMSIS and four typical variables (education, income, prestige, and self-identified class), the validity of the CAMSIS scales is verified. This article describes characteristics and changes of occupational hierarchies in recent years: cultural capital increases; functional intellectuals rank higher, with senior professionals ranking even higher than senior managers; agricultural workers are interactively isolated from, and rank significantly lower than, other workers. This article also points out that gender is an important factor affecting occupational hierarchies.

Copula Modeling to Analyse Financial Data Case Studies Indonesia Composite Index to Asiana Stock Index

Format: CPS Abstract

Author: Muchtar Abdul Kholiq

Co-Authors:

  • Titi Kanti Lestari

Copula modelling is a popular tool for analysing the dependencies among variables. It permits the study of tail dependencies, which is of particular interest in risk and survival applications. Copula modelling is also of particular interest in economic and financial modelling, as it may assist in the prediction of financial contagion and periods of “boom” or “bust”. Bivariate copula modelling offers a rich variety of copulas that can be chosen to represent the dependencies in the modelled dataset and the possible extreme events that may lie in the dataset tails. Financial copula modelling tends to diverge from this, as the richness of copula types in the literature may not be well realised, with the two different types of modelling, one non-time-series and the other time-series, being undertaken in different ways. This paper investigates general copula modelling and financial copula modelling and shows why time-series and non-time-series copula modelling are undertaken using different techniques. This difference, apart from the issues surrounding the time-series problem, is mainly due to general copula modelling being able to use empirical CDFs for the probability integral transformation. Financial time-series copula modelling uses pseudo-CDFs because the standardized time-series residuals are centred around 0. The standardized residuals inhibit the estimation of the possible distributions required for constructing the copula model in the traditional manner.
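
The probability integral transformation with an empirical CDF can be sketched as follows. This is a minimal illustration rather than the authors' procedure: the function name `pseudo_observations`, the rescaling by n + 1, and the toy return series are all assumptions introduced here.

```python
from typing import List, Sequence


def pseudo_observations(sample: Sequence[float]) -> List[float]:
    """Map a sample into (0, 1) with its empirical CDF, rescaled by n + 1.

    Tied values receive the average of the ranks they span.
    """
    n = len(sample)
    order = sorted(range(n), key=lambda i: sample[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of tied values that starts at position i.
        while j + 1 < n and sample[order[j + 1]] == sample[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    # Dividing by n + 1 keeps every value strictly inside (0, 1).
    return [r / (n + 1) for r in ranks]


# Hypothetical toy returns for two indices (not data from the paper).
returns_a = [0.012, -0.034, 0.005, 0.021, -0.008]
returns_b = [0.009, -0.040, 0.011, 0.015, -0.002]

u = pseudo_observations(returns_a)
v = pseudo_observations(returns_b)
```

The pairs `(u, v)` are what a bivariate copula would then be fitted to. For time-series data, the same transform would presumably be applied to the standardized residuals of a fitted model rather than to the raw returns.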

Creating statistically-defensible calibrated weights for a blended sample and then measuring the standard errors of the resulting estimates

Format: CPS Abstract

Author: Dr Phillip Kott

Calibration weighting is employed to combine probability and nonprobability samples in a statistically-defensible manner when the population from which the nonprobability sample has been drawn is a subset of the full population. To this end, the probability of a population element self-selecting into the nonprobability sample is modeled as a bounded logistic function of variables collected on the nonprobability sample with known or probability-sample-estimated population totals.

Cross-border Food Remittances during the COVID-19 Pandemic: How Mobile and Digital Technologies transform Pathways from South Africa to Zimbabwe

Format: CPS Abstract

Author: Sean Sithole

In the global South, digital remittances and mobile transfer services provide accessible, speedy and reasonably priced channels for international remittances. New evolving evidence in Southern Africa reveals that mobile and digital technologies facilitate the transfer of cross-border food remittances. The focus on how big data can change economic and financial systems as we once knew them is gaining momentum. However, a growing body of research and literature on mobile transfers and digital remittances has directed primary attention to mobile money and cash remittances (Guermond, 2022; Gukurume and Mahiya, 2020; Ilinitchi, 2020; Kitimbo, 2021; Mutsonziwa and Maposa, 2016; Nyanhete, 2017; Siegel and Fransen, 2013; Tembo and Okoro, 2021). The transmission of non-cash remittances, especially food remittances, has been under-studied, yet in the global South, they play an important role in household food and nutrition security. This study addresses the research gaps on how big data partnerships are driving economic growth and social progression at household levels by providing insights into the significance of mobile and digital technologies in transforming the transfer of South-South cross-border food remittances. The study is based on the results of a questionnaire survey of Zimbabwean migrants residing in Cape Town, South Africa. A key finding in the study is the surfacing and growth of new digital and mobile technology patterns that enable the transfer of food remittances, especially their expansion in 2020 when the COVID-19 pandemic, border closures, lockdowns and mobility controls limited the use of informal channels. Mobile and digital technologies for transferring food remittances have developmental potential through the financial inclusion of the unbanked, undocumented migrants and informal traders, and through the provision of convenient, swift and low-cost channels.
The significance of this study is realised through its provision of further insight into mobile/digital global remittance markets that are contributing to significant shifts in patterns of social security in the southern African region. Keywords: Food remittances, COVID-19, mobile technology, digital technology, food security, Zimbabwean migrants

DEMYSTIFYING MATHEMATICS AND STATISTICAL LITERACY SKILLS AS INDISPENSABLE EDUCATION AND DEVELOPMENT TOOLS IN SIERRA LEONE.

Format: CPS Abstract

Author: Mr Joseph Saidu Sesay

Co-Authors:

  • Simon Kojo Appiah (Professor)
  • Fatmata Binta Sheriff

The numerical abilities required for success in mathematics and statistics courses are lacking among new students at Sierra Leonean colleges and universities. The employability of individuals, college/university retention rates, and students' motivation are all negatively impacted by low numeracy and statistics skills. Rylands & Shearman (2015) argued that universities had to redesign courses by reducing the course contents or funding new support services like mathematics learning centers to account for the greater proportion of secondary students who were performing worse in mathematics. According to Fan et al. (2005) and Needs (2011), employers are looking for graduates who are self-assured and adaptable because they can use mathematics and statistics in a variety of new job opportunities. Yousef (2013) claimed that graduates of business schools worldwide must take quantitative courses in their academic curricula. All children have a natural curiosity about the world and a zest for life, but as people age, these qualities don't always mature. One objective of considerate statistics educators and mentors is to foster children's enthusiasm and interests. Throughout their time in high school and college, there must be constant work to maintain students' interest in statistics and to increase their engagement in the subject, which has high intellectual demands. However, information literacy, data literacy, and statistical literacy all converge when using data as evidence in a discussion (Schield, 2004). Each requires research and evaluation. For all matters of education and development, data are the cornerstones of accountability when it comes to making wise choices and policies. It is nearly impossible to design, monitor, and evaluate effective policies without high-quality data, a strong grasp of mathematics and statistics, and the ability to deliver the right information at the right time.
In Sierra Leone, statistics and research methods are required courses, but many students enter colleges and universities without a genuine background in these subjects. Watson & Callingham (2020), for example, argued that statistical literacy affects a wide range of people, from undergraduates to adults and from teachers to younger children, and that it is a social necessity. Furthermore, some people believe that having these skills today equates to fulfilling our civic responsibility (Batur & Baki, 2022). Ridgway et al. (2011) claim that the involvement of social scientists in statistics and the involvement of mathematicians in discussions about social issues, as well as public debate based on evidence, should result in higher levels of statistical literacy throughout the entire population. This will also help reduce the fear that many people in Sierra Leone currently associate with numbers. The issue is that, as a result of the current digital revolution, statistical data must be absorbed and interpreted in a variety of formats (Bargagliotti et al., 2020; Fost, 2013). Primary, junior secondary, senior secondary, and tertiary educational institutions, as well as researchers, place a greater emphasis on cognitive education practices, which restricts those with cognitive deficits and learning disabilities from accessing educational opportunities and making a positive contribution to society's development. This needs to be classified as a national emergency in order to ensure economic prosperity in Sierra Leone.

DEVELOPMENT OF INTELLIGENT PRIORITIZATION OF ACCOUNT FRAMEWORK FOR AUDIT PROCESSING OF FOREIGN EXCHANGE RECORDS

Format: CPS Abstract

Author: Mrs BERNICE VYTIACO

Co-Authors:

  • BERNICE VYTIACO
  • Charles R. Morales
  • Anton M. Callangan
  • Prof. Francisco De los Reyes

Finding irregularities or detecting “not-normal” instances, sometimes called anomaly detection or outlier detection, is the main process of audit processing (Liu, 2019). In this project, a potentially anomalous record is defined as a record that deviates from the “normal” behavior of all the records of a bank: a data point that is inconsistent with either the item's or the customer's historical behavior. These records could be a possible indication of errors in the report. This project proposes a framework that will serve as a guide in identifying which audit items are potentially anomalous and need to be prioritized. The proposed methodology seeks to augment the existing audit process and reduce processing time in auditing monthly foreign exchange records, not necessarily to replace the current audit process.

DISTRIBUTION OF HEALTHCARE FACILITIES FOR MONITORING MATERNAL MORTALITY IN ACHIEVING SUSTAINABLE DEVELOPMENT GOALS IN NIGERIA

Format: CPS Abstract

Author: Dr Kehinde Kazeem Adesanya

Co-Authors:

  • Osuolale Peter Popoola
  • Salako Sikiru Gbolahan

Public health statistics is the process of collecting, analysing and disseminating health-related data to provide adequate information to improve the health and living conditions of the populace. Health monitoring is the regular collection of data on relevant components of health and its determinants. This study determined the statistical distribution of healthcare facilities as a tool for monitoring maternal mortality in achieving the Sustainable Development Goals in Nigeria. The use of timely data in public health corroborates the aim of the Sustainable Development Goals declaration, which emphasises that achieving the overall health goal, universal health coverage and access to quality health care, is paramount. Data were obtained and reviewed mainly from the relevant literature. The study further revealed the coverage estimates for maternal health indicators, and the flaws of and solutions for the Nigerian healthcare system, particularly Primary Health Care, and ascertained that medical surveillance and monitoring represent a useful component of the healthcare system in achieving Sustainable Development Goal 3. Therefore, building a system well grounded in routine monitoring and medical intelligence as the backbone of the health sector is necessary and serves as a pathway to reducing maternal mortality in Nigeria. Keywords: Maternal mortality, Statistical distribution, Healthcare facilities, Monitoring and Sustainable Development Goals

Data Quality Framework - aiming at common quality descriptions for the public sector data

Format: CPS Abstract

Author: Mrs Essi Kaukonen

Co-Authors:

  • Essi Kaukonen
  • Outi Ahti-Miettinen

The amount of data is constantly growing in the world and so is the potential to use diverse information for knowledge-based management. Knowledge-based management is often defined as decision-making based on up-to-date and quality data. Less often, however, we describe what that quality data means and how it is verified. Every professional data user needs information about the quality of the data set in order to rely on the data they use. The description of the quality of the data allows the user to assess the suitability of the data for their own use. Only with high-quality data can we support responsible data use and decision-making. This might be an even more important issue if we develop automatic decision-making processes. To explicitly describe the quality of data in Finland, the National Data Quality Framework was created together with ten public organizations led by Statistics Finland. The data quality criteria and the metrics, together with models and tools supporting their implementation and management, make up the National Data Quality Framework. The criteria and metrics for structured data were launched as a government recommendation. As an important result of the extensive cooperation of the project, a common language has been formed for discussing and developing the quality of data among public organizations nationwide. The aim is to increase the usability and uniformity of data and extend the use of data for decision-making in society and by enterprises. The purpose is to produce a uniform manner of describing the quality of data in public administration. With the help of the model developed, government agencies will together produce more easily utilized and higher quality data for public data resources. Statistics Finland took an active role in starting this cooperation on the quality of the common data reserves throughout the Finnish administration.
Based on the proposal of Statistics Finland, the Data Quality Framework project was originally included in the program on Opening Up and Using Public Data coordinated by the Ministry of Finance for the years 2020-2022. The program targeted the wider and more effective use of public data throughout society and, more widely, the implementation of the relatively new horizontal policy sector, i.e. national information policy, which plays an important role in public sector digitalization. During 2022 a cooperation network for enhancing public data quality and supporting the use of the newly created quality criteria was piloted, and this work carries on. Statistics Finland is leading this cooperation, in which almost a hundred participants from various organizations work towards the same target: to make the quality of public sector data visible and measurable, thus enabling both better use of this data and better data. The paper explains the possible role of the statistical agency from the perspective of the quality of data for the entire public administration. The paper also presents selected quality criteria grouped into three groups to answer three important questions: How does the data describe reality? How is the data described? How can the data be used?

Data Science, Statistics and Agricultural Development: Evidence from Agricultural Data metrics and Budgetary Appropriation for Rural Development in Ni

Format: CPS Abstract

Author: Prof. Temidayo Apata

Co-Authors:

  • Apata, Temidayo Gabriel
  • Ayantoye Kayode
  • Ojo Olutope Stephen
  • Ajiboye John Akinyele
  • Oloniyo, Roseline Boluwaji

Statistics is: the fun of finding patterns in data; the pleasure of making discoveries; the import of deep philosophical questions

Data science for informed citizen: Learning at the intersection of data literacy, statistics and social justice

Format: CPS Abstract

Author: Prof. Joachim Engel

Data science has been conceived to address tangible problems. Educating students in data science goes beyond teaching about algorithms, skills of manipulating data sets, selecting and applying appropriate analyses, and creating visual representations of data. It involves raising a critical understanding of how data are produced and how they can be used for particular purposes, including the role of context in interpreting data. It emphasizes developing an awareness for data ethics, and considering the implications for policy and society when powerful algorithms are used.

Data visualization and reformulating Egyptian International Household Migration Survey’s results.

Format: CPS Abstract

Author: Mr Ali Hepishy Kamel Abdelhamid

Co-Authors:

  • Rawia Wagih Ragab

Data visualization is the practice of translating information into a visual context. Data visualization is important for many reasons: it provides a quick and effective way to communicate information in a universal manner using visual information, makes data more memorable for users, allows information to be absorbed quickly, improves insights, speeds up decisions, and more. This research paper aims to reformulate the results of the Egyptian International Household Migration Survey (EHIMS) using interactive dashboard data visualization instead of the traditional graphs currently used in official publications, to emphasize the differences and the power of interactive dashboards in displaying data insights more effectively and sending clear messages to data users, decision-makers and public opinion.

Data-driven and user-centric insights function

Format: CPS Abstract

Author: Badria Abdulla

Co-Authors:

  • Abdulla Al Dhaheri

The Insight and Foresights Platform (IFP) helps deliver high-quality insights analysis in collaboration with core statistics teams and several partners in the field through a state-of-the-art platform. The goal of the IFP is to “Become a leader in the application of Big Data and Data Science to amplify data, enrich statistics, and generate insights for the betterment of the Emirate of Abu Dhabi, ultimately catering to several key segments of the society”.

Death Process Approach for Modelling Changes in Marriage Probabilities

Format: CPS Abstract

Author: Mrs Neela Gulanikar

Co-Authors:

  • Sangita Kulathinal
  • Akanshka Kashikar

We model the NFHS-IV data on marriages as a continuous-time death process observed over discrete equidistant time points, where marriage is considered as an exit from the marriageable population and hence is considered as a death in the death process terminology. A death process with a change point has been proposed as a model to assess the changes in marriage rates over time. We derive the maximum likelihood estimators for the parameters and discuss their asymptotic distribution. The asymptotic distribution of the estimators is then used to propose a test for examining the presence or absence of a change point. Parametric bootstrap has been proposed as an alternative approach to derive the rejection region for the test statistic. For the estimation of the change point we have also used the expectation-maximization (EM) method, and we provide a confidence interval for the change point using parametric bootstrap. In the case of two change points, we consider that the marriage rate is initially linearly increasing, then constant for some time points, and then linearly decreasing. For the model with two change points, constrained numerical maximization of the likelihood has been carried out for the estimation of the parameters. The linear regression method is used for the estimation of initial values of the parameters. An extensive simulation study is carried out to examine the performance of the proposed procedures. NFHS-IV data is modeled via these approaches to study the changes in male and female marriage rates over time.
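
The single change point case can be illustrated with a small sketch. It is not the authors' estimator: it assumes a linear pure death process in which each unmarried person exits independently at a constant rate, unit spacing between observations, and a change point that coincides with an observation time, so that the count surviving each interval is binomial. The names `segment_fit` and `fit_change_point` and the counts are hypothetical.

```python
import math
from typing import Sequence, Tuple


def segment_fit(counts: Sequence[int], start: int, stop: int) -> Tuple[float, float]:
    """Fit one constant exit rate on the intervals start .. stop - 1.

    Returns (rate, log-likelihood without the binomial constants).
    """
    at_risk = sum(counts[t] for t in range(start, stop))
    survivors = sum(counts[t + 1] for t in range(start, stop))
    if at_risk == 0:
        raise ValueError("nobody is at risk in this segment")
    exits = at_risk - survivors
    p = survivors / at_risk  # MLE of the one-step survival probability
    loglik = 0.0
    if survivors > 0:
        loglik += survivors * math.log(p)
    if exits > 0:
        loglik += exits * math.log(1.0 - p)
    rate = -math.log(p) if p > 0 else math.inf  # survival = exp(-rate)
    return rate, loglik


def fit_change_point(counts: Sequence[int]) -> Tuple[int, float, float]:
    """Profile the likelihood over every admissible change point.

    Returns (tau, rate before tau, rate from tau onwards).
    """
    last = len(counts) - 1  # number of observed intervals
    if last < 2:
        raise ValueError("need at least three observation times")
    best = None
    for tau in range(1, last):
        rate1, ll1 = segment_fit(counts, 0, tau)
        rate2, ll2 = segment_fit(counts, tau, last)
        if best is None or ll1 + ll2 > best[0]:
            best = (ll1 + ll2, tau, rate1, rate2)
    return best[1], best[2], best[3]


# Hypothetical counts of people still unmarried at times 0, 1, ..., 6.
unmarried = [1000, 900, 810, 729, 364, 182, 91]
tau, rate_before, rate_after = fit_change_point(unmarried)
```

A parametric bootstrap for the change point, as mentioned in the abstract, could reuse `fit_change_point` on counts simulated from the fitted rates.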

Debunking misleading graphs – what method works best?

Format: CPS Abstract

Author: Dr Sanne J.W. Willems

Co-Authors:

  • Dr. W. Wijnker
  • Prof.dr. I. Smeets
  • Dr. P. Burger

The increasing use of graphs on social media enables quick understanding of complex information. But it also facilitates the spread of misinformation when graphs are designed to be misleading. How can we debunk these misleading graphs? In our two-survey experimental study, we investigated and compared the effectiveness of four correction methods as debunking strategies to correct bar charts with manipulated vertical axes. Additionally, we investigated whether the correction effects last for at least a week and explored whether there are any differences between people with various levels of graph literacy and education. In this presentation we will show the set-up, results, and conclusions of this comparison of debunking strategies. This study is part of a larger research project aimed at providing guidelines for factcheckers, science communicators, and (data) journalists on how to effectively combat misleading graphs.

Degree of Urbanisation Implementation in Indonesia

Format: CPS Abstract

Author: Mr Achmad Firmansyah

Co-Authors:

  • Wida Widiastuti

A global methodology for delineating urban and rural areas is needed to monitor the Sustainable Development Goals indicators. The degree of urbanisation (DEGURBA), a method proposed by the United Nations, classifies areas into three categories: urban centres, urban clusters, and rural. These classifications are produced by considering three main variables, i.e. population density, the proportion of settlement areas, and neighbourhood contiguity. In addition, several sequential procedures are conducted to produce the categories, i.e. creating a population grid, developing a grid of DEGURBA, and applying DEGURBA to small spatial units. Here, DEGURBA is implemented in Indonesia with several modifications adapting it to the island topology of Indonesia. Moreover, the recent results of the 2020 Population Census and the Global Human Settlement Layer, which contains the percentage of settlement areas, are used to support the results. The results of the DEGURBA implementation show several inequalities in Indonesia at the level of 1 km × 1 km grid cells. First, regarding population density, Java is the densest island in Indonesia, with more than 70% of its grid cells having more than 500 people. The results are significantly imbalanced compared to other islands, such as Papua. Second, based on the grid of DEGURBA, the composition of the three DEGURBA categories varies considerably between islands. Urban centres dominate on Java; urban clusters dominate on Sumatera, Bali-Nusra, and Sulawesi; and rural areas dominate on Kalimantan, Maluku, and Papua. The domination of urban clusters on these three islands represents the shifting urbanisation process. Lastly, for the small spatial units, most spatial units in Indonesia are classified as rural areas, except on Java, which is dominated by cities.
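
The grid step can be sketched on a toy population grid. This is a simplified illustration, not the implementation used in the study: the thresholds (1,500 and 300 inhabitants per km², 50,000 and 5,000 inhabitants in total) are the ones usually quoted for the global DEGURBA grid classification rather than values stated in the abstract, gap filling and the settlement-share rule are omitted, the Indonesian modifications are not reproduced, and the function names and data are hypothetical.

```python
from typing import List, Tuple

Grid = List[List[int]]  # population per 1 km x 1 km cell

FOUR = [(-1, 0), (1, 0), (0, -1), (0, 1)]           # no diagonals
EIGHT = FOUR + [(-1, -1), (-1, 1), (1, -1), (1, 1)]  # with diagonals


def clusters(pop: Grid, min_density: int, min_total: int,
             moves: List[Tuple[int, int]]) -> List[List[bool]]:
    """Mark cells in contiguous groups that pass both thresholds."""
    rows, cols = len(pop), len(pop[0])
    seen = [[False] * cols for _ in range(rows)]
    mask = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or pop[r][c] < min_density:
                continue
            # Flood fill one group of dense, contiguous cells.
            group, stack = [], [(r, c)]
            seen[r][c] = True
            while stack:
                y, x = stack.pop()
                group.append((y, x))
                for dy, dx in moves:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and pop[ny][nx] >= min_density):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            # Keep the group only if its total population is large enough.
            if sum(pop[y][x] for y, x in group) >= min_total:
                for y, x in group:
                    mask[y][x] = True
    return mask


def classify(pop: Grid) -> List[List[str]]:
    """Label every cell as urban centre, urban cluster, or rural."""
    centre = clusters(pop, 1500, 50_000, FOUR)
    cluster = clusters(pop, 300, 5_000, EIGHT)
    return [["urban centre" if centre[r][c]
             else "urban cluster" if cluster[r][c]
             else "rural"
             for c in range(len(pop[0]))]
            for r in range(len(pop))]


# Hypothetical 4 x 4 population grid (inhabitants per cell).
toy_grid = [
    [20000, 20000, 400, 10],
    [15000, 500, 400, 10],
    [10, 10, 10, 4000],
    [10, 10, 10, 10],
]
labels = classify(toy_grid)
```

In this toy grid the isolated cell with 4,000 inhabitants is dense enough for an urban centre but its group is too small, so it falls into the urban cluster it touches diagonally.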

Demographic and socioeconomic profile of severely ill and disabled children in South Africa

Format: CPS Abstract

Author: Miss Nwabisa Mona

Co-Authors:

  • Nwabisa Mona

Persons with disabilities are often neglected in the social welfare systems, particularly in accessing services. There is a lack of inclusivity and disability mainstreaming for children with disabilities and severe illnesses, resulting in unmet needs and widened barriers and disparities which leave this population behind. Furthermore, the challenge of under-reporting and the lack of extensive data on children with disabilities and severe illnesses is a growing concern in South Africa. This may also raise concern about the lack of disability mainstreaming, particularly in social welfare systems such as the education and health sectors. Often this subpopulation is left behind and its needs remain unmet. This study aims to describe the relationship between socioeconomic and demographic factors as one of the social determinants of health in children with disabilities and severe illnesses compared to children with no form of disability. It also aims to describe the relationship between access to social welfare services such as education, health, and social assistance as a form of community-based rehabilitation by comparing children with no disability and those with severe illnesses and disabilities. Significance of the study: This study addresses the unmet need for access to social welfare services and the disparities encountered by children with severe illnesses and disabilities compared to children with no form of severe illness or disability. This study describes the role of demographic and socioeconomic indicators as social determinants of health of the epidemics, and it describes the accessibility of social welfare services such as health, education, and social assistance, comparing children with severe illnesses or disabilities to children with no form of disability or severe illness.

Dependence in the survival of ancestral genome

Format: CPS Abstract

Author: Prof. Elizabeth Thompson

Study of the descent of genome in defined pedigrees underlies many genetic analyses, including the survival of founder DNA in the complex pedigrees of managed endangered species. It has long been known that, across a chromosome, descent of genome through the m meioses of a defined pedigree may be represented as a random walk on the vertices of an m-dimensional hypercube. At any single genome location, survival of a specified founder genome must decrease the probability of survival of others, the highest negative correlations in survival being between genomes in a single diploid founder, and next within a founder couple. Across a chromosome the reverse is true. The survival of an ancestral DNA segment from a founder greatly increases the probability of survival of a segment from an adjacent founder genome, where adjacency is in terms of the vertices of the hypercube. Results have practical application in studying the diversity of founder genomes present in key current individuals (for example, in a clone), in studying the survival of introgressed genomes, and the effect of both breeding choices and natural selection for or against such genomes on the survival of other founder genomes.

Design and analysis of cluster randomized controlled trials to evaluate the effectiveness and safety of digital health interventions

Format: CPS Abstract

Author: Prof. Ling Li

Co-Authors:

  • Tim Badgery-Parker
  • Johanna Westbrook

Digital health interventions (DHIs) have been increasingly implemented in hospitals and general practices worldwide. Cluster randomized controlled trials (CRCTs) and stepped wedge (SW) CRCTs are used to evaluate DHIs when it is inappropriate or impossible to use individual randomization. The objectives of this paper are to describe the design and analysis of CRCTs and SWCRCTs for evaluating the effectiveness of DHIs, and to discuss the practical and methodological challenges of these trials.

Designing indicators for monitoring the Food Sovereignty

Format: CPS Abstract

Author: DRS Amal Mansouri

Food sovereignty is a major concern, especially with the recent development of global food crises. Despite increasing relevance in recent years, several aspects remain under-explored. The concept itself is unclear due to the complexity of its meaning and the diversity of visions it can incorporate. There is a need to review the definition and develop a set of carefully selected indicators to describe food sovereignty in all its dimensions and enable the design of appropriate responses based on reliable information. Our analysis aims to understand how relevant and discretionary the choice of indicators can be to inform and monitor the development of food sovereignty in Africa. We will primarily use data published over the past decade by the Food and Agriculture Organization of the United Nations and the World Bank to identify indicators that provide information on the six dimensions of food sovereignty. We will dig deeper into the relationship between the different indicators and we will analyze the main associations and discrepancies at the level of a sample of 16 African countries, chosen mainly because of the availability of data.

Detecting changes in covariance using random matrix theory

Format: CPS Abstract

Author: Rebecca Killick

One approach to modelling changing behaviour in data is to assume that the changes occur at a small number of discrete time points known as changepoints. Between changes, the data can be modelled as a set of stationary segments that satisfy standard modelling assumptions. Changepoint methods are relevant in a wide range of applications including genetics, network traffic analysis and oceanography. We consider the specific case where the covariance structure of the data changes at each changepoint. While our focus here is on a specific application, the problem has wide applicability. For example, Stoehr et al. (2019) examine changes in the covariance structure of functional Magnetic Resonance Imaging (fMRI) data, where a failure to satisfy stationarity assumptions can significantly contaminate any analysis, while Wied et al. (2013) and Berens et al. (2015) examine how changes in the covariance of financial data can be used to improve stock portfolio optimisation.

Determinants of subsequent contraceptive use among adolescents with prior unmet need in the Rakai Community Cohort Study (RCCS) population in Uganda

Format: CPS Abstract

Author: Ms Lillian Ayebale

Co-Authors:

  • Allen Kabagenyi
  • Stephen Wandera Odhiambo
  • Tom Lutalo

Studies acknowledge that, globally, interventions aimed at reducing the unacceptably high levels of adolescent pregnancy need to be strengthened (World Health Organization (WHO) 2017). Although this affects all adolescents, adolescents in Africa are disproportionately affected by adolescent pregnancy compared with adolescents in other regions, and it is becoming increasingly important to explore strategies that can be employed to decrease adolescent pregnancy rates, particularly with regard to their application within the African cultural context (Chandra-Mouli, Camacho and Michaud, 2013). Existing literature shows that adolescent pregnancy in Uganda occurs in a multifaceted and difficult environment (Renzaho et al., 2017), and yet the existing laws neither comprehensively curb teenage pregnancy nor fully mitigate the problems faced by pregnant teenagers. With a predominantly young population, of which 47.3% are less than 15 years old, many adolescents are increasingly becoming emancipated minors and will continue to be affected by similar conditions. One in every four Ugandans (23.3%) is an adolescent aged between 10 and 19 years, and one in every three (33.5%) is a young person aged between 19 and 24 years.

Determinants of survey participation in the National Income Dynamics Study - Coronavirus Rapid Mobile Survey (NIDS-CRAM) Waves 1-5

Format: CPS Abstract

Author: Dr Reza C. Daniels

This paper will conduct an analysis of survey participation over the duration of NIDS-CRAM, which was a five-wave longitudinal telephonic personal interview survey implemented in South Africa during 2020-2021, designed to track the socioeconomic impacts of COVID-19 and associated government lockdowns on a nationally representative sample of adults. Across the five waves of the survey, it was evident that there were periods of both attrition and negative attrition between waves. There was also the introduction of a top-up sample in Wave 3, as well as a retention strategy in Waves 4-5. We will attempt to statistically identify the impact of the retention strategy implemented in Waves 4-5, and compare it to a counterfactual of what survey participation could have looked like had no retention strategy been implemented. We will also investigate how attrition on observable characteristics of respondents affects the estimation samples of researchers using the data, so that the research community is aware of the overall effect of survey participation and attrition in the NIDS-CRAM survey.

Determination of Implicit Stratification Variables for the Development of 2023 Geo-enabled Master Sample for Household-based surveys in the Philippines

Format: CPS Abstract

Author: Ms Sherylen Naive-Piquero

The main goal of this study is to re-examine whether the implicit stratification variables used in the 2013 Master Sample are still adequate for the 2023 Geo-enabled Master Sample and, if not, to develop more up-to-date, efficient, and cost-effective implicit stratification variables to produce reliable household-based survey estimates in the Philippines. Eight (8) implicit stratification variables, tested in thirty-nine (39) combinations or two hundred thirty-four (234) permutations, were simulated to determine which set produced the lowest coefficient of variation (CV), standard error (SE), design effect (deff), and relative bias (RB). Comparing the sets of implicit stratification variables used in the 2013 Master Sample with the sets of implicit stratification variables with the lowest average CV, average SE, average deff, and average RB across the ten variables of interest, the set of implicit stratification variables with the most instances of having the best results is agricultural score plus wealthproxyB plus geolocation when using the matched 2018 LFS-FIES and 2013 Master Sample. With the most recent 2021 LFS-FIES, however, further investigation is conducted to validate the initial results obtained.

Developing a comparison metric for survey question compliance: A case in utilizing open-source software and methodologies within the US Census Bureau

Format: CPS Abstract

Author: Dr Sheldon Waugh

Co-Authors:

  • Sheldon Waugh
  • Scott Glendye
  • Rafael Puello

The U.S. Census Bureau provides an extensive survey paradata collection, including the ability to record questions and interactions between the field representative/interviewers (FRs) and the survey respondent. Computer Audio-Recorded Interview (CARI) is a technique for recording portions of interviews. The CARI Interactive Data Access (CIDA) System provides a way of listening to these recordings and viewing these files. The Survey of Income and Program Participation (SIPP) is currently the only survey within CIDA with respondents' informed consent. Using Machine Learning and data engineering and development, open-source algorithms and novel methodologies can enable audio transcription, converting recorded questions and answers into actionable data. Providing an in-house technical solution within the federal government, without external contracts, maximizes government investment returns into Machine Learning Operations (MLOps).

Differentially Private Sampling with Replacement

Format: CPS Abstract

Author: Prof. Nobuaki Hoshino

Subsampling for disclosure control is justified by differential privacy under a traditional finite population framework.

Dimensionality reduction using the ordered label for trajectory inference

Format: CPS Abstract

Author: Mr Masaaki Okabe

Co-Authors:

  • Hiroshi Yadohisa

We propose a dimensionality reduction method that assumes that labels with an ordering structure are given as external information. We assume a situation in which information about where the samples were obtained is given as ordered labels. Noise may be added to the labels. In this situation, the labels do not necessarily have a relationship with the true trajectory. Our method solves this problem by performing supervised dimensionality reduction that is robust to label perturbations.

Distributional Method for Risk Averse Reinforcement Learning

Format: CPS Abstract

Author: Dr Ziteng Cheng

Co-Authors:

  • Ziteng Cheng
  • Sebastian Jaimungal
  • Nick Martin

I am a postdoctoral fellow in the Department of Statistical Sciences, University of Toronto, mentored by Dr. Sebastian Jaimungal. I received my Ph.D. in Applied Mathematics from Illinois Institute of Technology under the supervision of Dr. Tomasz R. Bielecki and Dr. Ruoting Gong. I have broad research interests in stochastic processes and related statistics. My thesis concerns the first exit problem of time-inhomogeneous Markov modulated processes, and statistical inference for SPDEs. I am currently working on integrating risk aversion into mean field games, reinforcement learning and inverse reinforcement learning. I am also interested in transport theory for stochastic processes.

Distributional Wealth Accounts for the Euro Area

Format: CPS Abstract

Author: Mr Henning Ahnert

Co-Authors:

  • Pierre Sola
  • Nina Blatnik
  • Marco Felici

Significant progress has been achieved in recent years by the European System of Central Banks on the compilation of Distributional Wealth Accounts. This dataset combines micro data from a household survey (Household Finance and Consumption Survey (HFCS)) and macroeconomic sector accounts to show quarterly distributional results on the wealth of households, for the euro area as a whole and individual European countries. The paper will briefly describe the methodology followed to link the two sources, and the adjustment made through this process. It will clarify how quarterly time series are derived, their information content, and the approaches under way to further improve them. The impact of asset price changes on the wealth distribution for the euro area will be presented. Finally, the link between this new dataset and the new G-20 Data Gaps recommendation on wealth distribution will be outlined.

Do We Have Signals? Revealing Substantial Cohort Change in Mortality Modelling

Format: CPS Abstract

Author: Mr Suryo Adi Rakhmawan

Co-Authors:

  • Nasir Abbas
  • Mohammad Hafidz Omar
  • Muhammad Riaz

Mortality modelling is a practical method for governments. Through this modelling, we can obtain a picture of mortality down to the age-specific level for a particular year. However, some information on the phenomenon may remain in the residual vector, unrevealed by the models. We handle this issue by employing a multivariate control chart to discover substantial cohort changes in mortality behaviour that the models have not yet captured.

Does energy access affect poor household food security in Indonesia? An instrumental variable approach

Format: CPS Abstract

Author: Ms Ratna Rizki Amalia

Co-Authors:

  • Rus'an Nasrudin

Given the limited research on the relationship between access to clean energy and food security, this study investigates the impact of clean energy access on poor household food security. We use household data from Indonesia for 2018-2020. The study uses instrumental variable methods to overcome the endogeneity problem of access to clean energy by instrumenting access to clean energy with historical distance variables (old postal highway and old port in 1934). The results show that the program has led to a significant improvement in food security. Poor households with clean energy access have food security 14.46% higher than poor households without clean energy access. Compared with rural households, access to clean energy for poor households in urban areas has a more significant impact on the household food security level. This finding implies that the government must continue to expand the availability of access to clean energy for the poor, to promote the use of clean energy among the poor and increase the food security of low-income families.
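The instrumental variable idea in this abstract can be illustrated with a minimal sketch. It is not the authors' specification: it assumes a single continuous instrument and a single continuous regressor, simulated data, and the simple ratio estimator cov(z, y) / cov(z, x). The variable roles in the comments (instrument as a historical distance, regressor as energy access) and all coefficients are hypothetical.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)


def ols_slope(x, y):
    """Ordinary least squares slope of y on x (biased if x is endogenous)."""
    return cov(x, y) / cov(x, x)


def iv_slope(z, x, y):
    """Single-instrument IV estimate of the effect of x on y."""
    return cov(z, y) / cov(z, x)


def simulate(n=20000, true_effect=0.5, seed=0):
    """Simulated data with an unobserved confounder u (all values hypothetical)."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0, 1)                    # instrument, e.g. a historical distance
        ui = rng.gauss(0, 1)                    # unobserved confounder
        xi = 0.8 * zi + ui + rng.gauss(0, 1)    # endogenous regressor, e.g. energy access
        yi = true_effect * xi + 2.0 * ui + rng.gauss(0, 1)  # outcome, e.g. food security
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return z, x, y


if __name__ == "__main__":
    z, x, y = simulate()
    print("OLS slope:", round(ols_slope(x, y), 3))
    print("IV slope: ", round(iv_slope(z, x, y), 3))
```

In this simulation the OLS slope absorbs the confounder and overstates the effect, while the IV slope stays close to the value used to generate the data, because the instrument is independent of the confounder by construction.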

Does family background affect early childhood education participation decision? Evidence from Indonesia

Format: CPS Abstract

Author: Annisa Nur Fadhilah

Co-Authors:

  • Lita Jowanti

The early childhood phase is crucial for developing children's skills and intelligence, which will significantly influence their future lives as adults. The percentage of children (0-6 years) who participate in Early Childhood Education (ECE) in Indonesia only reaches 27.68%. Family is the closest environment and the first place for children in social interaction, so the family becomes a crucial factor for children’s development. This study aims to discuss the role of family socioeconomic and demographic factors in ECE enrolment status in Indonesia, using the National Social and Economic Survey of March 2020 with binary logistic regression and descriptive analysis. The results show that the probability of children aged 0-10 years participating in early childhood education is higher for those who live in an urban area, come from households that have higher income, have a working household head, and have a small number (less than 4) of family members. The early childhood development index shows a higher score for children with better family economic conditions, children who live in an urban area, and children whose parents have higher education. The Government should enhance early childhood education facilities in both urban and rural areas. The Government should also strengthen regulations regarding the obligation to attend early childhood education.

Does the Presence of Downstream and Upstream Foreign Direct Investments Affect the Labor Productivity of Domestic Industries?

Format: CPS Abstract

Author: Ms Rose Ann Hernandez

Co-Authors:

  • Rose Ann Hernandez

With the premise that foreign direct investments (FDI) facilitate technology and knowledge transfer to domestic industries, eventually contributing to the country’s sustainable economic development, the Philippine Government further liberalized its Foreign Investment Act in March 2022 to attract more foreign investors. However, recent empirical evidence showing that FDI does facilitate transfer of technology and knowledge and benefit domestic industries remains limited in the Philippines. This study, based on a balanced panel of industry-level data of manufacturing firms in the Philippines from 2010 to 2017, examines the effect of downstream and upstream FDI presence on the labor productivity of the manufacturing industries in the country. Empirical results suggest that FDI presence in the downstream industries negatively affects the labor productivity of domestic suppliers, while FDI presence in the upstream industries does not significantly affect the labor productivity of domestic final-goods producers. To reap the positive productivity benefits from FDI, the findings of this study recommend the development of policies and programs to raise the absorptive capacities of domestic industries, upgrade the local quality standards of the domestic suppliers, and strengthen the collaboration between foreign suppliers in the local market and domestic final-goods producers.

Double Sampling Control Charts for Monitoring the Median of Birnbaum-Saunders Distribution

Format: CPS Abstract

Author: Dr Ming-Che Lu

Co-Authors:

  • Su-Fen Yang

The Birnbaum-Saunders (BS) distribution is a lifetime distribution describing fatigue failures and general random wear failures. By monitoring the median of the BS distribution, quality control of the lifetime of products subject to failures can be performed. Although increasing the sample size can improve the detecting power for median shifts and reduce false alarm rates, its disadvantages are not only the increased cost of sampling and measurement but also the need to spend more time on analysis and measurement of product lifetimes. Double sampling monitoring schemes may provide better statistical performance and reduce cost without increasing the sample size. In this study, we construct a double sampling control chart for monitoring the median of the BS distribution to detect whether any shift in the median exists. The average number of observations to signal (ANOS) is used to measure the out-of-control detection performance. A smaller ANOS under the same specified in-control average run length indicates better out-of-control detection performance. The proposed chart presents a smaller out-of-control ANOS than the existing chart, meaning less cost and time spent on monitoring the median of the BS distribution. Finally, a real data set following the BS distribution is given to illustrate the design and implementation of the proposed chart.

EFFECT OF HOUSEHOLD FOOD SECURITY ON HEALTH STATUS WITH A MEDIATING FACTOR: A BOOTSTRAPPING RECURSIVE SEM PERSPECTIVE

Format: CPS Abstract

Author: Abdul-Aziz Abdul-Rahaman

Sub-Saharan Africa is facing a huge challenge of how to feed its ever-increasing population. Food availability, therefore, is a problem for many people in Ghana and especially for rural households in the Upper West region. The study is aimed at using a mediation factor to determine the effect of household food security on the health status of individuals through a recursive structural equation modelling (SEM) approach. The research design was primarily explanatory. The study selected twelve (12) farming communities through cluster random sampling. Data were collected by administering a structured questionnaire to the respondents. The study applied the four main stages of model specification under the Structural Equation Modelling (SEM) framework. The results showed that the indirect influence of food security on health status of respondents was high relative to the indirect effect of food security on health status. The mediation effect of demographic factors on health status was relatively weak. Moreover, the measurement variables had disproportionate effects on the latent factors. In conclusion, household food security had an influence on the health status of individuals, to some extent, with the role of the mediation factor.

Early pregnancy and motherhood among young women in Sub-Saharan African Countries: a multivariate analysis

Format: CPS Abstract

Author: Prof. Sathiya Susuman Appunni

Adolescent pregnancy and childbirth are severe medical and public health problems for developed and developing countries. Although legislative, institutional, and policy measures have been implemented, girls still face early pregnancies and childbearing in Sub-Saharan Africa. This study aimed to examine the trends and factors associated with early pregnancies and motherhood among young women in three sub-Saharan African countries, specifically Malawi, Mali, and Niger. Descriptive and multivariate analyses indicated a significant decline in the prevalence of early childbearing in all three countries between 2006 and 2016, along with variability in determinants of age at first childbearing across countries. However, the study found that respondents who married in their early and middle adolescence are 20.53, 10.27, and 6.19 times (in Malawi, Niger, and Mali, respectively) more at risk of early childbearing than those who married in their emerging adulthood. There is an urgent need to introduce programs that promote delaying the age of first sexual debut and equip adolescent women with knowledge about responsible and safer sex and motherhood. In addition, government authorities (policymakers) have to promote prolonged enrolment in schools for teenage girls and enforce a law that criminalizes child marriages.

Educational data science: monitoring learning technologies in primary schools

Format: CPS Abstract

Author: Dr Ignacio Alvarez-Castro

Co-Authors:

  • Ignacio Alvarez-Castro
  • Natalia da Silva

In recent years, particularly after the COVID-19 pandemic, the use of different learning management systems (LMS) for various objectives has become a key tool for education. A huge volume of student and teacher data are generated by LMS on a daily basis. Transforming these data into relevant information for decision-making and educational public policy is a major challenge due to the complexity of the data structure and the difficulty of summarizing the learning process with registered information. In this work, we combine several computational, statistical, and visualization tools to tackle this challenge with data from primary schools in Uruguay.

Effectiveness of Takahasi's Ranked Set Sampling

Format: CPS Abstract

Author: Prof. Arun Kumar Sinha

Co-Authors:

  • Vijay Kumar, Dept. of Statistics, TM Bhagalpur University, Bhagalpur- 812007, Bi

Ranked Set Sampling (RSS) was introduced by McIntyre (1952) with the aim of estimating agricultural characteristics. It is a cost-effective sampling technique when the variable of interest is expensive or difficult to measure but can be ranked easily at negligible cost. Under equal allocation, RSS performs better than simple random sampling (SRS). The performance of RSS improves further when appropriate unequal allocation is used instead of equal allocation. In Takahasi's RSS (TRSS) method, after selecting m² (m squared) units randomly from an infinite population and arranging them into m sets of m units each, a unit is randomly selected from each set. Each unit so obtained is then quantified, and a rank between 1 and m (both inclusive) is given to its quantification. Obviously, one may not get the same frequency for each rank order as in McIntyre’s RSS method, and there is also a possibility of zero frequency for a rank order even after selecting r times m-squared units from the population. To deal with these difficulties, Takahasi (1970) suggested using McIntyre’s method for collecting samples in one cycle. This ensures that every rank order gets at least one quantification. This method performs well when one is interested only in estimating the population mean. But it does not help in estimating the variance of the estimator, because in this case the variance of each rank order is needed. In view of these facts, Norris, Patil and Sinha (1995) suggested using McIntyre’s method in two cycles while using TRSS and referred to it as modified TRSS (MTRSS). These procedures are illustrated using a real data set on the yield of potato. The technique is especially useful to those who look for a cost-effective technique for estimating agricultural products that are grown underground, such as potato, ginger, turmeric, garlic, onion, beetroot, peanut, etc.
These methods are quite useful when a population of interest may not be stratified, which is a common situation in many real-life settings.
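The claim that RSS under equal allocation outperforms SRS can be checked with a small simulation. The sketch below implements McIntyre's balanced RSS only (not Takahasi's variant or MTRSS) and assumes perfect ranking, a normal population, and hypothetical values for the set size, number of cycles and population parameters.

```python
import random
import statistics


def rss_sample(draw, m, cycles, rng):
    """McIntyre's balanced RSS: in each cycle, draw m sets of m units,
    rank each set, and measure only the i-th ranked unit of the i-th set."""
    measured = []
    for _ in range(cycles):
        for rank in range(m):
            ranked_set = sorted(draw(rng) for _ in range(m))  # perfect ranking assumed
            measured.append(ranked_set[rank])
    return measured


def srs_sample(draw, n, rng):
    """Simple random sample of n measured units."""
    return [draw(rng) for _ in range(n)]


def compare(m=3, cycles=4, reps=2000, seed=1):
    """Return the simulated variances of the RSS and SRS sample means,
    both based on m * cycles measured units."""
    rng = random.Random(seed)

    def draw(r):
        return r.gauss(10.0, 2.0)  # hypothetical population: Normal(10, 2^2)

    rss_means = [statistics.fmean(rss_sample(draw, m, cycles, rng)) for _ in range(reps)]
    srs_means = [statistics.fmean(srs_sample(draw, m * cycles, rng)) for _ in range(reps)]
    return statistics.variance(rss_means), statistics.variance(srs_means)


if __name__ == "__main__":
    var_rss, var_srs = compare()
    print("variance of RSS mean:", round(var_rss, 4))
    print("variance of SRS mean:", round(var_srs, 4))
```

Both designs measure the same number of units; the RSS mean has the smaller variance because each measured unit comes from a different rank position, at the price of ranking m times as many units as are measured.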

Emerging Data Needs in Dealing with Uncertainty: DOSM Official Data Request

Format: CPS Abstract

Author: Mr FARRIL FARDAN DANIAL

Co-Authors:

  • Nur Ain Zainal Abidin
  • Nazirah Ibrahim
  • Arifah Abdul Malik
  • Nurul Atiqah Zainal Abidin

The demand for official data and statistics has increased substantially in recent years. Reliable and timely data and statistics are more important than ever before. Data are being used in many contexts. Having visible and active national statistics producers is key to helping ensure that the public receives information that is reliable and can be used for informed decision-making.

Estimating Labor Statistics in Pooled Monthly Surveys

Format: CPS Abstract

Author: Mr Nikkin Beronilla

Co-Authors:

  • Quindale Caraos

The Labor Force Survey (LFS) was originally conducted once every quarter to estimate labor statistics in the Philippines. Beginning in 2021, however, the LFS has been conducted monthly in response to the volatile nature of employment due to COVID-19. To simplify the survey operations, it uses dependent replicates between regular (i.e., quarterly) and additional monthly surveys. However, the trade-off is that the computation of variance in pooled surveys is complicated, as the regular-to-additional-month covariance is not equal to zero. As a result, additional steps are required to account for non-zero covariances. Hence, this paper aims to demonstrate a method to compute the variance of labor statistics that captures replicate dependence.
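Why a non-zero covariance matters can be shown with the textbook variance of a weighted average of two dependent estimates. This is only an illustration of the general identity, not the method the paper demonstrates; the function name, the weight `w` and the numbers are hypothetical.

```python
def pooled_variance(var_regular, var_additional, cov_ra, w=0.5):
    """Variance of the pooled estimate w * regular + (1 - w) * additional,
    where cov_ra is the covariance between the two survey estimates."""
    return (
        w ** 2 * var_regular
        + (1 - w) ** 2 * var_additional
        + 2 * w * (1 - w) * cov_ra
    )


if __name__ == "__main__":
    # Hypothetical variances of a regular and an additional monthly estimate.
    independent = pooled_variance(4.0, 4.0, cov_ra=0.0)
    dependent = pooled_variance(4.0, 4.0, cov_ra=2.0)
    print("assuming zero covariance:", independent)
    print("with positive covariance:", dependent)
```

With positively correlated estimates, treating the covariance as zero understates the variance of the pooled estimate, which is the complication the abstract describes.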

Estimating Relative Response Rates and Preferential Ranking of Subjects

Format: CPS Abstract

Author: Dr Chinwendu Uzuke

Co-Authors:

  • Ikewelugo Cyprian Anene Oyeka

In statistical analysis of one-sample data, a lot of attention has often been devoted to measures of central tendency and measures of dispersion for these data sets, their estimation, and hypothesis tests concerning them. If these are the only interest, then one can use any one of the familiar statistical methods, such as the one-sample t test or the sign test, to analyze such data (Gibbons, 1973; Oyeka, 2009; Oyeka et al., 2010). But one-sample data sets intrinsically contain much more unexplored information than only a few parameters. One such piece of information is the relative magnitudes of the observations themselves. For example, assessors, decision makers, judges, teachers, etc. may often assess, examine or judge a sample from a population of subjects and score them for employment, placement in educational institutions or selection to fill vacant positions when opportunities are limited. A medical or health researcher may have data or information on subjects or patients on their state of health, medical test results, level of concentration of some contaminants, disease load, injury level and other such conditions, to guide decisions on the distribution and use of amenities when supplies are limited. The problem then before the decision makers is how to use these observations to rationally select the required number of subjects from the group of available subjects to ensure that meritocracy is upheld in the presence of scarcity. Here, although any desired hypothesis may be tested, this may not be as important as the need to find appropriate ways to systematically rank-order the subjects according to their level of need or performance in a given test, to facilitate judicious selection to achieve a desired objective. Interest may also lie in how to break ties between the subjects in their performance or response as optimally as possible, and hence in selection using available information. This paper attempts to address these problems.

Estimating the Performance of Entity Resolution Algorithms: Lessons Learned Through PatentsView.org

Format: CPS Abstract

Author: Olivier Binette

Co-Authors:

  • Olivier Binette

This paper introduces a novel evaluation methodology for entity resolution algorithms. It is motivated by PatentsView.org, a U.S. Patent and Trademark Office patent data exploration tool that disambiguates patent inventors using an entity resolution algorithm. We provide a data collection methodology and tailored performance estimators that account for sampling biases. Our approach is simple, practical and principled -- key characteristics that allow us to paint the first representative picture of PatentsView's disambiguation performance. This approach is used to inform PatentsView's users of the reliability of the data and to allow the comparison of competing disambiguation algorithms.

Estimating the magnitude and pattern of catastrophic health expenditure in Egypt

Format: CPS Abstract

Author: Dr Sarah Ibrahim

Co-Authors:

  • Ahmed Samir Gomaa
  • Sarah Ibrahim

Maintaining the health of the population is one of the main challenges for countries all over the world. This is reflected in the Human Development Index (HDI), because one of its main indicators is life expectancy at birth, a numerical measure of a long and healthy life for populations. Consequently, continuous efforts are made to develop new healthcare policies to improve the population's overall health. Unfortunately, there is no single approved and effective healthcare system for quality and cost-effectiveness (publicly funded, privately funded, or a mixture). Nevertheless, most economists, politicians, and decision-makers agree upon one goal: to minimize out-of-pocket expenditure, calculated as a percentage of current health expenditure, especially for the elderly population as well as people with chronic conditions (which is also directly correlated with age). One of the goals that Egypt seeks to accomplish within the following years is universal health coverage, to ensure that everyone will have access to quality health services, given that the current healthcare system relies mainly on non-uniform out-of-pocket payments (more than 60%). Moreover, only half of the elderly population suffering from chronic conditions have health insurance. That is not a coincidence, as they are one of the groups in greatest need of such coverage. To allocate health resources properly, we first need to understand how serious the problem is by analyzing how much the population spends on health. Accordingly, we need to gain insight into catastrophic payments on health. This paper aims to investigate the incidence and intensity of catastrophic health payments for the population using the WHO (2005) method. Moreover, the paper will estimate the pattern of catastrophic health expenditure in specific subgroups, such as the groups of annual household (H.H.) expenditure, sex of the H.H. head, number of elderly people living in a household, and the region of the H.H. head. The data used in this study are from the Egypt, Arab Rep. - Household Income, Expenditure, and Consumption Survey, HIECS 2017/2018 – 2017. Additionally, we analyzed the data of 12485 households representing the whole of Egypt. The results of the study indicate that more research should be conducted to assess the factors and outcomes of catastrophic health expenditure. Furthermore, while vulnerable groups such as low-income families, the elderly, and families in rural areas are in the greatest need, they still cannot afford adequate health services. Finally, even if universal health insurance did not work or was further delayed, specific programs should be tailored for such groups to socially protect them and accomplish the goal of "health for all".

Estimation from sparse, multiply-imputed, multiway tables

Format: CPS Abstract

Author: Dr Patrick Graham

Multiple imputation is a standard method for dealing with missing values in data. Analysis of multiply imputed datasets is usually straightforward using well-known formulas for combining estimates from each of the multiply imputed datasets. However, the normality assumption that underpins the combining formulas is suspect for the analysis of multiway tables with many small proportions or counts. This can lead to confidence intervals with markedly less than nominal coverage. In this paper we present an approach to obtaining valid confidence intervals for sparse multiply imputed multiway tables.
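The "well-known formulas" are commonly referred to as Rubin's rules. The sketch below shows those standard rules only, as background for the normality assumption the abstract questions; it is not the approach the paper proposes, and the input numbers are hypothetical.

```python
import math
import statistics


def rubin_combine(estimates, variances):
    """Combine m point estimates and their variances from m imputed datasets.

    Returns the pooled estimate, its total variance, and the degrees of
    freedom of the reference t distribution.
    """
    m = len(estimates)
    q_bar = statistics.fmean(estimates)       # pooled point estimate
    within = statistics.fmean(variances)      # average within-imputation variance
    between = statistics.variance(estimates)  # between-imputation variance
    total = within + (1 + 1 / m) * between
    if between == 0:
        df = math.inf                         # no between-imputation variability
    else:
        df = (m - 1) * (1 + within / ((1 + 1 / m) * between)) ** 2
    return q_bar, total, df


if __name__ == "__main__":
    # Hypothetical estimates of a cell proportion from three imputed datasets.
    q_bar, total, df = rubin_combine([0.10, 0.12, 0.14], [0.0004, 0.0004, 0.0004])
    print("pooled estimate:", q_bar)
    print("total variance: ", total)
    print("degrees of freedom:", df)
```

An interval built as the pooled estimate plus or minus a t quantile times the square root of the total variance relies on approximate normality, which is the assumption that becomes doubtful for small proportions or counts in sparse tables.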

Estimation of Production Function for Manufacturing Subsectors in Malaysia

Format: CPS Abstract

Author: Mrs Nur Aisyah Ruslee @ Ahmad Azam

Co-Authors:

  • Rozita Misran
  • Nurul Hidayah Ismail

Manufacturing is Malaysia's second-largest sector, accounting for 24.3 percent of the country's GDP in the second quarter of 2022. With many investments and new factories being set up, the sector continues to contribute to the nation’s export earnings and employment opportunities, ensuring the country’s growth despite global economic uncertainty. The recent pandemic has been a challenge to the country, leaving everyone to adapt to new standards, technology and digitalisation. As Malaysia approaches the digital economy and Industrial Revolution 4.0, the government needs to drive the nation with the right inputs and resources, such as capital and labour, in order to maintain manufacturing productivity. Hence, using the Cobb-Douglas production function, this paper will study the significance of capital and labour for eight subsectors of the manufacturing industry in Malaysia and examine the relationship between labour, capital and output of the subsectors. Data from surveys by the Department of Statistics Malaysia will be used in this study, and the Least Squares (LS) method and robustness checks are used to estimate the production function.
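A Cobb-Douglas production function Y = A K^α L^β becomes linear after taking logarithms, log Y = log A + α log K + β log L, so it can be fitted by least squares. The sketch below assumes this standard two-input form and uses made-up data; the function name and parameter values are hypothetical and do not reflect the paper's data or results.

```python
import math
import random


def fit_cobb_douglas(output, capital, labour):
    """Least squares fit of log Y = log A + alpha * log K + beta * log L.

    Returns (A, alpha, beta). Solves the 2x2 normal equations on centred logs.
    """
    y = [math.log(v) for v in output]
    k = [math.log(v) for v in capital]
    l = [math.log(v) for v in labour]
    n = len(y)
    my, mk, ml = sum(y) / n, sum(k) / n, sum(l) / n

    skk = sum((a - mk) ** 2 for a in k)
    sll = sum((b - ml) ** 2 for b in l)
    skl = sum((a - mk) * (b - ml) for a, b in zip(k, l))
    sky = sum((a - mk) * (c - my) for a, c in zip(k, y))
    sly = sum((b - ml) * (c - my) for b, c in zip(l, y))

    det = skk * sll - skl ** 2
    alpha = (sky * sll - skl * sly) / det
    beta = (skk * sly - skl * sky) / det
    log_a = my - alpha * mk - beta * ml
    return math.exp(log_a), alpha, beta


if __name__ == "__main__":
    rng = random.Random(0)
    capital = [rng.uniform(1, 100) for _ in range(50)]
    labour = [rng.uniform(1, 100) for _ in range(50)]
    # Hypothetical technology: A = 2, alpha = 0.3, beta = 0.6, no noise.
    output = [2.0 * kk ** 0.3 * ll ** 0.6 for kk, ll in zip(capital, labour)]
    print(fit_cobb_douglas(output, capital, labour))
```

The fitted α and β are the output elasticities of capital and labour, and their sum indicates returns to scale (below, equal to, or above one).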

Estimation of Treatment Effects for Multiple Outcomes by using Pliable Lasso

Format: CPS Abstract

Author: Dr Shintaro Yuki

Co-Authors:

  • Kensuke Tanioka
  • Hiroshi Yadohisa

We are dealing specifically with two-arm comparisons, where subjects are assigned to the treatment or control group. In two-arm comparisons, it is desirable to efficiently identify populations with characteristics that make the treatment effective in so-called subgroups. Estimating the treatment effect can be used to identify them. In this presentation, we focus on estimating treatment effects using a linear regression model with the Pliable lasso for straightforward subgroup interpretation. The Pliable lasso is a method for estimating the effect of a treatment on a single outcome. We propose an extension of this method that can be applied to multiple outcomes.

Estimation of Treatment Effects with Missing Observations in Crossover Clinical Trials

Format: CPS Abstract

Author: Gajendra Kumar Vishwakarma

Statistical analysis in the presence of missing data is challenging in any study. It has received increasing attention in clinical trials over the last few years. There are several reasons for the occurrence of missing data in a crossover trial. However, attempts to address missing data in crossover trials have been negligible. In Bayesian analysis, it becomes feasible to estimate the causal effect jointly with imputation. This work is dedicated to the development of a missing data handling technique for crossover clinical trials in a Bayesian framework.

Estimation of causal effect of cesarean section delivery on body mass index for Bangladeshi women

Format: CPS Abstract

Author: Dr Mahbub Latif

Pregnancy-related factors play an important role in mothers' health condition, especially in the later stage of their reproductive period. One such factor, cesarean section (CS) delivery, a life-saving procedure for the mother and newborn, could affect mothers' later-life health as it involves a major surgical procedure. Many studies conducted with Bangladeshi survey data have reported that, over the last 15 years, the rate of CS delivery has been increasing along with that of institutional delivery. The prevalence of CS delivery is found to be significantly higher in Bangladesh than the comparable standard rate suggested by the World Health Organization. The alarmingly high rate of CS is an important public health concern for Bangladesh, as it not only affects women's later-life health condition but also contributes significantly to the country’s total health expenditure. A number of studies in the literature identify important factors associated with CS and the consequences of CS for maternal health and household expenditure. Among the factors that could be affected by CS delivery, women's BMI is considered in this study, as the prevalence of obesity is also increasing among Bangladeshi women of older reproductive age groups. The main objective of this study is to estimate the causal effect of CS delivery on body mass index (BMI) for Bangladeshi women from nationally representative survey data obtained from open sources (e.g. demographic health surveys, multiple indicator cluster surveys, etc.). Estimating a causal effect from an observational study requires adjusting for associated confounders, and different propensity score and IP weighting based methods are used for estimating the causal effect of CS on BMI. The education levels of women and their husbands, wealth index, number of antenatal care visits during pregnancy, total number of children ever born, and place of residence are considered as covariates for the treatment model.
The analysis shows a significant causal effect of cesarean section on BMI; more specifically, the odds of being obese are found to be about 13 percent higher among women who had a cesarean section compared to others.
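To make the idea of IP weighting concrete, here is a minimal sketch that is not the authors' analysis. It assumes a single categorical confounder, estimates the propensity score as the share of treated units within each stratum, and computes a normalized (Hajek-type) weighted difference in mean outcome. The records, names and values are hypothetical, and the sketch targets a difference in means rather than the odds of obesity reported above.

```python
from collections import defaultdict


def ipw_effect(records):
    """Normalized IPW estimate of the average treatment effect.

    records: iterable of (stratum, treated, outcome), with treated in {0, 1}.
    Assumes every stratum contains both treated and untreated units.
    """
    records = list(records)
    counts = defaultdict(lambda: [0, 0])  # stratum -> [n_treated, n_total]
    for stratum, treated, _ in records:
        counts[stratum][0] += treated
        counts[stratum][1] += 1

    weighted_sum = {0: 0.0, 1: 0.0}
    weight_total = {0: 0.0, 1: 0.0}
    for stratum, treated, outcome in records:
        propensity = counts[stratum][0] / counts[stratum][1]
        weight = 1.0 / propensity if treated else 1.0 / (1.0 - propensity)
        weighted_sum[treated] += weight * outcome
        weight_total[treated] += weight
    return (weighted_sum[1] / weight_total[1]
            - weighted_sum[0] / weight_total[0])


def naive_difference(records):
    """Unadjusted difference in mean outcome between the two groups."""
    treated = [y for _, t, y in records if t == 1]
    control = [y for _, t, y in records if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


# Hypothetical data: (wealth stratum, CS delivery, BMI).
# Within each stratum the treated outcome is exactly 2 units higher.
data = [
    ("low", 1, 24.0), ("low", 0, 22.0), ("low", 0, 22.0), ("low", 0, 22.0),
    ("high", 1, 28.0), ("high", 1, 28.0), ("high", 1, 28.0), ("high", 0, 26.0),
]

ate = ipw_effect(data)
naive = naive_difference(data)
print(naive, ate)
```

In this toy example the unadjusted difference overstates the effect because treatment is more common in the stratum with higher outcomes, while the weighted estimate recovers the within-stratum difference.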

Estimation of proportions in very small population domains

Format: CPS Abstract

Author: Andrius Čiginas

Co-Authors:

  • Ieva Burakauskaite

A small area estimation topic.

Ethical Principles for the Data Science Revolution: Repurposing Administrative and Opportunity Data for Social Science Research

Format: CPS Abstract

Author: Dr Stephanie Shipp

Data science teams bring together researchers and application stakeholders across many areas of expertise, each with its own set of research integrity norms and habits. This requires ethics to be woven into every aspect of doing data science. Our Data Science Ethics framework reinforces this as data science ethics touches every component and step in the practice of data science.

Evaluating Food security by a composite indicator

Format: CPS Abstract

Author: Dr SAID SAGHIR ZAROUALI

Co-Authors:

  • AMAL MANSOURI

Eradicating hunger and malnutrition is one of the great challenges of our time. It requires implementing strategies to strengthen food security and developing statistical monitoring, review and surveillance processes, particularly given the increasing frequency of economic crises and social conflicts. Many countries and international organizations have favoured a pragmatic approach based on elementary indicators, each supposed to reflect some dimension of food security. However, the development of composite indicators, calculated by aggregating heterogeneous elementary indicators of food security, is necessary in order to provide a synthetic view over time and lead to better policies and decisions than would otherwise have been the case. The paper presents two methods to estimate a composite indicator of food security, one based on factor analysis and the other on unit ranking of the six normalized indicators, allowing a clear understanding of food security trends and comparisons between 21 selected countries including Morocco. The five indicators selected are the food availability per capita, GDP per capita, political stability and absence of violence and terrorism and variability of food production per capita. The study was conducted over the period 2002-2019. The results indicate that the composite indicators calculated from the two methods show almost similar trend changes. With the exception of Nigeria and Jordan, which experienced a deterioration of 0.2 and 0.18 points in their food security indices, all the other countries in the sample showed continuous improvement in their food situation. The results also show a reverse synchronization with the “prevalence of undernourishment” and the “Food Insecurity Experience Scale (FIES)”. We conclude with the hypothesis that, to end hunger or manage food crises, the central vector for all composite indicators of food security remains the improvement of food availability per capita.
In these situations, it is necessary to increase the strength of financial support to the agricultural sector, improving its production and diversification capacity. It is also important to establish an effective long-term mechanism to protect agricultural production and reduce the constraints of cropland and water resources linked to climate change, especially in vulnerable areas and countries.
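One simple way to aggregate heterogeneous elementary indicators is sketched below, under assumptions that are ours and not the paper's: each indicator is min-max normalized to [0, 1], indicators for which lower values are better are reversed, and the composite is the unweighted mean. The country labels, values and function names are hypothetical.

```python
def min_max(values, higher_is_better=True):
    """Rescale a dict of raw values to [0, 1]; reverse if lower is better."""
    low, high = min(values.values()), max(values.values())
    scaled = {unit: (v - low) / (high - low) for unit, v in values.items()}
    if not higher_is_better:
        scaled = {unit: 1.0 - s for unit, s in scaled.items()}
    return scaled


def composite_index(indicators):
    """Unweighted mean of normalized indicators.

    indicators: list of (values_by_unit, higher_is_better) pairs.
    """
    normalized = [min_max(values, better) for values, better in indicators]
    units = normalized[0].keys()
    return {u: sum(n[u] for n in normalized) / len(normalized) for u in units}


# Hypothetical data for three units.
food_availability = {"A": 3000.0, "B": 2500.0, "C": 2000.0}  # higher is better
production_variability = {"A": 10.0, "B": 30.0, "C": 25.0}   # lower is better

composite = composite_index([
    (food_availability, True),
    (production_variability, False),
])
ranking = sorted(composite, key=composite.get, reverse=True)
print(composite, ranking)
```

A rank-based variant would replace each normalized value by the unit's rank on that indicator before averaging; a factor-analytic variant would replace the equal weights by loadings estimated from the data.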

Evolution of the Transportation Sector in Malaysia

Format: CPS Abstract

Author: Ms K Megala Kumarran

Co-Authors:

  • Nivethan M. Mariappan
  • Nurul Nahzira Nur

This paper describes the evolution of the transportation system of Malaysia. In the early stages, the primary transportation was set up to connect the larger cities involved in rubber and tin mining. Now, with the evolution of the transportation system, the railway system in Malaysia has expanded from steam trains in the beginning to electric trains operated by Keretapi Tanah Melayu Berhad (KTMB). Furthermore, the Malaysian government is now working with the Singapore government on the East Coast Rail Link (ECRL), a mega project through which both countries can be connected by land; the project is expected to be completed in 2027. Turning to Malaysia's Gross Domestic Product (GDP) in 2021, the main contributor is the services sector, which contributed about 57.0 per cent. Within the services sector, the transportation sub-sector is the third largest contributor to Malaysia's GDP. The transportation sub-sector contributed RM42.7 million (3.1%) to the services sector. Therefore, this paper aims to analyse the growth, progress, and evolution of the transportation system in the past and over the recent 10 years.

Examining the relation between Climate change and labor force in Egypt

Format: CPS Abstract

Author: Mr Said Ahmed Said

The phenomenon of “climate change” is defined as an imbalance in the usual climatic conditions, such as temperature, wind patterns, and precipitation, that distinguish each region on Earth. The frequency and magnitude of long-term global climate changes have enormous impacts on natural vital systems. Increasing temperatures will also lead to changes in weather, such as wind patterns and the amount and types of precipitation, in addition to several possible extreme weather events; this leads to environmental and social consequences. The temperature of the Earth's surface has recorded a steady increase during the past 100 years, ranging between 5.0 - 7.0 degrees, as human activities represented by the industrial and technological revolution led to an increase in the rate of global warming gas emissions and in their concentrations in the atmosphere. This study examines the relationship between major labor force variables, such as average working hours and average wages, together with health status represented by deaths due to respiratory diseases, and three main climate variables (carbon dioxide emissions, methane emissions, and nitrous oxide emissions), and their potential effects on work and health in Egypt. The study uses a method based on the effect of environmental indicators and gas emissions on the Egyptian labor force. The main sources of data in this study are the annual bulletin of employment, wages, and working hours statistics, the annual bulletin of deaths statistics, and World Bank data. This study uses time series data for the 1990-2019 period at an aggregate level in Egypt to assess the relationship between climate variables and working conditions by using dynamic regression. Moreover, analysis is also used to identify the most important variables of climate change that affect participation in the labor force and health.

Exploratory Study on the Factors Associated with Data Literacy: Evidence from TIMSS 2019 Grade 4 Philippine Assessment Data

Format: CPS Abstract

Author: Dr Kevin Carl Santos

Co-Authors:

  • Karizza Bianca Loberiza

One of the mathematics content domains assessed in the Trends in International Mathematics and Science Study (TIMSS) 2019 is Data, consisting of two areas, namely, reading, interpreting and representing data, and using data to solve problems. In the Philippines, the mathematics curriculum of the Department of Education includes statistics and probability as early as Grade 1. When the learners reach Grade 4 level, they should have learned how to create and interpret simple representations of data such as tables and bar graphs. Recognizing the role that data literacy plays in our information-driven society, it is important for students to understand how data displays such as graphs and charts help in organizing information and comparing data. For this reason, this study aims to identify the factors associated with the proficiency levels of the Filipino Grade 4 learners in data literacy in TIMSS 2019. To address this, the relationship of several contextual variables, including student- and school-level data, with data achievement scores is analyzed using multilevel models with school membership considered as random effects. Based on the results, recommendations are made on what possible interventions and policies can be crafted to improve the data literacy of Grade 4 learners.

Exploratory analysis of hunger, housing and vulnerability, poverty and COVID-19

Format: CPS Abstract

Author: Dr PAULO DE OLIVEIRA

Food insecurity is defined as the lack of access by people to food in a regular and healthy way, so that the failure to guarantee this right ends up affecting other essential needs such as cultural diversity and sustainability from an environmental, economic and social point of view, together with other factors such as poverty, worse housing conditions, vulnerability and, recently, the emergence of the COVID-19 pandemic. help improve the quality of life for these people.

Exploring Career Stagnation in Employment Equity Groups Amongst Canadian Public Servants

Format: CPS Abstract

Author: Mx Catalina Albury

Co-Authors:

  • Brittny Vongdara
  • Shamir Kanji
  • Martin Nicholas

Public service sectors aim to represent the interests of a country’s population in policy and decision making and should therefore act as leaders in racial equity in employment. In recent years, institutions globally have increasingly recognized that equitable representation of historically excluded groups is essential to healthy and productive workplaces. However, contemporary metrics for determining inclusion in employment amongst racialized groups can fail to identify barriers to entry and career advancement, and the quality and depth of the necessary data need improvement. Additionally, the nuances of measures of diversity and equity must be recognized. Many use aggregated diversity data as a “catch-all” for representation metrics. Persons belonging to historically excluded groups may enter a sector via the improvement of hiring practices, yet they often remain disproportionately underrepresented in positions of leadership (a lapse in equity). Increased availability and analysis of disaggregated employment data allows employers and employees to quantify equitable representation amongst historically excluded groups. Here, median values, a quantile analysis and the definition of a disproportionality index are used as evaluation metrics.

Factors affecting students’ achievement in Mathematics and statistics in secondary schools and its influence on studying statistics: A rapid review

Format: CPS Abstract

Author: Ms Lillian Ayebale

Co-Authors:

  • Gilbert Habaasa

Introduction: Mathematics is seen by society as the foundation of scientific technological knowledge that is vital in social-economic development of a nation. In fact, studies suggest that mathematics as a subject affects all aspects of human life at different levels. This study is a rapid systematic review of different studies to establish factors affecting students’ achievement in Mathematics and statistics in secondary schools and its influence on studying statistics at university. Methods: This paper is a scoping rapid review of factors affecting students’ achievement in mathematics. We searched literature on student achievement in mathematics. We used ERIC database and supplemented with Google Scholar and random Google search. Twenty-seven articles met the final selection criteria and were reviewed. Results: The teaching methods, teachers’ attitude, students’ attitude towards mathematics were noted as key factors in almost all articles reviewed. There seemed to be consistency too that parents can exert a positive influence on their children’s mathematical performance, classroom environment, students’ previous mathematics achievement and gender related factors. Conclusions: Student achievement at secondary level determines whether they will opt to or qualify to study statistics at university. From this review, it is imperative that these factors be addressed early in the students’ career so as to have more student enrollment for statistics at tertiary institutions.

Fast and efficient approach for combining satellite imagery and multiple sources of data for improved population estimates

Format: CPS Abstract

Author: Dr Chibuzor Christopher Nnanatu

This is a contributed paper presentation on an alternative fast and efficient method for combining multiple population data along with satellite imagery and other geospatial covariates to predict population density within a given area or country. The method adapts a Bayesian Hierarchical regression framework.

Fertility among adolescents in sub-Saharan Africa: A systematic review

Format: CPS Abstract

Author: Ms Lillian Ayebale

Globally, there is a reported decline in the adolescent birth rate (UNFPA, 2016). However, the World Fertility Report 2017 indicates that adolescent fertility is still high in many developing countries (United Nations, 2019). In sub-Saharan Africa, the adolescent fertility rate is reported at 104 births per 1000 women aged 15–19 years (UN DESA, 2019), with estimates missing for young adolescents aged 10-14 years. The UN indicates that fertility among adolescents aged 10-14 years has been underestimated, as they are perceived as young and much of the emphasis for the age-specific fertility rate is on ages 15-19 years (UN DESA, 2019). Also, births among girls under 15 years old are likely to be concealed to avoid shame and stigmatization in society (Maly et al., 2017). To address issues related to adolescent fertility in a comprehensive manner, the international community has recognized the importance of monitoring fertility levels among girls aged 10 to 14 years, in addition to ongoing surveillance of birth rates at ages 15-19 years (UN DESA, 2019). Although there have been several studies on adolescent fertility in sub-Saharan Africa, few have focused on synthesizing available evidence on correlates of adolescent fertility to inform policies and programs for adolescents. This paper is a systematic synthesis of available evidence on correlates of fertility among adolescents aged 10-19 years in sub-Saharan Africa.

Financial disruption and big data: complexity and interaction

Format: CPS Abstract

Author: Mrs Rim Jellal

The use of Big Data is increasing in the financial system. Financial decisions based on massive data (Big Data) will pose new challenges for the resilience of financial systems. The development of statistical and econometric tools to drive and monitor this change in information systems is crucial. As such, statistical techniques for big data must be put in place in order to assess the robustness of the financial system and to establish a new typology of financial regulation that complements macro- and micro-prudential regulation. In this paper, we propose to assess the potential interactions between the degree of resilience of the financial system and the existence of big data. The results obtained indicate that big data can constitute a shock amplification mechanism that can lead to the identification of new shock transmission channels. Big data in this perspective can lead to a new form of myopia in finance.

Finding households and dwellings in a register-based census using graphs

Format: CPS Abstract

Author: Mrs Helle Visk

The 2021 census in Estonia was mostly based on administrative data. The inaccuracy of place of residence data in the Population Register causes strong biases in household and family statistics because family members are registered at different addresses. To reunite the families, Statistics Estonia developed a graph-based method. We consider people and addresses as nodes of a graph; the connections between them define the edges. For example, husband and wife are connected through marriage, and real estate is connected with its owner. Then, a household and its dwelling can be viewed as a strongly connected subgraph. Constructing households and finding dwellings for them reduces to community detection, a common task in the analysis of networks.
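The graph view described above can be sketched as follows. This is a deliberately simplified illustration, not Statistics Estonia's method: it groups people and addresses into plain connected components with a union-find structure, whereas the abstract refers to community detection, which would also split large components and weigh different link types. All node labels and links are hypothetical.

```python
def connected_components(edges):
    """Group nodes into connected components using union-find."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


# Hypothetical register links: marriage, ownership, registered residence.
links = [
    ("person:Mari", "person:Jaan"),        # married
    ("person:Jaan", "address:Tartu 1"),    # owner of the dwelling
    ("person:Kati", "address:Tallinn 5"),  # registered residence
]

components = connected_components(links)
for component in components:
    print(sorted(component))
```

Here Mari is placed with Jaan's dwelling through the marriage link even though she has no address link of her own, which is the kind of reunification the abstract describes.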

Finite Mixture Models for an underlying Beta distribution with an application to COVID-19 data

Format: CPS Abstract

Author: PROF. DR. Jang Schiltz

Co-Authors:

  • Cédric Noel

We introduce an extension of Nagin's finite mixture model to underlying Beta distributions and present our R package trajeR, which allows the model to be calibrated. Then, we test the model and illustrate some of the possibilities of trajeR by means of an example with simulated data. In the second part of the paper, we use this model to analyze COVID-19 related data during the first part of the pandemic. We identify a classification of the world into five groups of countries with respect to the evolution of the contamination rate and show that the median population age is the main predictor of group membership. We do not, however, see any sign of the effectiveness of the sanitary measures taken by the different countries against the propagation of the virus.

Flexible dissemination software for the 2021 England & Wales Census

Format: CPS Abstract

Author: Mr Mike Thompson

We present a case study of the implementation of a flexible dissemination service for the Office for National Statistics and the Northern Ireland Statistics and Research Agency in the UK for use with both the England and Wales census and the Northern Irish census.

Food sovereignty in times of crisis: challenging the statistical system

Format: CPS Abstract

Author: Dr SAID SAGHIR ZAROUALI

The world has seen significant increases in food prices over the past three years, caused by a combination of factors, including the outbreak of war in a region known for its large grain and oilseed production potential, restrictions limiting the flow of goods caused by the outbreak of the COVID-19 pandemic, scarcity and poor distribution of rainfall mainly due to climate change, the extension of drought to wetlands and the high frequency of dry years in arid and semi-arid regions. At the same time, global demand for food continues to grow and social and regional inequalities are on the rise. As a result, an increase in the poor population is expected, along with the emergence of undernutrition and malnutrition, especially among vulnerable groups (children, the elderly, pregnant women, etc.). In this global turmoil, food security, which was easily achieved in rich countries, is no longer assured today. In fact, the issue of food security has become a problem for both rich and poor countries. Moreover, the food sovereignty of countries has come back with force after having been neglected for years, in particular with the return of the confrontation between the great powers and the emergence of the will to control the world balance, with deficits in the supply of basic food products to world markets constituting a factor of geostrategic pressure. This session offers a study of statistical approaches to assess aspects of food sovereignty in a world in crisis. The aim is to analyze the main statistical indicators that describe the state of food in the world and to focus on a few specific cases. To this end, the session invites experts in the field of food security representing international bodies and United Nations agencies.

Forecasting Malaria Morbidity to 2036 Based on Geo-Climatic Factors in the Democratic Republic of Congo

Format: CPS Abstract

Author: Mrs Eric Panzi

Abstract Background: Malaria is a global burden in terms of morbidity and mortality. In the Democratic Republic of Congo, malaria prevalence is increasing due to strong climatic variations. Reductions in malaria morbidity and mortality, the fight against climate change, good health and well-being constitute key development aims as set by the United Nations Sustainable Development Goals (SDGs). This study aims to predict malaria morbidity to 2036 in relation to climate variations between 2001 and 2019, which may serve as a basis to develop an early warning system that integrates monitoring of rainfall and temperature trends and early detection of anomalies in weather patterns. Methods: Meteorological data were collected from Mettelsat, and the database of the Epidemiological Surveillance Directorate, which includes all malaria cases registered in the surveillance system based on positive blood test results, either by microscopy or by a rapid diagnostic test for malaria, was used to estimate malaria morbidity and mortality by province of the DRC from 2001 to 2019. Malaria prevalence and mortality rates by year and province, using direct standardization, and the mean annual percentage change were calculated using DRC mid-year populations. Time series combining several predictive models were used to forecast malaria epidemic episodes to 2036. Finally, the impact of climatic factors on malaria morbidity was modeled using multivariate time series analysis. Results: The geographical distribution of malaria prevalence between 2001 and 2019 shows strong disparities between provinces, with the highest rate, 7700 cases per 100,000 people at risk, in South Kivu. In the northwest, malaria prevalence ranges from 4980 to 7700 cases per 100,000 people at risk. Malaria has been most deadly in Sankuru, with a case-fatality rate of 0.526%, followed by Kasai (0.430%), Kwango (0.415%), Bas-Uélé (0.366%) and Kwilu (0.346%).
However, the stochastic trend model predicts an average annual increase of 6024.07 malaria cases per facility with exponential growth in epidemic waves over the next 200 months of the study. This represents an increase of 99.2%. There was overwhelming evidence of associations between geographic location (western, central and northeastern region of the country), total evaporation under shelter, maximum daily temperature at two meters altitude and malaria morbidity (p < 0.0001). Conclusions: The stochastic trends in our time series observed in this study suggest an exponential increase in epidemic waves over the next 200 months of the study. The increase in new malaria cases is statistically related to population density, average number of rainy days, average wind speed, and unstable and intermediate epidemiological facies. Therefore, the results of this research should provide relevant information for the Congolese government to respond to malaria in real time by setting up a warning system integrating the monitoring of rainfall and temperature trends and early detection of anomalies in weather patterns.

Formal privacy on a subset of dataset variables

Format: CPS Abstract

Author: Ms Pin Lin Tan

Differential privacy (DP) is a mathematical definition of privacy that has many attractive properties. However, it requires all outputs from the dataset to be noisy, even those that only involve variables that are not considered private. In this paper, we introduce privacy definitions that protect only a subset of the variables in a dataset, thus allowing statistics involving only unprotected variables to be released accurately. We explore their relation to DP and Pufferfish privacy, their composition properties and how they can be used to assign an epsilon value to each variable.

From skepticism to conviction: The emerging statistical methodologies in integrating satellite and reanalysis data with station data.

Format: CPS Abstract

Author: Prof. Bashiru I.I. Saeed

Co-Authors:

  • Caleb Nambyn
  • Ebenezer Tawiah Arhin
  • Amidu Abdul Hamid

Only one-eighth of the minimum number of weather stations that the World Meteorological Organization recommends are present throughout Africa, which creates a major data gap in dozens of nations that are among the most vulnerable to climate change. Though generally susceptible to restrictions, ground-based stations are sometimes regarded as the gold standard for meteorological data. In general, it can be challenging to use climate data when it is inconsistent, has short records, missing values, is unreliable, or is mostly found in hardcopy manuscripts. Forecasts have been erroneous due to a lack of data, and early warning systems may not even exist.

Fusing areal and point level survey data to map childhood vaccination coverage

Format: CPS Abstract

Author: Dr Chigozie Utazi

High resolution maps of health and development indicators are mainly produced using modelling approaches utilizing point-referenced data which are typically sourced from household surveys. However, in vaccination coverage estimation, spatially misaligned multiple input data sets can sometimes be available to model a coverage indicator of interest. Here, we focus on the case where both point-level data on vaccination coverage from a survey and routine areal data on disease counts are available for analysis and propose a fusion model that combines information from both datasets to map an indicator of vaccination coverage at a high resolution. The proposed model is a combination of a conditional autoregressive model for the areal data and a Gaussian process model for the point level data, with Poisson and binomial likelihoods specified for both outcomes, respectively. The melding of both spatial scales in the model is accomplished using the components of the linear predictor. The model is fitted in a Bayesian framework using the INLA-SPDE approach and applied to mapping the coverage of measles vaccination in Nigeria using the 2018 Demographic and Health Survey (DHS) data and measles case counts. The predicted coverage maps reveal that combining information from both data sources leads to better identification of low coverage areas compared to coverage maps produced using only geolocated survey data as is usually the practice.

Fuzzy matching on big-data: an illustration with scanner and crowd-sourced nutritional datasets

Format: CPS Abstract

Author: Mr Lino Galiana

Co-Authors:

  • Milena Suarez Castillo

Food retailers' scanner data provide unprecedented details on local consumption, provided that product identifiers allow a linkage with features of interest, such as nutritional information. In this paper, we enrich a large retailer dataset with nutritional information extracted from crowd-sourced and administrative nutritional datasets. To compensate for imperfect matching through the barcode, we develop a methodology to efficiently match short textual descriptions. After a preprocessing step to normalize short labels, we resort to fuzzy matching based on several tokenizers (including n-grams) by querying a customized ElasticSearch index, and validate candidates as matches with a Levenshtein edit distance and an embedding-based similarity measure created from a siamese neural network model. The pipeline is composed of several steps successively relaxing constraints to find relevant matching candidates.
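The two-stage idea of retrieving candidates by token overlap and then validating them with an edit distance can be sketched in a few lines. This is a pure-Python stand-in under our own assumptions: character trigram Jaccard similarity replaces the ElasticSearch query, the siamese-network similarity is omitted, and the labels, threshold and function names are hypothetical.

```python
def normalize(label):
    """Lowercase a label and collapse runs of whitespace."""
    return " ".join(label.lower().split())


def char_ngrams(text, n=3):
    """Set of character n-grams of a space-padded string."""
    padded = f" {text} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def levenshtein(a, b):
    """Edit distance with unit-cost insert, delete and substitute."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution
            ))
        previous = current
    return previous[-1]


def best_match(query, candidates, max_distance=3):
    """Pick the most similar candidate by n-gram overlap, then validate it."""
    if not candidates:
        return None
    q = normalize(query)
    q_grams = char_ngrams(q)
    top = max(candidates,
              key=lambda c: jaccard(q_grams, char_ngrams(normalize(c))))
    return top if levenshtein(q, normalize(top)) <= max_distance else None


reference_labels = ["yaourt nature 4x125 g", "jus d'orange 1l"]
matched = best_match("Yaourt  NATURE 4x125g", reference_labels)
unmatched = best_match("croissant", ["jus d'orange 1l"])
print(matched, unmatched)
```

Relaxing `max_distance` or the n-gram size in successive passes would mimic a pipeline that loosens its constraints step by step.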

Gender Inequalities in the burden of Non-Communicable Diseases in Botswana: A Differences-in-Decomposition Approach

Format: CPS Abstract

Author: Dr Naomi Setshegetso

This paper demonstrates how the pattern of susceptibility to, or the burden of, non-communicable diseases (NCDs) and the mortality arising from them is very similar to the patterns of unemployment and poverty in Botswana. In Botswana, NCDs account for 52 percent of mortality, followed by communicable diseases (43 percent) and injuries (5 percent). Available data also estimate that 23 percent of females are overweight and 19 percent are obese, compared to 14 percent and 5 percent of males respectively. Taking the risk factors of NCDs as an aggregate, 28 percent of females compared to 24 percent of males have three or more risk factors.

Gender wage discrimination in urban areas in Morocco

Format: CPS Abstract

Author: Mr Khalid Soudi

Co-Authors:

  • Hicham El Marizgui

The evaluation of wage discrimination has the merit of serving as an indirect indicator of the extent of the general phenomenon of discrimination between men and women. While, in the best of cases, this evaluation can indicate the extent to which the principle of "equal work, equal wages" is not respected, it obviously cannot explain the whole of the wage difference between the two sexes. Differences in human capital and other observable individual characteristics, unequal access to certain categories of employment, job stability, etc. are all factors that explain the differences in wages between men and women.

Generalized Bayesian inference via composite likelihood for population dynamics models

Format: CPS Abstract

Author: Dr Sofia Ruiz Suarez

Co-Authors:

  • Radu Craiu

In some modern applications, specifying the true sampling distribution, or likelihood, is a challenge. Sometimes the likelihood is impossible or very expensive to evaluate, and it is in those cases that generalized Bayesian inference has proven useful. Generalized Bayesian inference updates prior beliefs using a loss function rather than a likelihood; when the likelihood is known and the loss is taken to be the negative log-likelihood, the procedure coincides with standard Bayesian updating. Stochastic population dynamics models constitute an important component of applied and theoretical ecology. Statistical inference for these models can be difficult when, in addition to the process error, there is also sampling error. Ignoring sampling variability can lead to biased estimates and, in turn, to erroneous conclusions about the behavior of the system. The Gompertz model is well known for describing the growth of animals, plants or cells, and it can be adapted to account for sampling variability. However, when the sampling error is explicitly considered, the exact likelihood becomes more complicated, as it involves a T-dimensional integral. In this work, we discuss the use of the composite likelihood as a loss function under the generalized Bayesian inference approach for estimating the parameters of the Gompertz model in the presence of sampling variability. To obtain a pseudo-posterior distribution, we propose a Metropolis-Hastings algorithm that computes an approximation of the composite likelihood at each step of the Markov chain. The composite likelihood can be computed as a product of 1-, 2-, ..., or n-dimensional marginal distributions. Although the one-dimensional formulation is easier and faster to compute, it does not include all the parameters and therefore implies a two-step inference procedure, leading to error propagation. We consider both the one- and two-dimensional formulations and, using simulations, test and compare the ability of this approach to recover the true parameters of the model.
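
As a rough illustration of the two-dimensional formulation, the sketch below evaluates a pairwise composite log-likelihood and the corresponding generalized Bayes log pseudo-posterior. It is not the authors' implementation: it assumes a Gaussian Gompertz state-space model on the log scale with stationary initial state, for which the pairwise marginals are available in closed form, and the names `pairwise_composite_loglik`, `log_pseudo_posterior`, `log_prior` and `weight` are hypothetical.

```python
import numpy as np


def pairwise_composite_loglik(theta, y):
    """Sum of bivariate normal log-densities of consecutive pairs (y_t, y_{t+1}).

    Assumed model on the log scale (an illustration only):
        X_t = a + c * X_{t-1} + E_t,  E_t ~ N(0, sigma^2)   (process error)
        Y_t = X_t + F_t,              F_t ~ N(0, tau^2)     (sampling error)
    with |c| < 1 and X_t started from its stationary distribution.
    """
    a, c, sigma, tau = theta
    if not (abs(c) < 1 and sigma > 0 and tau > 0):
        return -np.inf                      # outside the assumed parameter space
    mu = a / (1 - c)                        # stationary mean of X_t (and Y_t)
    v = sigma**2 / (1 - c**2)               # stationary variance of X_t
    s11 = v + tau**2                        # Var(Y_t)
    s12 = c * v                             # Cov(Y_t, Y_{t+1})
    det = s11**2 - s12**2                   # determinant of the 2x2 covariance
    d0 = y[:-1] - mu
    d1 = y[1:] - mu
    quad = (s11 * d0**2 - 2 * s12 * d0 * d1 + s11 * d1**2) / det
    return float(np.sum(-np.log(2 * np.pi) - 0.5 * np.log(det) - 0.5 * quad))


def log_pseudo_posterior(theta, y, log_prior, weight=1.0):
    """Generalized Bayes update: log prior minus weight times the loss,
    where the loss is the negative composite log-likelihood."""
    return log_prior(theta) + weight * pairwise_composite_loglik(theta, y)
```

A function like `log_pseudo_posterior` could serve as the target of a Metropolis-Hastings sampler; the choice of `weight` (often called the learning rate) is left open here.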

Give me Facts, Not Data: Florence Nightingale’s Pursuit of Truth in India

Format: CPS Abstract

Author: Prof. Nimai Mehta

Co-Authors:

  • Mary Gray

Florence Nightingale’s pioneering contributions to the collection and analysis of statistical data in support of evidence-based decision making for the public good are well known. What remains less known, and thus unacknowledged by the statistics profession, is the priority Nightingale accorded to the collection of “facts”, or observed reality, over data and statistics. In her later years, Nightingale undertook a remarkable correspondence project with a young Indian lawyer in an attempt to gather “facts”, as opposed to data, on the state of tenant farmers and tenancy reforms in late nineteenth-century British India. Their correspondence, sustained over a four-year period from 1878 to 1882, was later published as her “Indian Letters”. We explore these letters to clarify the Nightingale distinction between facts and data. Her skepticism of data and statistics and her insistence on facts in this case were partly motivated by her belief that the latter, more than the former, would be needed to convince her British audience, including the Parliament that she undertook to lobby on behalf of the Indian peasants suffering under the existing set of land and tenancy rights in Bengal. We argue, however, that the Nightingale distinction and her skepticism of data reveal insights that have remained relevant for both statistical practice and policy. We point to the overestimation of Soviet growth data and potential that continued to plague the West’s assessment of the USSR as late as the 1980s. In addition, we provide insights from our more recent attempts to assess the quality of data used by the Government of Myanmar against the “facts” that we observed in the field. Finally, we argue that Nightingale-facts are an important check against the inherent limitations of data collection, modelling biases, and their use to inform public policy.

Gradient boosting models applied in landslide susceptibility mapping

Format: CPS Abstract

Author: Alyssa Gao

Co-Authors:

  • Han Gao
  • Pei Shan Fam
  • Lea Tien Tay
  • Heng Chin Low

Landslide susceptibility analysis (LSA) is a popular and effective way to determine the likelihood of landslide occurrence in a specific area and thereby help reduce losses. This study aims to improve landslide spatial prediction in Penang, Malaysia, using two gradient boosting models, eXtreme Gradient Boosting (XGBoost) and Light Gradient Boosting Machine (LightGBM), combined with oversampling techniques. The results are analyzed and discussed mainly on the basis of receiver operating characteristic (ROC) curves and the area under the curve (AUC). The highest AUC value, 0.9525, is obtained from the combination of XGBoost and SMOTE. The landslide susceptibility maps (LSMs) produced by XGBoost and LightGBM can provide valuable information for landslide management and mitigation on Penang Island, Malaysia.
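
For readers unfamiliar with the evaluation metric, the AUC can be computed directly from predicted scores as the probability that a randomly chosen landslide location receives a higher score than a randomly chosen non-landslide location. The following is a minimal sketch of that definition only; the function name `roc_auc` and the toy labels and scores are illustrative and are not taken from the study.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 1 = landslide, 0 = no landslide (made-up scores).
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

This pairwise form is quadratic in the sample size; it is meant to show what the reported AUC measures rather than to replace a library routine.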

HEALTH SURVEILLANCE IS A MATTER OF GOOD “CENSUS”: AN ANALYSIS OF THE IMPORTANCE OF USING HEALTH INDICATORS IN AN EMERGENCY CARE UNIT (UPAS) IN SEROPÉD

Format: CPS Abstract

Author: Mrs Débora Soares

This article discusses the importance of territorialization in health for the management and prevention of COVID-19 in the area served by the Emergency Care Unit (UPA) of Seropédica in Rio de Janeiro. The adverse panorama of the pandemic, which did not present anything new, brought to light the precariousness of environmental sanitation in the country. The objective of this work was therefore to identify the degree of social inequality in the area served by the Seropédica UPA during the COVID-19 pandemic and to analyze whether these areas have inadequate sanitation. A quantitative survey was carried out using the health indicators of the Synthesis Panel by Municipality developed by the Brazilian Institute of Geography and Statistics (IBGE) and the Social Inequalities Index for Covid-19 (IDS-COVID-19) prepared by the Center for Integration of Data and Knowledge for Health (CIDACS). The indicators show a high degree of social inequality in the municipality of Seropédica and indicate that the pandemic has not been the same for everyone. Death by COVID-19 has color!

Have innovative machine learning techniques boosted official statistical production in Switzerland?

Format: CPS Abstract

Author: Dr Jean-Pierre RENFER

The Swiss Federal Statistical Office (FSO) adopted a data innovation strategy that included the launch of innovation pilot projects to be evaluated in this paper. We propose to describe these projects by focusing on their expected added value, the problems encountered and the solutions provided. The projects presented all implement techniques from artificial intelligence such as machine learning or deep learning for image or text recognition. In this context, the criteria for approving a transfer to statistical production will be discussed to answer the question asked in the title!

Health Financing Risk Determinants on Indonesia's National Health System According to Smoking Status Approach

Format: CPS Abstract

Author: Mr Yusuf Fuadi

Co-Authors:

  • Suryo Adi Rakhmawan

The relationship of smoking status and BPJS Health Membership to the Risk of Health Financing Determinant

Health accelerator and financial frictions in macroeconomic modelling: how to modelling pandemic effect?

Format: CPS Abstract

Author: Prof. Firano Zakaria

Co-Authors:

  • Firano Zakaria
  • fatine filali Adib

In this paper, we develop a new approach to macroeconomic modelling that introduces the agent’s behaviour in a pandemic situation. Health frictions alter the economic agent’s behaviour in crisis situations, and we set up a DSGE model whose behavioural functions take into account the existence of healthy and infected populations. The novelty of this work is the inclusion of two frictions, health and financial, with the aim of describing macro-financial dynamics in a pandemic situation. The results obtained confirm the existence of a health accelerator that amplifies macroeconomic shocks.

High-dimensional Partially Linear Additive Models on Lie groups

Format: CPS Abstract

Author: Mr Changwon Choi

Co-Authors:

  • Zhenhua Lin
  • Byeong Uk Park

We propose an extension of the partially linear additive model on Lie groups, such as the space of symmetric positive-definite matrices with the Log-Euclidean or Log-Cholesky metrics, and develop a semiparametric regression approach on Lie groups. Moreover, we investigate two different high-dimensional regression methods for the linear components. Our methods use profiling techniques to estimate the components. We show that the proposed variable selection method has selection consistency and the estimators of parametric components have the oracle property. Also, our nonparametric estimators achieve the convergence rate of univariate nonparametric models. We provide several simulation studies and real data analysis to evaluate the numerical performance of the proposed methods.

Households Food Insecurity Under COVID-19 in Indonesia

Format: CPS Abstract

Author: Annisa Nur Fadhilah

Co-Authors:

  • Atika Kautsar Ilafi

The COVID-19 pandemic has disrupted progress towards Sustainable Development Goal (SDG) 2: achieving food security, improving nutrition, and promoting sustainable agriculture. This remains one of the most demanding SDGs that the United Nations wants to achieve by 2030. The target is to achieve universal access to safe and nutritious food and to end all forms of malnutrition. However, the Covid-19 pandemic has caused global economic instability through the implementation of lockdown and semi-lockdown policies, which has indirectly affected the prevalence of food insecurity in many countries, including Indonesia. In the second quarter of 2020, the Indonesian economy contracted by 5.32% compared to the second quarter of 2019. In the same quarter, the largest contributors to GDP, household consumption and investment, contracted by 1.38% and 3.47%, respectively, while exports, the economic activity tied to the global economy, experienced the deepest contraction, at 5.79% (Statistics Indonesia, 2020). In addition, the prevalence of food insecurity in Indonesia, measured with the Food Insecurity Experience Scale (FIES), has declined more slowly than before: the decline before the pandemic was around 20%, while the declines in 2020 and 2021 were around 6%. This study discusses the relationship between household socioeconomic factors and food insecurity in Indonesia during the Covid-19 pandemic, using March data from the National Social and Economic Survey analyzed with a binary logistic regression model. Goodness of fit is assessed with the Hosmer and Lemeshow test and individual coefficients with partial tests, both at a significance level of 5%. The Hosmer and Lemeshow p-value is about 0.084, which exceeds 5%, so the hypothesis that the model fits the data is not rejected. Approximately 18.6% of households in Indonesia are food insecure. Households that receive social protection and households with more than four members have a high tendency to be food insecure.

How Statistical Offices and Statisticians can be Anti-Racist

Format: CPS Abstract

Author: Dr David Alan Marker

The American Statistical Association's (ASA) Anti-Racism Task Force spent almost two years reviewing the organization's internal documents, reviewing our connection with the entire statistical profession, and examining the impact of statistics on society. The Task Force proposed dozens of recommended actions for the ASA and its members to take to become a strong force to fight racism. The ASA has hired a consultant to assist them in their efforts. Attempts to use statistics to fight racism in New Zealand and the United Kingdom will also be discussed. This presentation will review the findings and identify actions that could be taken to help us make progress in using statistics for positive, not negative impacts on the society in which we live.

How did the statistics help in the growth of exports and achieve the highest profit margin.

Format: CPS Abstract

Author: Mr MAHMOUD ABDEL-FATTAH

How did I benefit from the science of statistics in making money and achieving the highest profit margin? For example, on 11/20/2022, 1 euro equalled 25 Egyptian pounds. One kilo of oranges cost 5 Egyptian pounds in Egypt and 2.5 euros in Italy, which is equivalent to about 62 Egyptian pounds; that is, the price of oranges in Italy was about 12 times the price in Egypt. Using agricultural statistics, primary data, and statistical methods, I estimated the volume of consumption of each type of agricultural crop and was able to predict which items would be cheap before they were put on the market. I therefore had precedence in contracting to export cheap goods to Europe before they were offered in the Egyptian markets and before other exporters contracted for them and drove up their price, since supply and demand govern prices. Thus, the highest profit margin was achieved. This study will present how statistical methods were used to estimate the volume of consumption and predict prices, based on the primary data of agricultural statistics in Egypt.

How financially protected are Filipino senior citizens?: Key insights and takeaways from econometric exercises

Format: CPS Abstract

Author: Mr Christian Mina

Co-Authors:

  • Faith Christian Q
  • Nathaniel E. De Leon

The life cycle hypothesis posits that people save during their working years to accumulate resources in support of their habitual consumption during retirement (Modigliani and Brumberg, 1954). It is thus interesting to find out whether Filipino senior citizens (aged 60 years old and over) who are no longer part of the labor force are financially protected in terms of contributory or social pension coverage. Using data from the nationwide household-based Consumer Finance Survey and applying logistic regression and multiple correspondence analysis, this study also looks at salient characteristics of the households of these senior citizens to determine which may have a potentially greater financial safety net for their old, dependent and/or non-working members. Further, given the Philippines’ performance in the latest Mercer Chartered Financial Analysts (CFA) Institute’s Global Pension Index, the study will attempt to estimate the gap between the pension benefits that senior citizens have received or are receiving and their minimum consumption requirements. Based on the key findings from econometric analyses, the study will propose an alternative retirement benefit scheme and other specific policy interventions geared towards financial protection of senior citizens.

How much is your video call worth? Measuring the value of free digital services.

Format: CPS Abstract

Author: Mr Jo Poquiz

The goal of this research is to estimate and examine the value that households derive from using free digital goods. For this exercise, we estimate the gross value from the consumption of three forms of free digital goods: videoconferencing, personal email, and online news. As our measurement strategy, we employ the prices of "premium" or paid internet goods as a proxy for the value of their free counterparts. We also use hedonic regression to extract the value of the "free component" of these goods and untangle it from the value of the premium-exclusive components. Our estimates show that in 2020, the aggregate gross value derived by households from the consumption of the three digital services was between GBP 6.1 billion and GBP 22.7 billion. We also observe that the value derived by households from consuming these goods is growing much faster than aggregate household consumption. Our estimates show that in 2020, the initial year of the COVID pandemic, the decline in real household final consumption would have been 0.07 to 0.13 percentage points slower had the value of the three digital goods been incorporated in the estimates.

Human Environment Index and its Efficacy

Format: CPS Abstract

Author: Prof. Arun Kumar Sinha

Co-Authors:

  • Ashbindu Singh, Environmental Pulse Institute, VA20120, USA
  • Markandey Rai, UN-Habitat, Nairobi, KENYA

The need for a human environment index (HEI) has been felt for a long time because no single characteristic can quantitatively measure the existing environmental state of a place. Several natural conditions influence the environment, and no global targets can be set because of the enormous diversity of environmental issues across countries and regions. The HEI aims to represent the three basic dimensions of the human environment (green land, blue sky, and clean water) by a single number. Because these dimensions are expressed in different units, they are transformed into dimension indices using linear transformations and then pooled to obtain the HEI. We combine them using an averaging method. The index, in turn, reflects the managerial efforts a country or area makes to ensure environmental sustainability. Ranking countries and areas by the index creates a sense of competition to improve their investment in environmental protection, which is the most important factor for our survival and well-being. Further, the index is expected to fulfil an urgent need for a composite measure that monitors, in a single number and on a regular basis, the environmental characteristics of countries and regions irrespective of their affiliation to the United Nations (UN). It attempts to measure relative progress towards the environmental protection goal. Because it measures relative change, its value needs to be interpreted with extra caution. The index needs to be updated at regular intervals to make it a stronger tool for environmental protection, just like the UN Human Development Index (HDI), which is computed regularly by the UNDP.
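
The construction described above (linear transformation of each dimension, then averaging) can be sketched as follows. The abstract does not give the transformation bounds or the type of average, so this sketch assumes a min-max rescaling to [0, 1] and an unweighted arithmetic mean; the function names and all numeric values are hypothetical.

```python
def dimension_index(value, lower, upper):
    """Linearly rescale a raw indicator to [0, 1] given assumed bounds."""
    if upper <= lower:
        raise ValueError("upper bound must exceed lower bound")
    return (value - lower) / (upper - lower)


def human_environment_index(dimension_indices):
    """Pool dimension indices with an unweighted arithmetic mean (an assumption)."""
    return sum(dimension_indices) / len(dimension_indices)


# Hypothetical raw values and bounds for green land, blue sky and clean water.
green_land = dimension_index(30.0, lower=0.0, upper=100.0)
blue_sky = dimension_index(60.0, lower=0.0, upper=100.0)
clean_water = dimension_index(90.0, lower=0.0, upper=100.0)
hei = human_environment_index([green_land, blue_sky, clean_water])
```

A geometric mean, as used in some other composite indices, would penalize imbalance across dimensions more strongly; which averaging method the authors use is not stated in the abstract.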

Hypothesis Tests under Finite Gaussian Mixture Regression Models

Format: CPS Abstract

Author: Ms Chong Gan

Co-Authors:

  • Zeny Feng
  • Jiahua Chen

Gaussian mixture models are increasingly used to cluster data with unobserved heterogeneity. Often, some covariates are related to the observed outcomes of interest and thus provide valuable information for clustering the response. To investigate the covariate effects on the response, we introduce a sequential test within the Gaussian mixture regression framework.

IMPLEMENTATION OF THE AUTOREGRESSIVE INTEGRATED MOVING AVERAGE (ARIMA) METHOD IN FORECASTING THE NUMBER OF VISITS OF FOREIGN TOURISTS TO INDONESIA

Format: CPS Abstract

Author: Mr ahmad risal

Co-Authors:

  • Wawan Saputra

Currently, tourism plays a vital role in the Indonesian economy. In general, the economic impact can be measured by the contribution of the sector's economic value to the national economy's value. Based on BPS data, the contribution of tourism to national output/production reached 4.32 percent and contributed 4.13 percent of the total national GDP. In addition, the role of wages and salaries on the national labor compensation for pepper in 2016 reached 3.86 percent of the national wage, and the tax on the resulting net production contributes to the tax on the actual national net output of 3.84 percent. These data show that tourist arrivals in Indonesia, which declined during the COVID-19 pandemic, still tend to be volatile, so they must be monitored to maintain stability.

Identifying behavioural change mechanisms in epidemic models

Format: CPS Abstract

Author: Prof. Rob Deardon

The COVID-19 pandemic has illustrated both the utility and the limitations of epidemic models for understanding and forecasting disease spread. One of the many difficulties in modelling epidemic spread is caused by behavioural change in the underlying population. This can be a major issue in public health since, as we have seen during the COVID-19 pandemic, behaviour in the population can change drastically as infection levels vary, due to both government mandates and personal decisions. Such changes in the underlying population result in major changes in the transmission dynamics of the disease, making modelling challenging. These issues arise in agriculture as well as public health, as changes in farming practice are also often observed as disease prevalence changes. We propose a model formulation where time-varying transmission is captured by the level of alarm in the population and specified as a function of the past epidemic trajectory. The model is set in a data-augmented Bayesian framework, as epidemic data are often only partially observed and we can utilize prior information to help with parameter identifiability. We investigate the estimability of the population alarm across a wide range of scenarios, using both parametric functions and non-parametric Gaussian processes and splines. The benefit and utility of the proposed approach are illustrated through an application to COVID-19 data from New York City.
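
To make the idea of alarm-driven transmission concrete, the sketch below runs a deterministic discrete-time SIR model in which the transmission rate is scaled by one minus an alarm level, and the alarm depends on mean incidence over the previous week. This is an illustration under stated assumptions, not the authors' model: the Hill-type alarm function, the 7-day window, and all parameter values (`beta`, `gamma`, `half`, `power`) are hypothetical.

```python
import math
from collections import deque


def hill_alarm(x, half=20.0, power=2.0):
    """Hypothetical alarm in [0, 1): increases with recent mean incidence x."""
    return x**power / (half**power + x**power)


def simulate_incidence(beta=0.5, gamma=0.2, n=10_000, i0=10, days=200,
                       window=7, alarm=hill_alarm):
    """Daily new infections from a deterministic SIR model with alarm feedback.

    Pass alarm=None to switch the behavioural feedback off.
    """
    s, i = float(n - i0), float(i0)
    recent = deque(maxlen=window)           # past daily incidence (the "trajectory")
    incidence = []
    for _ in range(days):
        past_mean = sum(recent) / len(recent) if recent else 0.0
        a = alarm(past_mean) if alarm is not None else 0.0
        # Per-susceptible infection probability with alarm-reduced transmission.
        new_inf = s * (1.0 - math.exp(-beta * (1.0 - a) * i / n))
        new_rec = i * (1.0 - math.exp(-gamma))
        s -= new_inf
        i += new_inf - new_rec
        incidence.append(new_inf)
        recent.append(new_inf)
    return incidence


with_alarm = simulate_incidence()
no_alarm = simulate_incidence(alarm=None)
```

Comparing `max(with_alarm)` with `max(no_alarm)` shows how the feedback flattens the epidemic peak in this toy setting; estimating the alarm function from partially observed data is the harder problem the abstract addresses.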

Impact of Energy Efficiency on Housing Prices in Türkiye

Format: CPS Abstract

Author: Ms Ozgul Atilgan Ayanoglu

Co-Authors:

  • Ozgul Atilgan Ayanoglu
  • Ezgi Deryol
  • Erdi Kizilkaya

Property price indices, green economy

Impact of Misclassification on Price Statistics: Case Study of Using Machine Learning Classification on Web Scraped Clothing Data

Format: CPS Abstract

Author: Mr Serge Goussev

Co-Authors:

  • William Spackman
  • Brandon Elford Gauthier
  • Yusa Li

While the application of Supervised Machine Learning (ML) to automate the classification of alternative data for official price statistics has been widely demonstrated, the impact of misclassifications on the final statistics has been understudied. To support National Statistical Organizations in understanding how to apply Machine Learning to at-scale production needs, our paper evaluates the impact of misclassifications, examines ways it can be mitigated, and assesses metrics applicable to tracking the quality of the process once deployed.

Impact of the covid-19 pandemic on the Moroccan economy

Format: CPS Abstract

Author: Nisrine Ghefou

Co-Authors:

  • Nisrine GHEFOU

The crisis linked to the spread of Covid-19 has had a considerable impact on the global economy. Indeed, this health crisis quickly turned into a genuine economic crisis, requiring governments to be vigilant but also flexible in order to adapt to this unstable context. In Morocco, the State mobilized from the start of the health crisis, proposing various measures to minimize the impact of the pandemic, preserving social stability on the one hand and maintaining stable economic activity on the other, despite the slowdown that activity experienced. This article therefore seeks to explore the economic recovery during the COVID-19 health crisis in Morocco. We examine the impact of the Covid-19 pandemic on the Moroccan economy as well as the various measures taken at the national level that helped mitigate that impact.

Implicit pricing of housing characteristics using hedonic price and demand model estimation: The case of the Philippines

Format: CPS Abstract

Author: Mr Christian Mina

Co-Authors:

  • Faith Christian Q. Cacnio
  • Maureen Ane D. Rosellon
  • Aiza Maris C. Valenzuela

A house accounted for roughly 80 percent of the total value of assets of Filipino households based on the 2018 Consumer Finance Survey. Given that it can reflect the wealth holdings of the majority of Filipino households, it is important for housing market participants, especially new entrants, to understand how residential properties are valued. Using the Consumer Finance Survey and Family Income and Expenditure Survey datasets, this study aims to empirically assess the key considerations of households in valuing their houses and to determine whether the implicit price of each housing characteristic (structural and neighborhood) varies across areas and periods. Estimation of hedonic house price models using both spatial and non-spatial econometric methods will be explored. The second-stage demand models for key neighborhood characteristics will also be estimated using seemingly unrelated regression to provide additional insights on hedonic price analysis. Findings from the econometric exercises will be used to craft possible policy interventions that can address housing-related concerns in the Philippines.

Improved Estimation of Parameters of Log-Symmetric Distributions for Achieving Better Fit

Format: CPS Abstract

Author: Prof. Saleha Naghmi Habibullah

Co-Authors:

  • Saleha Naghmi Habibullah

Non-negative data arise in numerous fields. Self-inverse log-symmetric distributions provide an opportunity to construct estimators of distribution parameters that are more efficient than the corresponding well-known moment estimators, leading, in general, to more accurate modeling of non-negative real data. This paper reviews a number of self-inversion-based estimators that have been developed during the past ten to twelve years and presents a new self-inversion-based estimator of the r-th moment about zero, thus adding to the list. A simulation study is presented to demonstrate that the newly developed estimator is more efficient than the corresponding estimator obtained by the ordinary method of moments. The advantage of the newly proposed estimator over the ordinary moment estimator for model fitting is demonstrated through application to two real data sets.

Improving The Monitoring Of SDG Indicators Related To The Environment And Agriculture Themes For Sub-Saharan African States.

Format: CPS Abstract

Author: Miss Edvira MALLIEDJE FOKAM

Co-Authors:

  • Edvira MALLIEDJE FOKAM

There is still a lack of updated and reliable environmental and agricultural statistics in the world, even though they cover 36.6% of SDG indicators used by the United Nations to date. It is with this in mind that the idea of developing this guide was considered, with a view to helping sub-Saharan African States, in this case AFRISTAT members, to make progress in monitoring the SDG indicators related to these themes. The state of data availability revealed a significant gap in the production of environment and agriculture-related SDG monitoring indicators for both international and national sources. However, it was found that the indicators available from national sources are mainly from administrative sources. This further illustrates the need to strengthen survey data collection mechanisms for monitoring the environmental indicators of the SDGs. In fact, the methodological notes developed in this document show that a simple readjustment of the survey questionnaires already in place in these countries could improve the collection of some indicators, such as indicator 2.3.1. It also emerges from this methodological work that greater collaboration between national statistical offices and the private or industrial sector could have a positive impact on the monitoring of the environmental indicators of the SDGs in these countries, such as indicator 6.3.1.

Improving the coherence of estimates from different survey using replicated sampling

Format: CPS Abstract

Author: Mr Adhi Kurniawan

Co-Authors:

  • Adhi Kurniawan
  • Atika Nashirah Hasyyati

The sampling method plays an important role in determining the quality of statistical data, including the coherence of estimates from different survey sources. Sometimes several surveys produce the same indicator, but their sampling procedures are often conducted independently. With independent sampling procedures, the resulting data are likely to be less coherent because of the effect of the different sampling methods. Therefore, the sampling methods applied in these surveys need to be developed and integrated. In this paper, replicated sampling is implemented and several sampling scenarios are simulated. The simulation study has two main objectives: (1) to examine the correlation between the number of overlapping units in the two surveys and the coherence of the survey data; (2) to compare the accuracy of this alternative method against the independent sampling method. Data coherence is approximated by the difference between the estimates of the same indicator produced by the two surveys, while accuracy is measured by the magnitude of the sampling variance and bias. The simulation results show that the replicated sampling method achieves a better level of coherence and a better level of accuracy than independent sampling methods.
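
The relationship between overlap and coherence can be illustrated with a toy simulation. The sketch below is not the authors' design: it assumes two simple random samples of equal size that share a chosen number of units, measures coherence as the absolute difference between the two sample means, and ignores measurement and nonresponse differences between surveys. The function names and the population are hypothetical.

```python
import random


def coherence_gap(population, n, overlap, rng):
    """Absolute difference between two survey means sharing `overlap` units."""
    if not 0 <= overlap <= n or 2 * n - overlap > len(population):
        raise ValueError("invalid sample size or overlap")
    units = rng.sample(population, 2 * n - overlap)  # distinct units, random order
    shared = units[:overlap]                         # units in both surveys
    only_a = units[overlap:n]                        # units only in survey A
    only_b = units[n:]                               # units only in survey B
    mean_a = sum(shared + only_a) / n
    mean_b = sum(shared + only_b) / n
    return abs(mean_a - mean_b)


def mean_gap(population, n, overlap, reps=200, seed=0):
    """Average coherence gap over repeated draws."""
    rng = random.Random(seed)
    return sum(coherence_gap(population, n, overlap, rng) for _ in range(reps)) / reps


population = list(range(100))  # toy population of 100 values
for k in (0, 4, 8, 10):
    print(k, round(mean_gap(population, n=10, overlap=k), 3))
```

In this simplified setting the gap shrinks as the overlap grows and vanishes when the two samples coincide; the study's simulations additionally compare sampling variance and bias.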

Imputing zeros in business survey items using a binary classification method

Format: CPS Abstract

Author: Mr Ichiro Murata

Imputation has been an important research topic for the National Statistics Center of Japan. Here we examine some characteristics of business survey items and use them to improve the existing imputation method.

Income Diversification and Bank Stability in Indonesia: Does Market Volatility Matters?

Format: CPS Abstract

Author: Mrs Fitri Handayani

The 2008 global financial crisis highlighted the importance of financial stability because of its vital role in mobilizing financial resources for the real economic sector. Therefore, this study needs to focus on bank stability in the post-pandemic crisis era. This condition is more severe than the global financial crisis and still leaves a scarring effect on the world economy. A risk of a global recession in 2023 also emerged after the crisis. Currently, banks are not only carrying out traditional activities but are also expanding their business into non-interest activities known as diversification. Some of the previous research showed that diversification positively affects bank stability because it can reduce risk by placing its finances in various assets. However, during the 2008 global financial crisis, banks with high diversification tended to collapse. Hence, other research showed that high diversification did not guarantee high stability for a bank. The inconclusive debate regarding the direction of the relationship between diversification and banking stability presents a research gap that will be examined in this study for cases in Indonesia. Moreover, empirical research on bank diversification and stability is also still limited. In addition, involving market volatility in the relationship between the two is a novelty in this study. Here I investigate the effect of income diversification and diversification-market volatility interactions on bank stability.

Income inequality in Latin America: recent trends and measurement challenges

Format: CPS Abstract

Author: Xavier Mancero

Abstract for Invited Paper Session: "Impact of global challenges on income inequality in 2020-2023" ECLAC regularly measures income inequality in Latin American countries, using data from its household survey data bank. In recent years, it has sought to promote the use of a combination of household surveys, administrative records, and national accounts in order to have a more complete measurement of inequality. This paper presents recent trends in inequality in the region, based on survey data, and progress in collaboration with countries for a more comprehensive measurement of inequality.

Industry 4.0 – A Tool to Transform a Standard Factory into a Smart Factory

Format: CPS Abstract

Author: Mr Sushanta Kumer Paul

Industry 4.0 is a contemporary subject that concerns today’s industrial production. It enables the manufacturing sector to become digitalized, with built-in sensing devices in virtually all manufacturing components, products and equipment. Industry 4.0 helps integrate and combine intelligent machines, human actors, physical objects, manufacturing lines and processes across organizational stages to build new types of technical data and systematic, highly agile value chains.

Inference on missing locations in Geostatistics under Preferential Sampling

Format: CPS Abstract

Author: Prof. Gustavo Ferreira

Co-Authors:

  • Dani Gamerman

This paper deals with the inverse problem in Geostatistics in situations where the researcher performs inference about missing locations under a specific type of informative sampling design. Based on the traditional Geostatistical model, one can find in the literature methods for drawing samples from the resulting predictive distribution of the missing locations when the underlying stochastic process of interest is independent of the sampling design, i.e., assuming the sampling design is not preferential. However, when the sampling design is preferential, it is necessary to specify a joint distribution for both the stochastic process and the sampling design, and a common choice is to assume a log-Gaussian Cox process and a stationary Gaussian process. Unfortunately, under preferential sampling, deriving a straightforward way to obtain the predictive distribution of the missing locations is a complex task. In this paper we present a methodology to obtain the predictive distribution of missing locations under preferential sampling.

Inference on unit roots and cointegration in high-dimensional or functional time series

Format: CPS Abstract

Author: Dr Won-Ki Seo

Co-Authors:

  • Morten Nielsen
  • Dakyung Seong

This paper concerns statistical inference on unit roots and cointegration for time series taking values in a Hilbert space of an arbitrarily large and possibly infinite and/or unknown dimension. When such a time series is given, an important first step is to estimate the number of stochastic trends, which indicates how many linearly independent unit root processes are embedded in the time series. We develop statistical inference on the number of stochastic trends that remains asymptotically valid even when the time series of interest takes values in a Hilbert space of an arbitrary and indefinite dimension. This has wide applicability in practice; for example, in the case of cointegrated vector time series of finite dimension, in a high-dimensional factor model that includes a finite number of nonstationary factors, in the case of cointegrated curve-valued (or function-valued) time series, and nonstationary dynamic functional factor models.

Initiatives to Measure the Size of Gig Workers and Online-related Occupations in the Philippines Using the Labor Force Survey

Format: CPS Abstract

Author: Minerva Eloisa Esquivias

Co-Authors:

  • Wilma A. Guillen
  • Severa B. De Costo
  • Mechelle M. Viernes
  • Emerson M. Aquino
  • Rassel Jhun S. Embile

In recent years, “gig work” and online-related occupations have grown with the rise of technology-based businesses and the availability of freelance and part-time work. They also expanded at the onset of the COVID-19 pandemic because of the flexibility of work arrangements: workers who lost their full-time jobs took on temporary or freelance jobs to meet their needs. As the economy starts to embrace the new normal for workplaces, information about the nature and magnitude of gig work, characterized by short-term employment and work for different employers on a day-to-day or week-to-week basis, is becoming vital for labor market policy and regulatory purposes. However, while policymakers and researchers have made concerted efforts to provide information on the nature of the gig economy, to our knowledge, conceptual and operational definitions and the occupational classification of gig workers vary across countries. Hence, country-level estimates of the number of gig workers are not comparable. In response to the call for relevant statistical information on the size of the gig workforce in the Philippines, we developed criteria for identifying gig workers based on the nature-of-employment information available in the Labor Force Survey (LFS). On the premise that gig workers are most likely engaged in digital labor platforms, we also added questions on online-related occupations to the May and June 2021 rounds of the LFS to capture gig workers engaged in digital labor platforms. The average results of the May and June 2021 rounds of the LFS showed that 20 percent (8.7 million employed persons) of the 44.4 million employed labor force were engaged in an online platform or mobile application in the Philippines.
Meanwhile, on average, 22 percent of the total employed persons (9.9 million employed persons) were gig workers, that is, workers in short-term or seasonal jobs or workers who worked for different employers on a day-to-day or week-to-week basis. Notably, of the total 9.9 million gig workers, 68 percent (6.7 million persons) were employed in private establishments, 22 percent (2.1 million) were self-employed, 6 percent (584 thousand persons) worked for private households, and 5 percent (463 thousand persons) worked for the government. In terms of engagement in digital labor platforms, of the total 9.9 million gig workers, 17 percent (1.7 million gig workers) used online platforms or mobile applications in their work. However, gig workers engaged in digital labor platforms made up a much smaller share, four (4) percent, of the total employed persons in the Philippines. This paper defined the scope and coverage of the universe under study and developed criteria to measure the number of gig workers and of gig workers engaged in digital labor platforms. Furthermore, we investigated the nature of gig workers in the Philippines by looking at their demographic and socio-economic characteristics.
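
The headline shares quoted above can be recomputed from the rounded counts given in the abstract. The short Python sketch below does only that arithmetic; the variable and function names are illustrative, and the inputs are the published rounded figures, not the underlying survey microdata.

```python
# Rounded figures quoted in the abstract (millions of persons,
# average of the May and June 2021 LFS rounds).
EMPLOYED = 44.4          # total employed persons
PLATFORM_WORKERS = 8.7   # employed persons using online platforms or mobile apps
GIG_WORKERS = 9.9        # gig workers
GIG_ON_PLATFORMS = 1.7   # gig workers using online platforms or mobile apps


def share(part, whole):
    """Return part/whole as a percentage rounded to the nearest whole percent."""
    return round(100 * part / whole)


shares = {
    "platform workers among employed": share(PLATFORM_WORKERS, EMPLOYED),
    "gig workers among employed": share(GIG_WORKERS, EMPLOYED),
    "platform users among gig workers": share(GIG_ON_PLATFORMS, GIG_WORKERS),
    "gig platform users among employed": share(GIG_ON_PLATFORMS, EMPLOYED),
}

for label, percent in shares.items():
    print(f"{label}: {percent}%")
```

Because the inputs are already rounded, shares recomputed this way can differ by a percentage point from figures derived from unrounded survey totals.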

Innovation in the Czech Statistical Office

Format: CPS Abstract

Author: Mrs Petra Kuncová

Co-Authors:

  • Petra Kuncová

The article will present innovations in the Czech Statistical Office - both those that have already been implemented and those that are currently in the implementation phase or in the planning phase. These are activities both in the statistical field and in the provision of support and control processes.

Integrating climate change for a mortgage portfolio: A case of South Africa

Format: CPS Abstract

Author: Ms Dorothy Lesego Sepato

Co-Authors:

  • Dorothy Lesego Sepato
  • Kanshukan Rajaratnam

Climate change affects financial markets in several ways, chief among them how climate risk interacts with macroeconomic variables and how the economic impacts of climate change bear on credit risk within banks. A major problem that banks currently face is identifying suitable ways to integrate climate risks into their credit risk management. This study therefore provides an overview of the problems that climate change causes for the mortgage loan portfolios of financial institutions and presents data on South Africa's exposure to flood risks (Feyen et al., 2020). According to Ouazad & Kahn (2020), real estate will likely face mounting risks in a world facing ambiguous climate change; however, little is known about how banks react in the mortgage industry.

Integration of a non-probability sample under the not missing at random assumption

Format: CPS Abstract

Author: Ms Ieva Burakauskaite

Co-Authors:

  • Andrius Čiginas

We consider the integration of the non-probability sample into the Statistical survey on population by ethnicity, native language and religion of the Lithuanian census 2021. The propensity score adjustment is used to correct the bias of estimates of parameters based on the non-probability sample. We apply a parametric model to describe the response mechanism that is assumed to be not missing at random. Calibration and maximum likelihood based methods are used to estimate the propensity scores. We compare the results with those obtained under the missing at random assumption in a simulation study carried out using census data.
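
As a rough illustration of why a propensity score adjustment matters when response depends on the study variable itself, the Python sketch below compares a naive sample mean with an inverse-propensity-weighted (Hájek-type) mean. The data and the propensities are hypothetical and are treated as known; the abstract estimates them by calibration and maximum likelihood methods, which are not reproduced here.

```python
def naive_mean(values):
    """Unweighted mean of the non-probability sample."""
    return sum(values) / len(values)


def ipw_mean(values, propensities):
    """Inverse-propensity-weighted (Hajek-type) mean.

    Each unit is weighted by 1 / propensity, so units that were
    unlikely to enter the sample count for more.
    """
    if len(values) != len(propensities):
        raise ValueError("values and propensities must have the same length")
    if any(not 0 < p <= 1 for p in propensities):
        raise ValueError("propensities must lie in (0, 1]")
    weights = [1.0 / p for p in propensities]
    return sum(w * y for w, y in zip(weights, values)) / sum(weights)


# Hypothetical binary study variable: units with y = 1 respond with
# probability 0.5, units with y = 0 with probability 0.25, so the
# response mechanism depends on y itself (not missing at random).
sample_y = [1, 1, 1, 1, 0, 0]
sample_p = [0.5, 0.5, 0.5, 0.5, 0.25, 0.25]

print("naive:", naive_mean(sample_y))
print("weighted:", ipw_mean(sample_y, sample_p))
```

In this toy sample the naive mean overstates the share of units with y = 1, while the weighted mean restores the balance implied by the assumed propensities.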

Interrelationships between credit risk management and financial performance of Microfinance Institutions in Uganda

Format: CPS Abstract

Author: Mr Bob Barugahare

Co-Authors:

  • Bob Barugahare

Financial performance in financial institutions continues to attract the attention of scholars and policy-makers because of the long-recognized role these institutions play in economic growth and poverty alleviation. A study was conducted to examine the interrelationships between credit risk management and the financial performance of Microfinance Institutions (MFIs) in Uganda. Specifically, the study assessed how credit risk assessment, credit risk estimation and risk appraisal influence the financial performance of MFIs in Uganda. The study adopted a cross-sectional survey design, and primary data were collected from 32 microfinance institutions in Uganda in December 2021. The study obtained responses from 224 staff of MFIs using a questionnaire. Data were analyzed in SPSS to produce descriptive statistics, correlations and multiple linear regression results. The findings revealed a statistically significant positive relationship between credit risk management and financial performance (r=.740, p<.01). In particular, there was a strong positive relationship between credit risk assessment and financial performance (r=.669, p<.01). Credit risk estimation had a strong positive correlation with financial performance (r=.660, p<.01). There was a strong positive correlation between risk appraisal and financial performance (r=.755, p<.01). The regression analysis showed that credit risk management is a significant predictor of financial performance among MFIs (β=.562, t=10.138, p<.05). In conclusion, the findings indicate that the financial performance of MFIs depends heavily on their credit risk management practices, such as risk estimation, credit assessment and credit risk control, among others. There is therefore a need for microfinance institutions to embark on credit risk management in order to reduce the risk of default and non-performing loans.
This will involve loan assessments, controls, loan approvals, credit rating and borrower evaluation. Lastly, pre-disbursement training is recommended for all successful loan applicants.

Introducing ISENSE: An Index of Sensitivity to Non-exchangeability

Format: CPS Abstract

Author: Mr Md Rashedul Hoque

Co-Authors:

  • Yi Qian
  • J Antonio Aviña-Zubieta
  • Mary A De Vera
  • Lawrence McCandless
  • Hui Xie

The exchangeability assumption might not hold if some important confounders are unmeasured or if reverse causality is present in any observational study. Existing sensitivity analysis methods for non-exchangeability often require additional untestable assumptions. We propose an index of sensitivity to non-exchangeability (ISENSE) to measure the impact of non-exchangeability on treatment effect estimates that does not require imposing any such assumptions and can handle both unmeasured confounders and reverse causality.

Introducing two different approaches for estimating working hours obtained from The Survey on Time Use and Leisure Activities

Format: CPS Abstract

Author: Mr Shinichi Nagao

This study compares two estimates of working hours obtained from the 2016 Time Use Survey. Japan’s Time Use Survey, which shows the types of activities people spend their time on, divides each day into 96 slots of 15 minutes each. The percentage of people participating in each activity gives the percentage of people who are working during each time period of the day, so the number of hours and the time periods people spend working each day can be ascertained by occupation. Furthermore, by aggregating the time that participants spend working on each day of the week, the number of hours worked per week can be estimated. As an alternative to this estimate, hours worked can be calculated from the “usual working hours per week” reported in the questionnaire. The work hours estimated by the two approaches were roughly consistent with each other by occupation.  In the first method, figures are drawn that plot the participation rate of people who work (with the X-axis as the time of day from 0:00 to 24:00 and the Y-axis as the participation rate of working persons). For office workers who are full-time employees, the participation rate, which is the proportion of people who are working, generally rises starting from 8:00 and falls after 18:00. In addition, the area under this curve provides an estimate of the hours worked per day, so the hours worked per week can be estimated by aggregating the hours worked on weekdays (Monday to Friday), Saturdays, and Sundays.  In the second method, the question on “usual working hours per week” in Japan’s Time Use Survey is used. Respondents choose among seven weekly time bands, excluding “not fixed.” When estimating average work hours by occupation, the median value of the band entered in the individual’s questionnaire is used as that person’s work hours per week.
Although the most common method of ascertaining work hours is to use the monthly Labor Force Survey, work hours calculated from the two estimation methods in the Time Use Survey are consistent with the characteristics of work hours by occupation obtained from the Labor Force Survey. These methods are useful for ascertaining the characteristics of work hours and how much time is spent working on each day of the week, including time periods and other more detailed information.
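
The first method can be sketched in a few lines of Python: with 96 slots of 15 minutes, the area under the participation-rate curve is the sum of the slot rates times 0.25 hours, and the weekly figure adds five weekdays, a Saturday and a Sunday. The participation profiles below are hypothetical step functions chosen only for illustration; they are not survey results.

```python
SLOTS_PER_DAY = 96   # the survey divides each day into 96 slots
SLOT_HOURS = 0.25    # each slot covers 15 minutes


def daily_hours(participation_rates):
    """Average hours worked per day: area under the participation-rate curve."""
    if len(participation_rates) != SLOTS_PER_DAY:
        raise ValueError("expected one participation rate per 15-minute slot")
    return sum(participation_rates) * SLOT_HOURS


def weekly_hours(weekday_rates, saturday_rates, sunday_rates):
    """Aggregate five weekdays, one Saturday and one Sunday."""
    return (5 * daily_hours(weekday_rates)
            + daily_hours(saturday_rates)
            + daily_hours(sunday_rates))


def block_profile(start_hour, end_hour, rate):
    """Hypothetical profile: `rate` between the two clock hours, 0 elsewhere."""
    return [rate if start_hour <= slot * SLOT_HOURS < end_hour else 0.0
            for slot in range(SLOTS_PER_DAY)]


# Illustrative profiles only: 80% at work from 8:00 to 18:00 on weekdays,
# 25% at work from 9:00 to 13:00 on Saturdays, nobody at work on Sundays.
weekday = block_profile(8, 18, 0.8)
saturday = block_profile(9, 13, 0.25)
sunday = block_profile(0, 0, 0.0)

print(round(daily_hours(weekday), 2), "hours per weekday")
print(round(weekly_hours(weekday, saturday, sunday), 2), "hours per week")
```

Real participation curves rise and fall gradually rather than in steps, but the calculation is the same: each slot contributes its participation rate times a quarter of an hour.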

Introduction to Statistics Education Programs at Statistics Korea

Format: CPS Abstract

Author: Ms EUNJIN AN

Developing statistical literacy can support rational decision-making in complex and uncertain situations. In addition, statistical education for future generations of students has great significance in expanding the base of national statistics. Statistics Korea actively provides statistics education for school students through the Statistics Training Institute. It runs various statistical education programs for students to cultivate statistical thinking in daily life and to improve problem-solving ability. This paper introduces the statistics education programs of Statistics Korea, focusing on the development of educational content, student and teacher education, and the National Statistics Poster Contest. Project-type statistical classes using SGIS (Statistical Geographic Information Service) and KOSIS (Korean Statistical Information Service) will also be explained. Through this, we intend to share Statistics Korea's 30 years of experience in school statistics education.

JDemetra+ 3.0: a versatile time series analysis software.

Format: CPS Abstract

Author: Mrs anna smyk

Co-Authors:

  • Tanguy Barthelemy
  • anna smyk

LAD-LASSO-RF AND LAD-LASSO-RIDGE FOR HIGH-DIMENSIONAL REGRESSION MODELLING

Format: CPS Abstract

Author: Dr ADEWALE LUKMAN

Co-Authors:

  • Onifade Femi

The high-dimensional problem arises when a dataset has more features (variables, p) than observations (n). Examples occur in computational chemistry, pharmaceutical chemistry, chemometrics with spectral data, and genomics, to name but a few. The ordinary least squares estimator (OLSE) does not produce a unique estimate in high-dimensional settings (p>n) because the predictors do not have full rank. The OLSE also has high variance when there is linear dependency (multicollinearity) among the predictors [1-3]. The ridge estimator and other shrinkage estimators have been developed as alternatives to the method of least squares [1-6]. The ridge estimator is stable and achieves better prediction by producing a minimum mean squared error, but it fails to shrink any coefficient exactly to zero [7]. Furthermore, it is important to stress that the well-known OLSE and ridge estimator are sensitive (not resistant) to heavy-tailed error distributions and outliers, which are often encountered in applications. These may occur in the response variable or among the predictors. The least absolute deviation (LAD) estimator produces more robust estimates than the OLSE and the ridge estimator when heavy-tailed errors or outliers appear in the response variable, but it fails to perform variable selection [8-9]. Variable selection is a crucial aspect of high-dimensional regression modelling, and removing important predictors produces severely biased regression estimates and prediction results. The problem of variable selection has been studied extensively in the literature [7, 9-15]. A well-known variable selection method is the Least Absolute Shrinkage and Selection Operator (LASSO) [10], which minimizes the residual sum of squares (RSS) subject to an L1-norm penalty. LASSO possesses both shrinkage and automatic variable selection features. Its drawback is that it tends to choose only one variable from a group of pairwise correlated predictors and is indifferent to which one is selected [1, 7].
The Elastic Net (E-net) [7] was developed by combining the L1-norm and L2-norm penalties. E-net is an improved LASSO that performs both regularization and variable selection and tends to select more variables than LASSO. All the aforementioned variable selection methods are sensitive to heavy-tailed errors or outliers. LAD-LASSO was proposed because the least squares criterion used in LASSO is highly sensitive to outliers; it replaces that criterion with the LAD criterion. The overall aim of this study is to develop novel methods for the analysis of high-dimensional models (QSAR studies). Problems in such models include multicollinearity among predictors and outliers in the response variable. The aim will be achieved by pursuing the following objectives: (1) We will critically evaluate existing methods in the statistical literature for dealing with high-dimensional models with multicollinearity, outliers, or both. (2) We will develop robust methods that perform variable selection and account for multicollinearity and outliers, with a focus on prediction. To achieve this, we will combine the usual LAD-LASSO with either random forest regression or ridge regression. (3) The estimators’ performance will be compared with that of some existing ones through simulation studies and real-life applications.
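
For orientation, the criteria behind the existing estimators named above can be written side by side. The notation is an assumption introduced here and does not appear in the abstract: responses y_i, predictor vectors x_i, coefficient vector β with p entries, and tuning parameters λ, λ_1, λ_2 ≥ 0. Penalty scalings differ across the literature, so these are the textbook forms, not necessarily the exact parameterization the authors use; the proposed LAD-LASSO-RF and LAD-LASSO-ridge combinations are not specified in the abstract and are not shown.

```latex
\begin{align*}
\hat\beta_{\text{ridge}} &= \arg\min_{\beta} \sum_{i=1}^{n} \bigl(y_i - x_i^{\top}\beta\bigr)^2 + \lambda \sum_{j=1}^{p} \beta_j^2, \\
\hat\beta_{\text{LASSO}} &= \arg\min_{\beta} \sum_{i=1}^{n} \bigl(y_i - x_i^{\top}\beta\bigr)^2 + \lambda \sum_{j=1}^{p} \lvert\beta_j\rvert, \\
\hat\beta_{\text{E-net}} &= \arg\min_{\beta} \sum_{i=1}^{n} \bigl(y_i - x_i^{\top}\beta\bigr)^2 + \lambda_1 \sum_{j=1}^{p} \lvert\beta_j\rvert + \lambda_2 \sum_{j=1}^{p} \beta_j^2, \\
\hat\beta_{\text{LAD}} &= \arg\min_{\beta} \sum_{i=1}^{n} \bigl\lvert y_i - x_i^{\top}\beta \bigr\rvert, \\
\hat\beta_{\text{LAD-LASSO}} &= \arg\min_{\beta} \sum_{i=1}^{n} \bigl\lvert y_i - x_i^{\top}\beta \bigr\rvert + \lambda \sum_{j=1}^{p} \lvert\beta_j\rvert.
\end{align*}
```

The only change from LASSO to LAD-LASSO is the loss: squared residuals are replaced by absolute residuals, which is what limits the influence of outliers in the response.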

Labour share in Moroccan economy

Format: CPS Abstract

Author: Mrs BAHIJA NALI

Co-Authors:

  • YATTOU AIT KHELLOU

The share of income allocated to the labour factor is underestimated in most countries, especially in developing countries, because the labour income of the self-employed is not taken into account. This article presents a valuation of the labour income share and proposes a methodology to adjust the share of national income that accrues to employees. The approach we present is based on a local reality characterized by the predominance of self-employment, whether in the agricultural sector (87%) or in the informal sector (76%).

The Statistical Information System on Migration

Format: CPS Abstract

Author: Madam hanane houchimi

Co-Authors:

  • hanane houchimi

Statistical data on migration are essential to ensure the communication, consistency, monitoring and evaluation of the country's migration policies. For these data to be useful in designing development strategies, producing more statistics is not enough; what matters is making available reliable, up-to-date quantitative and qualitative data that are sufficiently disaggregated to support the monitoring and evaluation of the various policies and strategies. For the sake of efficiency, migration policies must be based on solid evidence and comprehensive statistical information that presents the demographic, economic, social and cultural profiles of migrants. Morocco has made considerable efforts to strengthen the statistical capacity of its migration data collection system; nevertheless, comprehensive approaches to statistical capacity development still need to be designed and implemented. Several investments have been made in human resources, technical capacity, new information technologies, and technological platforms and tools. These efforts aimed to upgrade the capacity of the institutions responsible for collecting statistical information in the Kingdom to produce national data on migration.

Leveraging Textual Data in Nowcasting Malaysia’s Gross Domestic Product

Format: CPS Abstract

Author: Ms Veronica S Jamilat

Co-Authors:

  • Faiza Rusrianti Tajul Arus
  • Fatin Ezzati Mohd Aris

Nowcasting economic indicators such as Gross Domestic Product (GDP) generally relies on structured data, including real, financial and survey indicators. Recent research has focused on how textual data from news and social media can be used to improve nowcasting models. Two kinds of methods are used for text analysis: machine learning methods and lexicon-based methods. In this paper, we consider media content as an additional data source for recent work on nowcasting Malaysia’s GDP using machine learning, as part of a continuous effort to improve nowcast accuracy. The paper performs both machine learning and lexicon-based sentiment analysis to capture news sentiment on economic activities, to be included in the GDP nowcast model. This paper extends the current work on the GDP nowcast. https://github.com/DSctadr/Nowcasting-Malaysia-GDP
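
A lexicon-based sentiment score of the kind mentioned above can be sketched very simply in Python: count positive and negative words from a fixed word list and take the normalized difference. The word lists and headlines below are hypothetical placeholders and are not taken from the paper or its repository.

```python
import re

# Hypothetical mini-lexicon for illustration; a real application would use
# a curated economic or financial sentiment dictionary.
POSITIVE = {"growth", "expansion", "recovery", "surplus", "improve"}
NEGATIVE = {"recession", "decline", "contraction", "deficit", "slowdown"}


def sentiment_score(text):
    """Return (positive - negative) / (positive + negative), in [-1, 1].

    Texts containing no lexicon words get a neutral score of 0.0.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    positive = sum(token in POSITIVE for token in tokens)
    negative = sum(token in NEGATIVE for token in tokens)
    total = positive + negative
    return 0.0 if total == 0 else (positive - negative) / total


headlines = [
    "Exports show strong growth and recovery",
    "Recession fears as growth stalls",
    "Factory output slips amid slowdown and decline in orders",
]

for headline in headlines:
    print(f"{sentiment_score(headline):+.2f}  {headline}")
```

Scores like these could then be averaged over a month or quarter to form a candidate regressor for a nowcasting model. This simple matcher ignores negation and word inflections, which a production lexicon approach would need to handle.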

Leveraging microsimulation models for public health policy decision making

Format: CPS Abstract

Author: Charlotte Probst

Co-Authors:

  • Charlotte Probst

The session will present and discuss a microsimulation approach to public health policy decision making, using the example of alcohol control intervention scenarios to further health equity. Since about 2010, life expectancy at birth in the United States (US) has stagnated and begun to decline, with concurrent increases in the socioeconomic divide in life expectancy. The Simulation of Alcohol Control Policies for Health Equity (SIMAH) project uses a novel microsimulation approach to investigate the extent to which alcohol use, socioeconomic status, and race/ethnicity contribute to unequal developments in US life expectancy and how alcohol control interventions could reduce such inequalities. The microsimulation model allows for the systematic integration of disparate data sources into one coherent simulation model, replacing a static, one-factor-at-a-time approach to policy modelling.

Leveraging on Job Vacancies Advertised Online to Analyse Malaysian Labour Market Using Big Data Analytics

Format: CPS Abstract

Author: RABI'ATUL'ADAWIAH SHABLI

Co-Authors:

  • RABI'ATUL'ADAWIAH SHABLI

Big data analytics is now a common tool used by economists and statisticians to address socioeconomic challenges and to complement existing labour market information in the country. Big data make it possible to obtain huge amounts of multidimensional, diversified and granular data. With the rapid growth of internet use, recruiting processes have moved to online applications and, as a result, provide access to vacancy data. The aim of this paper is to examine data gathered from job vacancies advertised online for use in research on labour economics and skills development. Although extracted in real time, the processed data are available on a quarterly basis. In this paper, the information is analysed using several methods, namely descriptive analysis and time-series analysis. In addition, this paper assesses the main characteristics of the labour market data based on scraped vacancy data starting from the first quarter of 2020.

Logistic Regression Analysis on the Characteristics of Post-Pandemic Unemployment in Bintan Regency

Format: CPS Abstract

Author: Mr Dio Dwi Saputra

Employment remains one of the most frequently discussed topics, and unemployment is its most frequently discussed indicator. Under the Statistics Indonesia concept, the unemployed consist of persons without work who are looking for work, persons without work who have established a new business or firm, persons without work who are not looking for work because they do not expect to find work, and persons who have made arrangements to start work on a date subsequent to the reference period (future starts). According to data released by Statistics Indonesia, Bintan Regency had a high unemployment rate in 2021. Therefore, this study examines the characteristics of the unemployed in Bintan Regency after the pandemic, when economic conditions should have improved.

Long-term unemployment in Morocco: Profiles and determinants

Format: CPS Abstract

Author: Mr Oussama RIDA

Co-Authors:

  • kamel Gaanoun
  • naima Labroude

Long-term unemployment refers to people who have been unemployed for more than 12 months and is one of the ILO's indicators (KILM11). During the last two decades, long-term unemployment has exceeded 60% in Morocco. It is therefore a mass phenomenon with harmful consequences. What is the distribution of long-term unemployment? What are the most vulnerable profiles?

MODELING THE PROGRESSION OF NEONATAL HYPOTHERMIA DISEASE PROGRESSION USING DATA ON NEW BORN AT DILLA UNIVERSITY REFERRAL HOSPITAL APPLICATION OF MULT

Format: CPS Abstract

Author: Selamawit Moja

Co-Authors:

  • Selamawit Moja
  • MELASHU SIMEGNEW

The World Health Organization defines hypothermia as a core temperature of 35 degrees Celsius or below. In preterm newborns, hypothermia can be caused by the environment or by an underlying illness (e.g., sepsis). Preventing newborn hypothermia requires maintaining an acceptable ambient temperature in the delivery area or operating room. Infants that are hypothermic should be rewarmed, and any underlying conditions should be identified and treated (Arcangela Lattari Balest, 2021). A sustained decrease in body temperature affects the metabolic demands of the newborn and has been associated with sepsis, asphyxia, respiratory distress syndrome, and death (Demtse et al., 2020; Copas & Malley, 2008). When heat loss exceeds the baby's ability to produce heat, its body temperature falls below the normal range (36.5°C – 37.5°C). Sudden changes in the body temperature of newborns during delivery, particularly in the absence of appropriate preventive measures, can result in neonatal hypothermia. With a temperature of 36.0°C – 36.4°C, newborns are experiencing cold stress (mild hypothermia), which should be cause for concern. A body temperature of 32.0°C – 35.9°C is considered moderate hypothermia. A newborn with a temperature less than 32.0°C is considered to have severe hypothermia and should receive skilled care as soon as possible. The World Health Organization recommends a "warm chain," or a series of interrelated procedures, to reduce the risk of newborn hypothermia (World Health Organization, 1997). Infant rewarming after birth should be optimized and time should be allowed for it, in order to limit the presence of hypothermia after birth even before undergoing procedures at the Neonatal Intensive Care Unit (Dubbink-Verheij et al., 2021). Hypothermia prevalence rates in hospitals ranged from 32 percent to 85 percent, with rates fluctuating even in tropical contexts (Lunze et al., 2014).
Ethiopia had a prevalence of hypothermia of 69.8% after birth (Demissie, B.W., Abera, B.B., Chichiabellu, T.Y., Astawesegn, 2018). A greater understanding of disease dynamics, such as the time it will take to reach a specific condition or the likelihood of movement between states, could lead to more effective prevention, management, and therapy. By employing a Markov model to understand the transitions between neonatal hypothermia states, medical care providers can be alerted to the need for treatment when the neonatal hypothermia state changes. In this model, the transition intensities represent the hazards and survival probabilities for transitioning between hypothermic states (Spruance et al., 2004). No studies from the study area have used the rates of immunological marker of body temperature change among infected neonate (infant) patients in a Hidden Markov Model to examine hypothermia disease progression. As a result, at Dilla University Referral Hospital in Ethiopia, we conducted a retrospective cohort study to analyze the dynamic progression of hypothermia among infected neonates after beginning of body temperature (KASHIHALWA, 2019 & KUSHA A MOHAMMADI, BA, 2020).
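
To make the multi-state idea concrete, the Python sketch below maps temperature readings to the hypothermia states using the thresholds quoted above and estimates a transition matrix by counting moves between consecutive readings. It is a simplified discrete-time stand-in that assumes equally spaced readings recorded to one decimal place; the abstract refers to transition intensities and a hidden Markov model, which would need a dedicated multi-state modelling package. The state names and example readings are hypothetical.

```python
# State names are illustrative labels for the ranges quoted in the abstract.
STATES = ["not_hypothermic", "cold_stress", "moderate", "severe"]


def classify(temp_c):
    """Map a temperature in degrees Celsius to a hypothermia state."""
    if temp_c >= 36.5:
        return "not_hypothermic"
    if temp_c >= 36.0:
        return "cold_stress"      # 36.0 - 36.4: cold stress (mild hypothermia)
    if temp_c >= 32.0:
        return "moderate"         # 32.0 - 35.9: moderate hypothermia
    return "severe"               # below 32.0: severe hypothermia


def transition_matrix(sequences):
    """Estimate P(next state | current state) from consecutive readings."""
    counts = {a: {b: 0 for b in STATES} for a in STATES}
    for temps in sequences:
        states = [classify(t) for t in temps]
        for current, following in zip(states, states[1:]):
            counts[current][following] += 1
    probs = {}
    for a in STATES:
        total = sum(counts[a].values())
        probs[a] = {b: (counts[a][b] / total if total else 0.0) for b in STATES}
    return probs


# Hypothetical readings for three neonates, one list per neonate.
readings = [
    [35.8, 36.2, 36.6],
    [36.1, 36.7, 36.8],
    [31.5, 33.0, 36.0],
]

matrix = transition_matrix(readings)
for state in STATES:
    print(state, matrix[state])
```

States with no observed outgoing transitions get a row of zeros here; with real cohort data, each row would give the estimated chance of improving, staying put, or worsening by the next reading.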

Machine Learning : Factor Analysis of Food Security using Big Data in Indonesia

Format: CPS Abstract

Author: Miss Atika Kautsar Ilafi

Co-Authors:

  • Lita Jowanti
  • Annisa Nur Fadhilah

Concern about a global economic recession in 2023 is growing. Signs of it include high inflation in various countries, which has led central banks in several countries, including Indonesia, to raise interest rates. The threat of an economic recession is inseparable from the risk of food insecurity and even the food crisis that has hit various countries. This food insecurity is caused by food, energy and fertilizer prices as a result of the prolonged conflict between Russia and Ukraine. Limited data for constructing measures of food insecurity make it difficult to determine which policies the state can adopt to overcome it, so research on the available big data is needed. Several studies have tested measuring food insecurity from regional weather and climate variables, availability of agricultural land, satellite imagery, night light data, food price increases, and population density. This study aims to predict food insecurity from these variables, with Indonesia as the unit of analysis, using the machine learning factor analysis method.

Machine Learning Methods for Assessing the Consistency and Integration of Statistical and Administrative Data

Format: CPS Abstract

Author: PROF. DR. Elena Zarova

Based on the generalization of the methods proposed in publications for integrating administrative data into the practice of national statistical offices, as well as taking into account the results of scientific developments in this direction, a set of methods for assessing the consistency of official statistics and administrative data has been developed (using the example of wage statistics). On the basis of a complex of "traditional" statistical methods and machine learning methods, the necessary components for ensuring consistency are identified. A set of statistical and machine learning methods for assessing the consistency of administrative and statistical data used in official practice at the regional level has been tested on real wage data. The proposed methods for integrating administrative and statistical data at the regional level are relevant for the statistical practice of various countries.

Machine learning for coding occupations in the Census: lessons from experiment to production

Format: CPS Abstract

Author: Mr Lucas Malherbe

Co-Authors:

  • Lucas Malherbe
  • Elise Coudin
  • Tom Seimandi
  • Théo Leroy

This paper presents the approach undertaken by INSEE to select and implement the coding of the occupational variables of the annual census survey in the new national occupational classification (PCS 2020). The coding process will use a combination of automatic approaches (list auto-completion and supervised ML prediction models) and manual coding. An ad hoc annotation campaign conducted in 2021 provides a first dataset for training and testing the algorithms. A two-layer neural network algorithm (fastText embeddings of words and n-grams, followed by a classifier) makes it possible to achieve the overall accuracy goals set as conditions for going into production.

Macroscopic properties of large equity markets: Stylized facts and portfolio performance

Format: CPS Abstract

Author: Mr Steven Campbell

Co-Authors:

  • Ting-Kam Leonard Wong
  • Qien Song

This work presents a systematic investigation into the macroscopic properties of large equity markets. While empirical features of stock prices and their cross-sectional behavior have been studied extensively in financial econometrics, relatively little attention has been paid to macroscopic properties like the capital distribution curve. In addition to addressing this gap, our study reveals new statistical properties of fundamental objects in Stochastic Portfolio Theory like market diversity.

Measurement and Spatial Dependence of Food Price Level among Chinese Cities

Format: CPS Abstract

Author: Dr Yan Wang

Co-Authors:

  • Mo Yang

Based on the two-step measurement framework of the World Bank International Comparison Program, this paper uses price data for 106 homogeneous and comparable products in 29 cities, collected from an e-commerce platform, to calculate the food price level at the city level in China. The spatial distribution characteristics, interdependence patterns and influencing factors of the food price level are studied in depth. There are significant differences in the overall food price level between Chinese cities. The city with the highest food price level is Shenzhen, and the lowest is Changchun; the price level of fresh food varies greatly between cities, while the price level of packaged food varies much less. During 2001-2019, the food price levels of cities show a trend of convergence. Analysis from a spatial perspective shows that the food price level exhibits a certain urban agglomeration effect. Compared with geographic spatial correlation, food price levels between cities show a stronger economic spatial correlation. The food price level of a city is largely determined by its level of economic development and is also affected by spatial spillovers from cities with similar development levels. The conclusions indicate that, in order to improve the accuracy of policy making on minimum wage standards, basic living allowances, income distribution and the like, regional price level differences should be taken into consideration.

Measures of Bank Competition and Bank Risk-Taking by Dr. Veronica B. Bayangos (vbayangos@bsp.gov.ph)

Format: CPS Abstract

Author: Dr Veronica Bayangos

Since the 2000s, reforms have greatly reshaped the structure of the global financial system. Some banks have become big and interconnected while some have become risk takers. Studies suggest that financial sector reforms promote bank competition in most advanced economies. As such, discussions on bank competition have intensified in recent years, particularly in constructing different measures of bank competition and in explaining their relevance in driving financial stability. However, some studies also find that bank competition in many emerging countries has declined despite the implementation of financial sector reforms. Crucially, the array of empirical studies has highlighted the influence of bank competition on financial stability, credit growth, and the regulatory drivers of competition in banking markets (De-Ramon and Straughan 2020). This study attempts to contribute to research on the role of bank competition in bank risk-taking by examining two competing views. In the traditional “competition-fragility” view, Jimenez et al. (2013) explain that increased bank competition could threaten the solvency of individual banks. This could erode the franchise value of a bank, which could encourage the bank to pursue riskier policies to maintain its profits. These riskier policies could lead to higher non-performing loan ratios and potentially to bank failures. The “competition-stability” view posits that less intensive competition may result in higher lending rates, which may in turn raise the credit risk of borrowers due to moral hazard issues. The increased default risk could drive more problem loans and greater bank instability. However, such a situation allows a bank to protect its franchise value by pursuing safer policies that contribute to the stability of individual banks (Boyd and De Nicolo 2005).
Since the Philippines does not have official measures of bank competition, the approach is to first construct measures of bank competition based on market power from a unique dataset of balance sheets and income statements for 542 banks operating in the Philippines from March 2010 to December 2020. These measures include the H-Statistic, the Lerner Index, and the Boone Indicator. The paper then estimates the impact of these competition measures on solvency risk, that is, the risk of being unable to absorb losses with the available capital, across the universal bank, thrift bank, and rural/cooperative bank industries. Using panel quantile regression, the results reveal that, at the industry level, bank competition reduces solvency risk. Specifically, the Boone Indicator, which measures efficiency, has the biggest impact on solvency risk among the measures of bank competition. Looking at the risk distribution, the study shows that the competition-fragility and competition-stability hypotheses hold simultaneously for universal banks, suggesting that the effect of competition depends crucially on the underlying individual bank risk. Equally important, the results highlight that the relationship between competition and bank risk is sensitive to factors related to the extent of diversification strategy, cost-to-income ratio, deposit growth, capitalization, changes in the physical banking networks, and growth of real Gross Domestic Product.
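
As a minimal illustration of one of the market-power measures named above, the sketch below computes a Lerner Index using its textbook definition, (price - marginal cost) / price. The function name and the input values are hypothetical; the abstract does not describe how prices or marginal costs were estimated, so this is not the author's procedure.

```python
def lerner_index(price: float, marginal_cost: float) -> float:
    """Textbook Lerner Index: (P - MC) / P.

    Values near 0 suggest pricing close to marginal cost (strong competition);
    values near 1 suggest substantial market power.
    """
    if price <= 0:
        raise ValueError("price must be positive")
    return (price - marginal_cost) / price


# Hypothetical bank: output price of 0.08 per unit of assets, marginal cost of 0.06.
print(lerner_index(0.08, 0.06))
```

In applied work the marginal cost is usually derived from an estimated cost function rather than observed directly, which is where most of the modelling effort lies.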

Measuring Global Flow of Funds and Data Science

Format: CPS Abstract

Author: Prof. Nan Zhang

The global flow of funds (GFF) connects domestic economies with the rest of the world. GFF data can provide valuable information for analyzing interconnectedness across borders and global financial interdependencies. Following the deregulation of financial markets, international organizations such as the IMF, government agencies, central banks, and researchers began exploring the GFF in the 2010s. At the 64th ISI World Statistics Congress, we will discuss the theory, methods, and application of GFF data and its analysis, which integrates economic statistics, data science, and financial networks to demonstrate their interconnectedness. It provides the groundwork for understanding the workings of globalized financial markets. Adopting the Balance of Payments, International Investment Position, and international banking statistics as a framework for measuring GFF, the approach identifies financial links among economic sectors and the rest of the world. Its integrated sources include data broken down by country/region for selected financial instruments; it constructs a GFF matrix (metadata) on a from-whom-to-whom basis by country and uses that matrix for empirical study via data science and financial network analysis.

Measuring and monitoring women’s empowerment and child nutrition in Egypt: Implication with a core set of indicators in the SDGs era

Format: CPS Abstract

Author: Dr Reem ElSybaey

Despite the progress made in demographic and health indicators, achieving the Sustainable Development Goals by 2030 remains a difficult task. This paper monitors several indicators for women and children in the era of the Sustainable Development Goals.

Measuring construction activities with text data

Format: CPS Abstract

Author: Dr PIETER VLAG

Part of the European ESS-WIN project is to explore the potential of new types of web data sources for official statistics. One use case concerns the use of scraped data from real estate web portals to measure construction activities. Construction activities have a large impact on construction industries and the availability of housing. Traditional official statistics provide information on construction activities in terms of building permits for construction of new buildings, building permits for construction with prefabricated elements, and construction work completed. Real estate web portals can provide information about construction activities with higher spatial accuracy and sometimes already in earlier stages of construction planning. This case study, which is carried out in Germany and Sweden, focuses on real-time observation of trends. It includes a method for early estimates of construction activities with high geospatial resolution and the development of a price indicator for new buildings. The presentation covers the results obtained in Sweden. These are based on data from the two largest real estate portals in the country. The presentation will focus on four points: 1) data access: scraping from websites or getting access to the data via APIs and partnerships with the portal owners, 2) representativity, 3) an early trend indicator for demand prices for new buildings, 4) relationships with traditional statistical information about construction activities and the use of satellite images to detect construction.

Measuring multispecies aggregation level by a conspecific-encounter index using line transect data

Format: CPS Abstract

Author: PROF. DR. Tsung-Jen Shen

Co-Authors:

  • Youhua Chen
  • Hoang Van Chung
  • Shengchao Shi
  • Jianping Jiang
  • Richard Condit
  • Stephen P. Hubbell

It is common knowledge that some specific species in an ecological community present aggregate distributions, but this does not necessarily imply that the community as a whole presents an aggregate distribution. Using the conspecific-encounter index derived from the Markov non-independent sampling model, this talk will introduce a legible definition of community-level distributional aggregation as an interspersed or cluster-like distribution of different species. In practical applications, by utilizing the conspecific-encounter index that accounts for the non-independent sampling of consecutive individuals along line transects, the result reveals that tree assemblages in tropical forest ecosystems can present a strong signal of extensive distributional interspersion. By contrast, for the amphibian assemblages, the conspecific-encounter index was consistently high, implying that amphibian communities tend to be highly aggregate in space.

Measuring resident households’ consumption abroad using payment card transactions data

Format: CPS Abstract

Author: Dr Klaudia Máténé Bella

Co-Authors:

  • Beáta Horváth

In 2020, because of the Covid-19 pandemic, a new data source was introduced into the estimation of household spending abroad, namely payment card transaction data. These data are available quarterly and include the channel of acceptance and the Merchant Category Codes (MCCs). We created a correspondence table between MCC and COICOP in order to analyse spending data by type of product and service. We found that Hungarian spending abroad covers several types of goods and services.
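
A correspondence table of this kind can be sketched as a simple lookup followed by aggregation. The four MCC entries and their COICOP 1999 divisions below are illustrative assumptions chosen for the example, not the authors' table, and the function name and sample transactions are hypothetical.

```python
from collections import defaultdict

# Illustrative mapping only (assumed for this sketch): MCC -> COICOP 1999 division.
MCC_TO_COICOP = {
    "5411": "01",  # grocery stores, supermarkets -> food and non-alcoholic beverages
    "5541": "07",  # service stations -> transport
    "5812": "11",  # eating places, restaurants -> restaurants and hotels
    "7011": "11",  # lodging -> restaurants and hotels
}


def spending_by_coicop(transactions):
    """Sum card spending by COICOP division; unknown MCCs go to 'unmapped'."""
    totals = defaultdict(float)
    for mcc, amount in transactions:
        totals[MCC_TO_COICOP.get(mcc, "unmapped")] += amount
    return dict(totals)


# Hypothetical quarterly transactions abroad: (MCC, amount).
sample = [("5411", 120.0), ("5812", 45.5), ("7011", 300.0), ("9999", 10.0)]
print(spending_by_coicop(sample))
```

Keeping an explicit "unmapped" bucket makes it easy to see how much spending the correspondence table fails to classify.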

Measuring sub-national life expectancy: a direct data approach or modelling?

Format: CPS Abstract

Author: Dmitri Jdanov

Life expectancy is the key indicator describing social and health aspects of human development. International comparative studies indicate that there is still no systematic evidence of a reduction in longevity disparities between socioeconomic groups and areas, and that such disparities exist even in countries with strong social policies. The majority of ongoing research focuses on mortality disparities by socio-economic status. However, the spatial dimensions of mortality changes are considered equally important, as they build the bridge between survival and socio-economic or cultural contexts. Yet measuring longevity disparities in small territorial units is a methodological challenge. Estimates of life tables for small areas might be based on direct data and standard demographic methods, statistical modelling, or a combination of these two approaches. Prior studies show that conventional life table estimation methods very often return implausible results. Therefore, the standard computation of life tables is usually complemented by some (often arbitrary) adjustments, smoothing, or modelling of parts of mortality curves. During the last decades, the field of small area life table estimation has increasingly relied on advanced statistical modelling approaches, including Bayesian modelling. These approaches make it possible to obtain age-specific mortality estimates by using estimated parameters from a standard mortality schedule (e.g. national or higher rank regional unit) or by borrowing information for mortality estimation from neighboring or similar areas or from areas with better quality data. As a result, one gets much more stable and statistically robust results, which make it possible to assess the magnitude and direction of changes in inequalities. The limitations of modelling include the arbitrary choice of parameters and the risk of overlooking specifics of mortality patterns in some areas.
This study aims at testing three life table estimation strategies (direct data and standard estimation; TOPALS linear spline modelling; Bayesian modelling) using municipality-level data of varying size for a small country, Lithuania. We found that application of the direct data approach is in certain situations impossible and/or requires substantial mechanical adjustments such as aggregation over age groups and/or time intervals. In addition, some modelling is still required in order to get more plausible age-specific mortality patterns. Our results also suggest that even smoothing and modelling at old ages cannot solve all problems related to multiple zero cases and random fluctuations. In the case of Lithuania, this approach also leads to implausible life expectancy values for smaller municipalities and wide confidence limits. Therefore, in many cases, identifying the direction of temporal changes in municipality-specific life expectancies is highly problematic. Modelling approaches such as TOPALS generally provide more stable and statistically robust estimates. For example, the modified TOPALS approach based on estimated parameters from the standard (national) mortality schedule is efficient for producing more plausible age-specific mortality profiles for the smallest municipalities with multiple zero counts and random noise due to small numbers. Although the modelling approaches provide more stable, statistically robust, and realistic life expectancy estimates, they may lead to overlooking specifics of mortality in certain areas. More exhaustive analysis is needed to combine the advantages of both direct-data and modelling approaches.

Measuring sub-national life expectancy: direct data or modelling?

Format: CPS Abstract

Author: Domantas Jasilionis

Co-Authors:

  • Dmitri Jdanov
  • Laszlo Nemeth

Social/health/population statistics; regional/small area statistics

Measuring the NEET's "not in formal education" component through surveys spanning 2 school years.

Format: CPS Abstract

Author: Miss Salima MANSOURI

The aim of this paper is to discuss the following: - The NEET rate: the transition from a social exclusion indicator to an indicator that measures the current potential labor force. - National adaptation of international standards related to the NEET rate: measurement of formal education in the case of surveys spanning 2 school years. - Recommendations to help countries align with the international standards.

Measuring the economic contribution of tourism industries: a satellite account approach

Format: CPS Abstract

Author: Yan Zheng

Co-Authors:

  • Yan Zheng
  • Yining Zhou

Tourism satellite accounts (TSAs) have been widely recognized as important tools to measure the economic contribution of tourism. However, due to a lack of detailed data for tourism, especially for tourism expenditures, a national TSA has not been developed in China. This study therefore introduces a full-scope tourism accounting framework with a systematic analysis of tourism from the supply, demand and product sides simultaneously. Then, a pilot TSA is established and discussed for a case study in Zhejiang Province, China. The TSA supply table for tourism products and a reconstructed input-output table covering tourism and non-tourism sectors are presented, and tourism direct gross value added (TDGVA) and tourism indirect gross value added (TIGVA) are defined to measure the whole impact triggered by tourism in Zhejiang Province. The empirical results show how crucial it is for tourism development to enlarge inputs for accommodation, catering and road transport. Overall, the methodology and results reported here provide new perspectives on evaluating the economic importance of tourism industries and promote further TSA development and applications for both academics and practitioners.

Mendelian randomization methods and application to the recurrent mild malaria on dyslipidemia in African Ancestry Individuals

Format: CPS Abstract

Author: Dr Harouna Sangaré

Co-Authors:

  • Mariam Traoré
  • Segun Fatumo

Dyslipidemia is becoming prevalent in Africa, where malaria is endemic. Observational studies have documented the long-term protective effect of malaria on dyslipidemia; however, these study designs are prone to confounding. Therefore, we used Mendelian randomization (MR), a method robust to confounders and reverse causation, to determine the causal effect of recurrent mild malaria (RMM) on lipid traits. In this communication, we performed two-sample Mendelian randomization using genome-wide association study (GWAS) summary statistics for RMM from a study conducted in Benin (N=775) and for lipid traits from African ancestry individuals in the Million Veteran Program (N=57,332). We found an association between RMM and levels of low-density lipoprotein cholesterol (LDL-C) (Beta = -0.025; 95% CI, -0.042 to -0.007; p-value = 0.005) and total cholesterol (Beta = -0.019; 95% CI, -0.035 to -0.002; p-value = 0.028). No significant association was obtained with high-density lipoprotein cholesterol (HDL-C) or levels of triglycerides. The findings of this study support a causal relationship between RMM and levels of LDL-C and total cholesterol. We believe that larger studies on the link between malaria and dyslipidemia in Africa will help to manage the burden of both diseases better.

Methods for estimating industrial water use in Canada

Format: CPS Abstract

Author: Dr Rezvan Taki

Co-Authors:

  • Beni Ngabo Nsengiyaremye
  • Ibrahim Ousmane Ida
  • Michael Schimpf
  • Martin Hamel

Different techniques for modeling industrial water use have been investigated.

Methods to promote CAWI collection in household surveys

Format: CPS Abstract

Author: Mr Antonio J. Rueda

The non-response rate in household surveys is increasing significantly. Using mixed modes for data collection is essential in an attempt to reduce this rate and to obtain effective samples that allow quality estimates to be provided. One of the most effective methods for dealing with this problem is self-completion of the questionnaire via the Internet (CAWI), and it is therefore worthwhile to promote this collection method and to use other methods only when the household does not respond to the CAWI questionnaire. One or more procedures can be used to encourage informants to complete the CAWI questionnaire: promoting this collection method through specific letters by mail, sending SMS messages to informants' cell phones, using e-mails to send the link for starting the questionnaire, phone calls encouraging them to answer via the Internet, etc. This article studies some of these procedures, analyzing the pros and cons of their use, as well as their effectiveness and overall effect on response rates. For this purpose, we will use results from the Spanish Survey on Income and Living Conditions (SILC) conducted in 2022, showing how the use of several of these procedures together achieves very satisfying results.

Micro Estimates of Nigerian Household Wealth: Evidence from Household Finance and Consumption Survey 2022Q2

Format: CPS Abstract

Author: Mr ABDULHAMID AUYO MUSA

Even though Nigeria is rated Africa’s largest economy, with a population of over 200 million people, the country has in recent times been grappling with a deteriorating standard of living, as evidenced by some economic indicators. These indicators, such as rising inflation, a high level of poverty, etc., have in many ways affected household income, wealth, finance, and consumption patterns as well as quality of life (a declining per capita GDP: 5.95% from 2019 and 0.58% from 2020). Thus, information on household wealth is among the key indicators for understanding the economic conditions of the country’s population as well as the distribution of wealth among cohorts of the population, be it by gender, age group, or region. The study of household wealth in Nigeria therefore becomes imperative because of the economic conditions bedeviling the economy. This study attempts to examine, at the micro level, estimates of household wealth in Nigeria using descriptive statistics from the household survey of 2022Q2. The Central Bank of Nigeria started conducting the household survey in the fourth quarter of 2017.

Minimum information copula under fixed Kendall’s rank correlation

Format: CPS Abstract

Author: Mr Issey Sukeda

Co-Authors:

  • Issey Sukeda
  • Tomonari Sei

The minimum information copula (or the maximum entropy copula) is the most independent copula satisfying the given constraints. For these constraints, first-order expectation constraints on moments, such as Spearman's rank correlation, are mostly considered. On the other hand, such copulas under second-order constraints have not been studied well. We present a variant of the minimum information copula that, instead of first-order constraints, imposes a constraint on a popular second-order quantity known as Kendall's rank correlation. Due to this modification, the convexity of the problem becomes non-trivial and the form of the density function of this variant is unknown. We analyze its properties via one of the widely known discrete approximations of copulas, called checkerboard copulas. Checkerboard copulas can be considered identical to contingency tables. First, we introduce a transfer operation of probability mass on checkerboard copulas, which is technically equivalent to considering a non-orthogonal basis of the total space of checkerboard copulas. Using this approach, we show several mathematical properties of the minimum information checkerboard copula under fixed Kendall's rank correlation. In particular, this copula is characterized by a certain quantity, which we name the "extended log odds ratio". It is also guaranteed that the density of this copula belongs to a function class known as "total positivity of order two (TP2)", one of the positive dependence properties that has been extensively studied for copulas. Furthermore, geometric interpretations of this problem setting will be investigated.
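
To make the constrained quantity concrete, the sketch below evaluates Kendall's rank correlation of a checkerboard copula directly from its cell-probability table. It assumes the usual definition of a checkerboard copula (uniform density inside each cell), under which pairs of draws falling in the same row or column block contribute zero, so only concordant and discordant cell pairs matter. The function name is hypothetical and the code does not solve the minimum information problem itself.

```python
import numpy as np


def checkerboard_kendall_tau(p):
    """Kendall's tau of a checkerboard copula with cell probabilities p[i, j].

    tau = 2 * sum over cells (i, j) of p[i, j] * (mass strictly below-right
    minus mass strictly below-left), i.e. concordant minus discordant pairs.
    """
    p = np.asarray(p, dtype=float)
    n_rows, n_cols = p.shape
    tau = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            concordant = p[i + 1:, j + 1:].sum()  # larger row and larger column
            discordant = p[i + 1:, :j].sum()      # larger row, smaller column
            tau += 2.0 * p[i, j] * (concordant - discordant)
    return tau


independent = np.full((4, 4), 1.0 / 16.0)  # independence table
diagonal = np.eye(4) / 4.0                 # mass concentrated on the diagonal
print(checkerboard_kendall_tau(independent), checkerboard_kendall_tau(diagonal))
```

Because tau is quadratic in the cell probabilities, fixing it defines a non-linear constraint set, which is consistent with the remark above that convexity is no longer trivial.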

Mitigating Bias of Crowdsourced Data of the Impact of the Covid-19 Pandemic on Enterprises

Format: CPS Abstract

Author: Ms Yuniarti Yuniarti

This study will discuss techniques to mitigate bias in crowdsourced data on the impact of the Covid-19 pandemic on enterprises based on the experiences of BPS-Statistics Indonesia. The final goal is to produce official statistics. This study is beneficial to highlight that crowdsourced data is as powerful as traditional survey data for producing official statistics.

Mobile phone position data and official statistics in Sweden: results of a government assignment

Format: CPS Abstract

Author: Dr PIETER VLAG

People generate billions of datapoints about the geographical positions of their mobile phones every day. Mobile network operators (MNOs) capture, process and store these data for business purposes such as increasing network performance. Anonymized and aggregated data derived from these phone positions (MNO-data) can be used for commercial exploitation of mobility patterns and for producing official statistics about population mobility and travelling. Statistics Sweden received a government assignment at the end of 2021, which aimed 1) to describe conditions for the use of MNO-data for official statistics, 2) to develop new 'smart' statistics, and 3) to replace (parts of) current surveys with these data. This government assignment was carried out with two MNOs and other governmental agencies. It built on previous work in 2020-2021 with one operator, which is well known in Sweden for its commercial exploitation of MNO-data while following the strictest data-privacy rules. One important result of the government assignment is that interest exists in statistics about dynamic populations derived from MNO-data (monthly variation in the actual number of Swedish residents in a municipality), alongside the current register-based static population statistics. Another important conclusion is that, on the one hand, a legal framework is needed to get access to the data but, on the other hand, partnerships are needed, as data processing needs to be carried out at MNO premises due to privacy rules. Another finding that will be presented is the need for standards for processing MNO-data.

Model Assisted Approach to Estimate Production and Sales of the Manufacturing Sector in the Philippines

Format: CPS Abstract

Author: Dr Divina Gracia Del Prado

Co-Authors:

  • Melanie C. Estrada
  • Joyce B. Egsan

The Monthly Integrated Survey of Selected Industries (MISSI) is a national survey with 920 sample establishments conducted by the Philippine Statistics Authority. The survey generates indices on the value and volume of production and sales of the manufacturing sector to measure changes over time. With the manufacturing sector as a major driver of economic growth in the Philippines, the generation of high-frequency statistics in terms of levels is beneficial and contributes to a more comprehensive analysis of performance. However, MISSI is not designed to generate reliable levels of production and sales, as it uses a cut-off sampling design with value of production as the cut-off variable. Hence, levels of production and sales from MISSI survey-based results will be biased upward, as the sample establishments of the survey are the big players of the manufacturing sector. This paper presents a model-assisted approach to the estimation of production and sales using the results of MISSI and auxiliary information from the population of establishments, instead of conducting a separate survey, which would require a larger sample size and a higher budget. Results indicate that the model-assisted formula generates a reliable estimate of the level of production and reduces the upward bias of the MISSI survey-based results.

Model-Based Stratification of Payment Populations in Medicare Integrity Investigations

Format: CPS Abstract

Author: Dr Piaomu Liu

Co-Authors:

  • Don Edwards

One hundred sixty-six samples from Medicare integrity investigations are displayed and described along with 156 associated payment populations. The samples support the All-or-Nothing mixture model for the data previously described in the literature. This model motivates certain Monte Carlo testing procedures for sampling plans as well as stratification methods based on anticipated model moments. We proposed a new stratification method that was applied to the real data samples from Medicare integrity investigations and tested it using the aforementioned Monte Carlo testing procedure.

Modeling and Analysis of Multi-Level Recall-Based Competing Risks Data

Format: CPS Abstract

Author: Dr Chandra Prakash Yadav

The current study is an attempt to deal with recall-based data that arise in surveys and cross-sectional studies. Because memory fades as the gap between the interview time and the event time increases, this information is utilized in the model by choosing a suitable functional form for the non-recall probability.

Modeling count data, a generalized additive models for location, scale, and shape (GAMLSS).

Format: CPS Abstract

Author: Mr Kajingulu Malandala

Count data have received increasing attention in many fields, including economics, finance and epidemiology. The classical generalized linear model is flexible and widely used in the analysis of count data. However, there are cases where the required model assumptions are not met. To overcome these limitations, this study proposes a convenient model for count data when the response variable does not follow an exponential family distribution. The proposed approach considers not only the mean but also other moments of the dependent variable's distribution. The study applied GAMLSS models to COVID-19 data for South Africa to develop a model that captures the patterns of cumulative cases and fatalities.

Modeling longitudinal skewed functional data

Format: CPS Abstract

Author: Mohammad Samsul Alam

Co-Authors:

  • Ana-Maria Staicu

Functional data now arise very commonly due to technological advancements in different sectors. Our focus is on functional data that were recorded longitudinally over time and have asymmetric pointwise variation. The motivation comes from a diffusion tensor imaging study in which water diffusivity along the corpus callosum of the brain was recorded repeatedly over time for different subjects. We call the proposed approach skewed functional data analysis.

Modelling Clustered and Hierarchical Count Data: Poisson-Gamma Regression

Format: CPS Abstract

Author: Dr Zakir Hossain

Co-Authors:

  • Shrabanti Debnath

Count data and mixed models.

Modelling consumer preferences in Multilateral Method for CPIs

Format: CPS Abstract

Author: Prof. Tiziana Laureti

Co-Authors:

  • Dr. Federico Crescenzi
  • Dr. Jan de Haan

The ideal session for this paper deals with one or more of the following topics: consumer preferences, multilateral price indexes, cost-of living indexes, inflation measurement

Modelling the Survival Status of Breast Cancer: A Machine Learning Approach.

Format: CPS Abstract

Author: Dr Serifat Adedamola Folorunso

Co-Authors:

  • Richard Kehinde

The application of machine learning in clinical trials and cohort studies should not be underestimated, because most of the data generated are high-dimensional, censored, heterogeneous, and frequently contain missing information, posing difficulties for conventional statistical analysis. It is important to provide alternative techniques for modelling such complex data to circumvent these constraints.

Modelling the impact of COVID-19 pandemic on some Nigerian sectorial Stocks: Evidence from GARCH models with structural breaks

Format: CPS Abstract

Author: Dr Monday Osagie Adenomon

Co-Authors:

  • Idowu, R. A

In the past, the world's stock markets suffered from the global financial crisis, and now the markets are being plagued by the COVID-19 pandemic, from which developed, developing and underdeveloped economies and markets are not spared. Therefore, this study investigates the impact of COVID-19 on five (5) Nigerian Stock Exchange (NSE) sectorial stocks, namely: NSE Insurance, NSE Banking, NSE Oil and Gas, NSE Food and Beverages, and NSE Consumer goods. To achieve the goal of this paper, daily stock prices were obtained from a secondary source, ranging from 2nd January 2020 to 25th March 2021. Because of the importance of incorporating structural breaks in modelling stock returns, the Zivot-Andrews Unit Root Test was applied to the stock returns, and the results of the test revealed 20th January 2021, 26th March 2020, 27th July 2020, 23rd March 2020 and 23rd March 2020 as potential break points for NSE Insurance; NSE Food, Beverages and Tobacco; NSE Oil and Gas; NSE Banking; and NSE Consumer goods, respectively. This study investigates the volatility in daily stock returns for the five (5) NSE sectorial stocks using nine variants of GARCH models: sGARCH, gjrGARCH, eGARCH, iGARCH, aPARCH, TGARCH, NGARCH, NAGARCH, and AVGARCH; half-life and persistence values were also obtained. The study used the Student t and Skewed Student t distributions. The results from the GARCH models revealed a negative impact of COVID-19 on NSE Insurance, NSE Food, Beverages and Tobacco, NSE Banking and NSE Consumer goods stock returns, whereas NSE Oil and Gas returns showed a positive correlation with the COVID-19 pandemic. This study recommends that shareholders, investors and policy players in the Nigerian stock exchange markets be adequately prepared, in the form of diversification of investment into stocks that can withstand possible future crises in the market.
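
For readers unfamiliar with the persistence and half-life values mentioned above, the sketch below shows the usual definitions for a plain GARCH(1,1) model: persistence is alpha + beta, and the half-life is the number of periods for a volatility shock to decay by half, ln(0.5) / ln(persistence). The coefficient values are hypothetical, and the other variants listed (for example eGARCH or gjrGARCH) use different persistence formulas that are not covered here.

```python
import math


def garch11_persistence(alpha: float, beta: float) -> float:
    """Persistence of a standard GARCH(1,1): ARCH plus GARCH coefficient."""
    return alpha + beta


def half_life(persistence: float) -> float:
    """Periods needed for a volatility shock to decay to half its size."""
    if not 0.0 < persistence < 1.0:
        raise ValueError("half-life is only defined for 0 < persistence < 1")
    return math.log(0.5) / math.log(persistence)


# Hypothetical fitted coefficients for a daily return series.
persistence = garch11_persistence(alpha=0.10, beta=0.80)
print(persistence, half_life(persistence))
```

A persistence close to 1 implies a very long half-life, which is why the two quantities are typically reported together.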

Modelling the resolution time of defaulted loans using cure models with a frailty component

Format: CPS Abstract

Author: Dr Marius Smuts

Co-Authors:

  • James Allison
  • Janette Larney
  • Gerrit Grobler
  • Gary Sharp

In recent years, statistical models within the credit risk environment have become increasingly popular. Initially, the focus was on the assessment of a counterparty's creditworthiness (i.e. default risk); however, the modelling of the recovery process and outcome after default is receiving more and more attention. Two main outcomes, or resolutions, after default are possible: the outstanding amount is fully recovered, or the loan is written off. There exist several obligor and loan specific factors that may influence a loan's probability of being written off. Some of these factors are observable and can be included in a model as covariates, but some factors, for example an individual's level of discretionary expenditure and undisclosed debt, are latent. There also exists a proportion of defaulted loans for which the outstanding amounts will be fully recovered and which are therefore not exposed to write-off. In our context these loans can be considered "cured". The time to write-off or full recovery may, in addition to latent competing risks, also be influenced by common, unobservable drivers, such as the state of the economy; we therefore propose to use a promotion time cure model and include a frailty parameter to control for heterogeneity. We evaluate the performance of the model via a small Monte Carlo study and also apply it to a real world banking data set, where it is found that the new model outperforms the more traditional models.
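
As background, the standard promotion time cure model (without frailty) has population survival S(t) = exp(-theta * F(t)), where F is a baseline distribution function, so the cured fraction is exp(-theta). The sketch below assumes an exponential baseline purely for illustration; the parameter values and function names are hypothetical, and the frailty extension proposed in the abstract is not specified there and is not implemented here.

```python
import math


def population_survival(t: float, theta: float, rate: float) -> float:
    """Promotion time cure model survival with an exponential baseline (assumed).

    S(t) = exp(-theta * F(t)), with F(t) = 1 - exp(-rate * t).
    """
    baseline_cdf = 1.0 - math.exp(-rate * t)
    return math.exp(-theta * baseline_cdf)


def cure_fraction(theta: float) -> float:
    """Limit of S(t) as t grows: the share of loans never written off."""
    return math.exp(-theta)


# Hypothetical parameters: theta = 1.2, baseline rate = 0.5 per year.
for years in (0.0, 1.0, 5.0, 50.0):
    print(years, population_survival(years, theta=1.2, rate=0.5))
print("cured share:", cure_fraction(1.2))
```

The survival curve levels off at the cured share instead of falling to zero, which is the feature that distinguishes cure models from ordinary survival models.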

Modernizing Access to Statistics Canada’s Microdata Files

Format: CPS Abstract

Author: Dr Sara Tumpane

Over the last several years, Statistics Canada (StatCan) has been forming and implementing its modernization strategy, with a focus on user-centric delivery, sharing and collaboration, and enhanced tools and platforms. In line with this corporate initiative, StatCan’s Data Access Division (DAD) has been leading efforts to update the platforms and governance that facilitate access to microdata for external users.

Modernizing the Quarterly Economic Data Collection in Abu Dhabi

Format: CPS Abstract

Author: SAMI KHASAWNEH

Co-Authors:

  • Sami Nizar Khasawneh

Abu Dhabi Emirate is the federal capital of the United Arab Emirates (UAE) and the largest of the seven emirates. It consists of three main regions: Abu Dhabi Region, Al Ain Region and Al Gharbia Region. Over the last few decades, the Abu Dhabi Government has devoted outstanding efforts to enhancing the emirate's profile and reputation through the most up-to-date systems at all levels (economic, social, environmental and excellence in government performance), and has laid down a number of plans and initiatives to ensure an appropriate and effective alignment of development gains and benefits and to build a sustainable economy serving all the Emirate's regions (The Government of Abu Dhabi, 2007). Statistics Centre – Abu Dhabi (SCAD) was established in accordance with Law No. (7) of 2008 as the main authorized body concerned with official statistics in the Emirate of Abu Dhabi. SCAD is responsible for the collection, classification, storage, analysis and dissemination of official statistics covering social, demographic, economic, environmental and cultural indicators. In accordance with Law No. (5) of 2021 Concerning the Reorganization of Statistics Centre - Abu Dhabi, the Centre may use all data sources to prepare official statistics, and all private institutions and companies shall provide the Centre, upon its request and without any consideration, with the detailed data it requests within the limits of its competence and in a form that can be processed electronically. The Quarterly National Account Indicators (QNAI) are quarterly estimates of the Gross Domestic Product (GDP) at current prices; they measure the contribution of the organizational sectors and various economic activities, and they describe the aspects of spending on GDP in Abu Dhabi Emirate. The outcome of this project provides key economic indicators, particularly for the quarterly GDP of Abu Dhabi Emirate.
Such information is of crucial significance to the Emirate, especially in view of the unprecedented development and growth the Emirate's economy is currently experiencing. In addition, the data will provide businesses, investors and general stakeholders with baseline business cycle information for investment decisions. Previous Method: The Quarterly Economic Survey (QES) collected data from 124 establishments. It included two questions about the revenue and employment numbers for each quarter. The responses were combined with administrative data to calculate indicators about the changes in the economy. These indicators were used to extrapolate the benchmarked quarterly Value Added for each economic activity. Modern Method: Data are collected quarterly, over a defined period, from a sample of around 2,300 establishments using a specialized questionnaire that includes many questions about employees, revenues, intermediate consumption, etc. Selected establishments are visited according to economic activity. The target population of the QES includes all establishments operating in Abu Dhabi Emirate and registered under the updated economic establishments framework. Justification: Provide the quarterly data needed to prepare national accounts aggregates such as GDP. Provide short-term data to calculate the contribution of economic activities and measure the extent of economic diversification and development in the non-oil sectors. Provide short-term data on Small and Medium Establishments (SMEs) to evaluate their investment and take appropriate decisions.

Monitoring and Measuring MSMEs Activity From Social Media By Using BERT-based Event Extractor

Format: CPS Abstract

Author: Ms Erika Siregar

Co-Authors:

  • Erika Siregar

One characteristic of Indonesia's economy is the significant contribution of Micro, Small, and Medium Enterprises (MSMEs) to the national Gross Domestic Product (GDP) and domestic economic activities: they account for 60% of the national GDP and 96.9% of employment. To monitor MSMEs' activities and ensure their health and continuous growth, we are going to use social media. Since extracting events from social media text is challenging, we are going to use BERT-based event extraction.

Monitoring the implementation of the GSBPM using balanced scorecard: The case of High Commission For Planning (NSO) - Morocco

Format: CPS Abstract

Author: Mr Mohamed SALIMI

The High Commission for Planning (HCP) has implemented the Generic Statistical Business Process Model (GSBPM) as a quality and metadata management tool. Although the model was adopted by different business units to varying degrees, it is crucial to measure to what extent it was correctly adopted. How many collaborators are involved in the process? How was the description of processes and sub-processes performed, and did it allow a good assessment of process quality and documentation? Did the implementation of the GSBPM enhance the standardization of the processes? Are the processes implemented smoothly and efficiently, saving time previously spent on similar tasks? And how has the quality of products and services evolved accordingly?

Moroccan household satellite account: methodology and results

Format: CPS Abstract

Author: Mrs YATTOU AIT KHELLOU

Co-Authors:

  • BAHIJA NALI
  • zohra bouhaidoura

The establishment of a household satellite account, focusing on non-SNA household production, is an opportunity to make women's domestic work visible. Based on internationally recognized methods, the production of the Moroccan household satellite account required the use of several sources of data, namely the national accounts of the reference year, data from the time use survey, household consumption surveys, the survey on the informal sector and the national labour force survey. This account aims to estimate the production and value added of unpaid domestic work, broken down by activity. Since this production is non-market, it is estimated from the cost of the various necessary inputs: the production of households outside the SNA results from the combination of unpaid work, goods, services and capital.

Moving Students: How Transfer Students Affect Power and Type I Error in Stepped Wedge Designs

Format: CPS Abstract

Author: Meredith McCormack-Mager

Co-Authors:

  • Abigail Shoben

Student transfer between schools is common in the United States, with one-third of fourth graders having changed schools in the past two years. How this mobility affects the viability of research in schools is a concern for researchers who use the stepped wedge cluster randomized trial (SW-CRT) design. We examined how misspecification and contamination caused by student transfer can affect power and Type I error in SW-CRTs in a variety of settings and will discuss strategies to limit power loss.

Multi-dimensional reduction techniques applied to measuring global interlinkages between SDGs

Format: CPS Abstract

Author: Mr Jean-Pierre Cling

Co-Authors:

  • Clément Delecourt

We measure interlinkages between SDGs, applying linear dimensionality reduction techniques to a dataset derived from the UN Global SDG Database, which is the official source for all SDG indicators. This is the first study of this kind on this topic at the world level. The Multiple Factor Analysis used to synthesize the correlations between indicators shows that SDGs related to human development alone account for 30% of the observed variance of all the indicators at the world level, and that country performances in this field are strongly correlated with their income level. The Hierarchical Cluster Analysis distinguishes three country clusters according to their performance in terms of SDG indicators.

Multi-state models: an appraisal with an application to real data

Format: CPS Abstract

Author: Prof. Claudia Adriana Castro Kuriss

Co-Authors:

  • Victor Leiva

Survival and reliability analysis are employed to handle censored data from different fields of science. Such data are commonly modelled with survival curves, using the widely employed Kaplan-Meier estimator or the well-known Cox model. Nevertheless, these are no longer useful when other challenges arise, such as competing risks and multi-state models. The problem now appears very frequently when a subject suffers multiple events, such as several infections, relapses of the same illness, or multiple failures of the same or different kinds, or when the failure is due to a reason other than the one expected. For these cases, different estimators of the survival curves and extensions of the Cox proportional hazards model have been proposed. There are also other approaches to, and interpretations of, these challenges, as can be found in this session.

Multidimensional Poverty and Economic Growth in Indonesia

Format: CPS Abstract

Author: Mr Hilman Hanivan

Co-Authors:

  • Hilman Hanivan

Poverty alleviation has been every nation's development priority for as long as we can remember. Various efforts have been made to shrink the number of people living below the poverty line, one of which is boosting national economic growth in the hope that growth will be inclusive and raise the living standard of those at the bottom of the economy. In the same spirit of eradicating poverty, researchers have also developed various methods to provide an accurate portrait of poverty. The most popular measure uses the monetary approach. While this measurement gained popularity due to its simplicity, studies that describe the complexity of poverty using multidimensional approaches are also growing in number. The perception of poverty as a multi-faceted phenomenon has also become a worldwide concern, as reflected in the phrase 'ending poverty in all its forms', which is listed among the SDGs. Many studies have tried to unravel the relationship between poverty reduction and economic growth. This study extends the literature on that relationship by applying a poverty indicator measured through the multidimensional framework. Using data across provinces in Indonesia in 2012 and 2017, this study finds that provinces in the eastern part of the country have the smallest annualized relative change in the incidence indicator, yet they have the biggest annualized relative change in the intensity indicator. This study also finds that economic growth might not be associated with changes in the proportion of multidimensionally poor individuals in the population, but it is associated with a declining average deprivation rate among the poor. Finally, this study also has several policy implications.

Multiple Imputation for Aggregate Data in an Individual Patient Data Meta-Analysis

Format: CPS Abstract

Author: Dr Michael Larsen

Meta-analysis is used to combine results from studies to produce a more definitive answer. Individual patient data meta-analysis combines the subject-level data from multiple studies. When a study cannot or will not provide individual data, it is proposed that one multiply impute data based on aggregate statistics via hot deck imputation and modeling. Methods are applied in a study of antibiotic treatment of bacterial vaginosis to prevent preterm delivery and studied through simulation.

Multiple Imputation for Directional Data.

Format: CPS Abstract

Author: Mrs Sneha Babel

Co-Authors:

  • Dr Akanksha S Kashikar
  • Dr Manik Awale

Observations consisting of directions or angles are found across many areas of science, including ecology, earth sciences, environmental science, and medicine. Examples of such data are the angular movements of an animal relative to a food source or other attractor, wind directions, diurnal measurements of admission times to an intensive care unit, and departure directions of birds after release. Circular data arise whenever directions are measured, and are usually expressed as angles relative to some fixed reference point, such as due north. Time data measured on a 24 h clock may also be converted to angular measurements, with 0:00 corresponding to 0° and 24:00 to 360°. As with other datasets, missing values are a major problem in circular data. In this paper, we discuss some imputation methods for missing values in circular data using the technique of multiple imputation via chained equations. This ensures that the relationships among the variables can be used for better imputation of missing values. We restrict our attention to bivariate datasets containing at least one circular variable. The performance of the method is assessed by comparing the distributions of the original and imputed datasets through an extensive simulation study. Keywords: Directional data, Missing data, Multiple imputation via chained equations, Simulations
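
The clock-to-angle conversion described above, and the reason circular data need their own summary statistics, can be sketched as follows. The function names are illustrative and are not taken from the paper. The circular mean shown is the standard direction of the resultant vector.

```python
import math
from typing import Iterable


def clock_to_degrees(hours: float, minutes: float = 0.0) -> float:
    """Map a time on a 24 h clock to an angle, with 0:00 -> 0 and 24:00 -> 360 degrees."""
    return (hours + minutes / 60.0) / 24.0 * 360.0


def circular_mean_degrees(angles: Iterable[float]) -> float:
    """Mean direction: the angle of the summed unit vectors, reported in [0, 360).

    The result is not meaningful when the unit vectors cancel out exactly.
    """
    angles = list(angles)
    s = sum(math.sin(math.radians(a)) for a in angles)
    c = sum(math.cos(math.radians(a)) for a in angles)
    return math.degrees(math.atan2(s, c)) % 360.0


if __name__ == "__main__":
    # Admissions at 23:00 and 01:00 are two hours apart, centred on midnight.
    times = [clock_to_degrees(23), clock_to_degrees(1)]
    print("arithmetic mean:", sum(times) / len(times))    # points at noon
    print("circular mean:  ", circular_mean_degrees(times))  # points at midnight
```

The arithmetic mean of the two angles lands at noon, which is why imputation models for circular variables work with sines and cosines or with circular distributions instead of raw angles.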

Multivariate Analysis following Multiple Imputation of HIV Risk Behaviours among Youth in the Kingdom of eSwatini (formerly Swaziland)

Format: CPS Abstract

Author: Ms Maphoka Qhobela

Data sets with missing values are standard in practice, and data imputation is a way of preparing data for analysis. Multiple imputation, in which missing values are replaced with multiple plausible values, is the preferred approach for working with missing values in survey data. The advantage of multiple imputation over other imputation methods is that it accounts for the uncertainty due to missing values. Cluster analysis is an approach for discovering groupings and patterns in a data set. Standard cluster analysis approaches require complete data; hence, data imputation before cluster analysis is important. Similarly, other exploratory techniques also require complete data sets. Risk behaviours are those behaviours that are said to elevate the risk of HIV infection. Early sex debut, multiple sexual partners, transactional sex, low condom use, and low male circumcision are identified risk behaviours. In eSwatini, 23.7% of those who reported an early sex debut (before the age of 15) were HIV positive. Of those with more than one sexual partner, 28.7% were HIV positive, while 20.9% of adults who did not use condoms at last sexual intercourse in the prior 12 months were HIV positive. Identifying the pattern of these risk behaviours will assist eSwatini in the fight against HIV, which will translate into achieving the 2030 United Nations Agenda on zero infections. Analysis of multiply imputed data involves three stages: imputation, analysis, and combining results. For the combining stage, Rubin proposed a methodology for calculating means and variances, but cluster analysis is concerned with hidden patterns in the data, and no set of parameters defines cluster analysis. This study proposed a technique to enable the researcher to decide on the number of clusters, interpret clusters, and make overall observations about data patterns in the data set, following multiple imputation. 
The study also discussed how specific clusters could be identified as outliers and optionally excluded in the subsequent analysis stage.
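
For context, the combining stage attributed to Rubin above can be sketched for a scalar estimate. The pooled estimate is the average of the per-imputation estimates, and the total variance is W + (1 + 1/m) * B, where W is the average within-imputation variance and B is the between-imputation variance. This shows only the standard rule for means and variances. It is not the clustering technique proposed in the study, and the numbers are made up.

```python
from statistics import mean, variance
from typing import Sequence, Tuple


def pool_rubin(estimates: Sequence[float],
               within_variances: Sequence[float]) -> Tuple[float, float]:
    """Combine m per-imputation estimates and their variances with Rubin's rules."""
    m = len(estimates)
    if m < 2 or len(within_variances) != m:
        raise ValueError("need at least two imputations and matching variances")
    q_bar = mean(estimates)           # pooled point estimate
    w = mean(within_variances)        # average within-imputation variance
    b = variance(estimates)           # between-imputation variance (m - 1 denominator)
    total = w + (1.0 + 1.0 / m) * b   # total variance of the pooled estimate
    return q_bar, total


if __name__ == "__main__":
    # Hypothetical estimates of a proportion from five imputed data sets.
    est = [0.231, 0.240, 0.236, 0.228, 0.243]
    var = [0.0004, 0.0004, 0.0005, 0.0004, 0.0005]
    print(pool_rubin(est, var))
```

Cluster labels have no such scalar parameter to average, which is the gap the abstract sets out to address.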

Multivariate Time Series Analysis: Linear Transformation of Variables Involved.

Format: CPS Abstract

Author: Dr Iyabode Oyenuga

Co-Authors:

  • Frank Coolen
  • Tahani Coolen-Maturi

Vector autoregressive moving average (VARMA) models and vector autoregressive (VAR) models are used in econometrics, particularly in time series analysis, to reveal the cross-correlations between series, going beyond the isolated analysis of individual data series. VARMA models are the multivariate generalization of univariate autoregressive moving average (ARMA) models. The apparent lack of interest in multivariate models with moving average errors arises because they are considered too difficult to implement; nevertheless, VARMA models forecast macroeconomic variables more accurately than VARs.

Multivariate distorted distributions

Format: CPS Abstract

Author: PROF. DR. Jorge Navarro

Univariate distorted distributions are a good tool for modelling risks, order statistics, coherent system lifetimes, etc. This concept is extended to model joint distributions of random vectors. The representations are similar to copula representations, with the marginal distribution functions replaced by arbitrary univariate distribution functions. The main advantages are that this representation has properties similar to those of copulas, that it is more flexible, and that, in some models, it yields simple representations. These facts are illustrated with several examples, which include paired data, multivariate residual lifetimes, record values, order statistics and coherent systems. In particular, the representation can be used to predict these values by using quantile regression techniques. This work is partially supported by Ministerio de Ciencia e Innovación of Spain under grant PID2019-103971GB-I00/AEI/10.13039/501100011033.

Multivariate skew-t regression with censored or missing responses

Format: CPS Abstract

Author: Mr Christian Eduardo Galarza Morales

Co-Authors:

  • Katherine A. L. Valeriano
  • Larissa A. Matos
  • Victor H. Lachos

Skew-t regression models have been widely used to model and analyze asymmetric heavy-tailed data. Moreover, observations in this kind of data can be missing or subject to some upper and/or lower detection limits. We propose a novel robust regression model for multiple censored or missing data based on the multivariate skew-t distribution for such data structures. This approach allows us to model data with great flexibility, simultaneously accommodating heavy tails and skewness. Results obtained from the analysis of both simulated and real datasets demonstrate the effectiveness of the proposed method.

National and local population projections with Bayesian hierarchical models

Format: CPS Abstract

Author: Violeta Calian

In this paper we describe an approach to population projections based on Bayesian hierarchical models and implemented in open source R code. Time and age (auto-) correlations are incorporated via Gaussian process priors and/or flexible non-linear smooth terms while spatial and social-demographic characteristics are easily included. The method efficiently solves small area/population issues and allows us to incorporate qualitative and quantitative prior information and even expert assumptions.

Near-real time monitoring of urban green spaces through remote sensing data

Format: CPS Abstract

Author: Marian Necula

Co-Authors:

  • Bogdan Oancea
  • Tudorel Andrei

Urban green spaces are of increasing significance for city planning and for mitigating the effects of climate change and pollution. International and national organizations are stressing the importance of preserving and expanding urban green spaces in order to ensure a city's sustainability. The UN includes urban green spaces in the Sustainable Development Goals 2030 Agenda. Official statistics is trusted with providing relevant, timely and cost-efficient statistics. The advent of freely available remote sensing data sources provides strong opportunities to implement new meaningful statistical products which satisfy the aforementioned quality criteria. Land use and land cover statistics based on remote sensing data are considered to be low-hanging fruit, and several projects carried out by national statistical offices and other agencies are underway to develop new statistics based on remote sensing data. The paper presents the implementation and results of several methods, pixel-based and object-based image analysis, used to estimate the absolute value in square kilometres and the percentage of green area surfaces at a nationwide scale, for 40+ Romanian cities between 2016 and 2022. Through a combination of several remote sensing data sources (from optical multispectral and synthetic aperture radar sensors), we provide evidence of the feasibility of using remote sensing data to produce relevant statistics. By combining data from the two types of sensors we overcome the disadvantages and inherent risks associated with a single data source, and are able to disseminate the new statistics in near-real time, within an interval of 5 days.

New insights with "old" register data: from cross-sectional to longitudinal migration statistics

Format: CPS Abstract

Author: Dr Johanna Probst

Swiss official statistics traditionally look at immigration and emigration from a cross-sectional perspective, focusing on persons staying for more than one year in Switzerland. As a result, some aspects of international migration, or rather "mobility", have so far fallen out of focus. Since 2022, the newly developed longitudinal demographic statistics (DVS) have allowed original insights into the migration landscape in Switzerland through cohort analysis. The new statistics link data from various register sources and construct time-harmonized biographies using standardized and transparent production rules. The cumulative longitudinal database starts in 2010, the year of the systematic introduction of a personal identification number that enhanced the possibilities of data linkage. The dataset today adds up to 11 million records, telling stories about migration trajectories. For example, 75% of all observed persons did not engage in any international migration, while the remaining quarter show at least one migration movement. Among all persons who immigrated to Switzerland in 2011, 53% had left the country again by 2021. On the other hand, only 23% of the emigrants from 2011 reversed their migration movement in the following decade by returning to Switzerland. Among foreigners who immigrated in 2011 and stayed continuously until 2021, a majority managed to obtain a more long-term residence permit; 3% even obtained Swiss citizenship. This talk will present the background, methods and results of the longitudinal demographic statistics, also addressing questions on appropriate dissemination approaches. It will thus exemplify how existing register data can be used and reused for statistical purposes and how data linkage enhances the analytic possibilities in official statistics.

New opportunities for African industry in the 21st century: Moroccan potentials

Format: CPS Abstract

Author: Madam Fatima TOUZI

This work describes how Africa can adapt to the new industries of the 21st century through a win-win strategy that supports both manufacturing and 'industries without smokestacks', by means of four drivers of industrialization.

Non-parametric test for Markov Regime Switching Model with Intervention Parameter for Measuring Epileptic Seizure based on EEG

Format: CPS Abstract

Author: Ms Mara Sherlin Talento

Co-Authors:

  • Erniel Barrios
  • Hernando Ombao

In modeling electroencephalograms (EEG) of patients with epilepsy, one common interest is studying the magnitude of the seizure in the brain signal. The volatility of these brain signals, together with the presence of seizure, makes the analysis more complicated. This paper proposes a method of modeling seizures in EEG using a hidden Markov state process. We present a method of measuring the magnitude of seizure that allows the regimes to be a function of intervention parameters (step and pulse) at different time points. A simulation study was done to see the effect of intervention type, magnitude of intervention, and location of intervention on the power and size of the test. The study used Markov Chain Monte Carlo with a backfitting algorithm to estimate the parameters and a bootstrap approach for hypothesis testing. The simulation study indicated that the test is correctly sized and powerful when the intervention is of pulse type. When applied to one region of EEG, the model provided a MAPE of 5.25% pre-seizure and 7.08% post-seizure. It also showed that the distribution of states becomes more concentrated on the high state as the time of seizure approaches.

Nowcasting Official Poverty Statistics in the Philippines

Format: CPS Abstract

Author: Ms Sabrina Romasoc

Co-Authors:

  • Manuel Leonard Albis
  • Josefina Almeda
  • Dannela Jann Galias
  • Roxanne Elumbre
  • Ann Umadhay

In the Philippines, official poverty statistics are generated through the Family Income and Expenditure Survey (FIES) conducted by the Philippine Statistics Authority (PSA). The FIES is a national survey of a representative sample that provides information on family income, sources of income, family expenditure, and other relevant household characteristics. However, the survey is only administered every three years, which leaves a long gap between releases of official poverty statistics. To address the need for more timely poverty statistics, the Philippine Statistical Research and Training Institute (PSRTI) conducted a study in 2021 titled “Nowcasting Official Poverty Statistics in the Philippines,” which sought to explore an alternative methodology that could accurately predict poverty levels in the country through the use of the Dynamic Factor Model (DFM). This paper shows that this alternative methodology can be operationalized and may aid in the generation of more timely poverty statistics for the Philippines.

ON THE DEVELOPMENT OF CALIBRATION ESTIMATOR IN THE PRESENCE OF MEASUREMENT ERROR AND NON-RESPONSE UNDER STRATIFIED SAMPLING

Format: CPS Abstract

Author: Dr Olaniyi Mathew Olayiwola

Co-Authors:

  • Ishaq O. O
  • Apantaku F. S
  • Onifade O. C
  • Olayiwola O. M

This research extends the theory of calibration estimation and provides a novel calibration approach, as an alternative to existing calibration estimators, for estimating the population mean of the study variable using an auxiliary variable in stratified sampling.

Occupation Choice Matters to the Economic Return of the School Year in Nepal. Evidence from Household Cross-Sectional Data.

Format: CPS Abstract

Author: Kapil Dev Joshi

Differences in economic returns raise the question among scholars of whether the economic return in Nepal depends on years of schooling or on occupational choice. In this regard, this study finds a positive impact of years of schooling on the economic return, measured as wages. Years of schooling are endogenous, which may bias the estimates, so the instrumental variable (IV) technique is applied to obtain unbiased and consistent results. Using instrumental variables on household-level microdata for Nepal, one additional year of schooling increases the economic return in terms of wages by 4.3 percent. In addition, the estimates show that the return to education is higher for females than for males. The choice of occupation significantly affects the economic return to a year of schooling. Technicians and associate professionals have the highest return to education among nine occupations in Nepal, at about 14.1 percent. Furthermore, a few scholars have found an economic return to a year of schooling of about 6 percent using OLS. This study supports those previous studies and confirms, using different IV estimation techniques, that years of schooling have a significant positive impact on the economic return.

Official Statistics Quality Auditor; new role in National Statistical System

Format: CPS Abstract

Author: Mr Saeed Fayyaz

Co-Authors:

  • Arash Fazeli

This paper investigates quality matters in official statistics and a new role for statistical systems. Sessions on official statistics, quality assurance frameworks and third parties would therefore be a good fit for this paper.

On Familywise Error Rate Cutoffs under Pairwise Exchangeability

Format: CPS Abstract

Author: Dr Thomas Fung

Co-Authors:

  • Thomas Fung
  • Eugene Seneta

In a pairwise exchangeable dependence setting for test statistics, the cutoffs of Sarkar et al (2016) may be viewed as a first iteration improvement of Holm (1979)’s classical cutoffs under a convexity condition on the copula. The cutoffs of Seneta and Chen (1997), which improve Holm’s in the present exchangeability setting, are shown, after an analogous first iteration step, to lead to a refinement of Sarkar et al (2016). Further, we show that the convexity condition can be circumvented in practice, computationally. Improvement by iteration limit of cutoffs is considered for both procedures. Comparisons between the effects of the several cutoff sets are made by way of plots of the familywise error rate against correlation ρ in the classic setting of the multivariate Normal; and the distributional setting of the multivariate Generalized Hyperbolic for the important Variance Gamma type subfamily, for which a convexity condition cannot be analytically verified.
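
As a baseline for the comparisons described above, the classical Holm (1979) step-down cutoffs are alpha / (m - i + 1) for the i-th smallest of m p-values. The sketch below implements only this classical procedure. The refinements of Sarkar et al and of Seneta and Chen depend on the dependence structure and are not reproduced here. The function names are illustrative.

```python
from typing import List, Sequence


def holm_cutoffs(m: int, alpha: float = 0.05) -> List[float]:
    """Holm's cutoffs alpha/m, alpha/(m-1), ..., alpha for the ordered p-values."""
    return [alpha / (m - i) for i in range(m)]


def holm_reject(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Step-down test: reject in order of increasing p-value until one exceeds its cutoff."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for cutoff, idx in zip(holm_cutoffs(m, alpha), order):
        if p_values[idx] > cutoff:
            break  # this hypothesis and all those with larger p-values are retained
        reject[idx] = True
    return reject


if __name__ == "__main__":
    p = [0.010, 0.040, 0.030, 0.005]
    print(holm_cutoffs(len(p)))
    print(holm_reject(p))
```

Holm's cutoffs control the familywise error rate under any dependence, so improved cutoffs that exploit exchangeability can only be larger, never smaller.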

On Inference Methods in Generalized Mean-Reverting Processes with Change-Points

Format: CPS Abstract

Author: Prof. Sévérien Nkurunziza

In this talk, we present some inference methods in generalized Ornstein-Uhlenbeck processes with multiple unknown change-points when the drift parameter is suspected to satisfy some restrictions. The originality of the established results consists in the fact that the number of change-points and the locations of the change-points are unknown. We generalize some recent results in five ways. First, our inference method incorporates uncertain prior knowledge. Second, we derive the unrestricted estimator (UE) and the restricted estimator (RE) as well as their asymptotic properties. Third, we establish a test of the hypothesized restriction and derive its asymptotic power. Fourth, we propose a class of shrinkage estimators (SEs) which includes as special cases the UE, the RE, and classical SEs. Fifth, we study the relative risk dominance of the proposed estimators, and we establish that SEs dominate the UE, and that the RE performs very well when the restriction nearly holds but poorly when the restriction is seriously violated. The additional novelty of the established methods consists in the fact that the dimensions of the proposed estimators are random. Because of that, the asymptotic power of the proposed test and the asymptotic risk analysis do not follow from classical results in the statistical literature. To overcome this problem, we establish an asymptotic result which is useful in its own right.

On Robustness of Statistical Inference based on the Logarithmic Super Divergence Family

Format: CPS Abstract

Author: Dr AVIJIT MAJI

Co-Authors:

  • Abhik Ghosh
  • Ayanendranath Basu

This paper discusses a new superfamily of divergences that is similar in spirit to the S-divergence family introduced by Ghosh et al. (2017). This new family serves as an umbrella that contains the logarithmic power divergence family (Renyi, 1961; Maji et al., 2017) and the logarithmic density power divergence family (Jones et al., 2001) as special cases. Various properties of this new family and the corresponding minimum distance procedures are discussed, with particular emphasis on the robustness issue; these properties are demonstrated both theoretically and through simulation studies. In particular, the method demonstrates the limitation of the first-order influence function in assessing the robustness of the corresponding minimum distance procedures. In this respect, for the first time, we examine the necessity and usefulness of third-order influence functions for the divergence-based test statistics.

On Some Robust Liu Estimators for the Linear Regression Model with Outliers: Theory, Simulation and Application

Format: CPS Abstract

Author: Dr Abdul Majid

Co-Authors:

  • Shakeel Ahmad
  • Muhammad Aslam

The Liu M-estimator (LME) is available in the literature to tackle multicollinearity and outliers simultaneously in linear regression. Estimating the Liu biasing parameter d is the core issue when using the LME. The present study proposes some robust estimators of d. The performance of the proposed estimators is compared with the available methods through a comprehensive Monte Carlo simulation and a real data example.

On estimating the proportion of susceptibility with zero-inflated models

Format: CPS Abstract

Author: Prof. Wen-Han Hwang

Co-Authors:

  • Lu-Fang Chen
  • Jakub Stoklosa

The proportion of susceptibility is often the critical parameter of interest in zero-inflated data analysis. We investigate the effects on parameter estimation when heterogeneity is present in the event count intensity and the susceptibility probability. We show that the susceptibility probability is underestimated if heterogeneity in the event count intensity is ignored. On the other hand, the behavior is different if heterogeneity in the susceptibility probability is ignored; notably, an estimate of the average susceptibility probability may be unbiased, overestimated, or underestimated depending on the relationship between count intensities and susceptibility probabilities. A conditional likelihood approach is proposed to estimate the intensity parameters and the average susceptibility probability, provided that the count intensity component model is correctly specified.

On the Aspects of Instantaneous and Early Failure Data: A Modified Bivariate Weibull Distribution and Survival Function

Format: CPS Abstract

Author: SUMANGAL BHATTACHARYA

Co-Authors:

  • Ishapathik Das
  • Muralidharan Kunnummal

In reliability, lifetime data are usually modeled using one- or two-parameter distributions, such as Weibull, gamma, log-normal, Pareto, etc., which are unimodal by nature. Sometimes, the data may contain many zeros or close-to-zero data points, defined as inliers (instantaneous or early failure observations) in the literature. The usual modeling approach using unimodal parametric distributions may not provide the expected results for such data in the presence of inliers. Furthermore, correlated bivariate observations with inliers frequently occur in reliability; here, we propose a method of modeling bivariate lifetime data with instantaneous and early failure observations. We construct a new bivariate distribution function by combining bivariate uniform and Weibull distributions. The bivariate Weibull distribution is obtained using a 2-dimensional copula, taking the marginal distributions to be two-parameter Weibull distributions. We derive some properties of this modified bivariate Weibull distribution, mainly the joint probability density function, the survival (reliability) function, and the hazard (failure rate) function. The model’s unknown parameters are estimated using the Maximum Likelihood Estimation (MLE) technique combined with a machine learning clustering algorithm. Numerical examples are provided using simulated data to illustrate and test the performance of the proposed methodologies. The method is also applied to real data and compared with existing methods in the literature.

On the evidence-based findings of the Nutrition and Retrospective Mortality Survey conducted in August, 2021 in Maiduguri, Borno State, Nigeria.

Format: CPS Abstract

Author: Dr Anthony Ekpo

We present evidence-based findings of the Nutrition and Retrospective Mortality Survey, conducted in August 2021 in Maiduguri, Borno State, Nigeria. The study was conducted during the insurgency period, when Boko Haram (a well-known armed group) consistently attacked the northern part of Nigeria, especially the north-eastern states of Borno, Adamawa and Yobe (the BAY States). The survey’s overall objective was to determine the magnitude and severity of malnutrition and retrospective mortality rates among under-five children, and by extension the entire population, in Bolori II, Nigeria, in the hope of a better-targeted and well-designed intervention that would also contribute to scaling up the nutrition program in Bolori II and Borno State in general. The survey was also able to give an idea of the nutritional status of children in the area prior to the COVID-19 pandemic, when activities were apparently suspended for the greater part of 2020. Overall, the outcome of the survey gave policy direction that ultimately boosted the nutritional and health status of vulnerable children and others living in the community. Community members from 40 randomly selected clusters in Bolori II were assessed to determine the prevalence of acute malnutrition among children 6 to 59 months of age using the WHZ, WFA, HFA, MUAC and bilateral oedema approaches.

Optimal Design of Accelerated Degradation Tests

Format: CPS Abstract

Author: Dr Ming-Yung Lee

This paper proposes a two-stage optimization method to theoretically obtain the statistically and economically optimal ADT design. A statistically optimal ADT design is a design that minimizes one of the four optimality criteria while the 95% margin of error (MOE) of the reliability estimate is smaller than a pre-determined acceptable value. An economically optimal ADT design is one with the minimum expected total cost (ETC). In the first stage, the paper shows that the optimal SSADT and PCSADT reduce, respectively, to a two-step SSADT and a two-group PCSADT using only the maximum and minimum available stress levels. The optimal allocation of the overall screening time to these two outlying stress levels is obtained. In the second stage, the paper considers the economic aspect of an ADT design, via the expected total cost, while controlling the MOE of the design. Explicit solutions are obtained.

Outlier detection based on Range Distribution

Format: CPS Abstract

Author: PROF. DR. Hana Sulieman

Co-Authors:

  • Dania Dallah
  • Ayman Alzaatreh

Detecting outliers is an important problem that has been studied in various research and application areas. Many statistical procedures are affected by the presence of outliers. Various outlier detection techniques have been proposed in the literature to capture these anomalous observations efficiently. In this presentation, we will explore the use of the range statistic in identifying outliers in univariate data. In particular, the relative range, defined as the range statistic standardized by the interquartile range, will be examined as a tool to detect outliers. A full empirical study will be conducted to compare the outlier detection performance of the relative range and the range statistic standardized by the standard deviation.
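
The two statistics compared in this abstract can be computed as in the minimal sketch below. The function names, the quartile convention (the default of Python's `statistics.quantiles`) and the toy data are assumptions made for illustration; the abstract does not state a decision threshold, so none is applied here.

```python
import statistics


def relative_range_iqr(data):
    """Range of the data divided by the interquartile range."""
    # Quartile convention is an assumption: statistics.quantiles default method.
    q1, _, q3 = statistics.quantiles(data, n=4)
    return (max(data) - min(data)) / (q3 - q1)


def range_over_sd(data):
    """Range of the data divided by the sample standard deviation."""
    return (max(data) - min(data)) / statistics.stdev(data)


# Hypothetical toy data: a clean sample and the same sample plus one extreme value.
clean = [9.8, 10.1, 10.0, 9.9, 10.2, 10.3, 9.7, 10.0]
contaminated = clean + [15.0]

for name, sample in [("clean", clean), ("contaminated", contaminated)]:
    print(name, relative_range_iqr(sample), range_over_sd(sample))
```

Because the interquartile range is barely moved by a single extreme value while the range is, the IQR-standardized statistic reacts strongly to the added observation in this toy example.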

POLITICAL FAVOR, DEVELOPMENT PROJECTS, AND HOUSEHOLD WELLBEING

Format: CPS Abstract

Author: Mrs Nirosha Wijesekara Dissanayaka

The paper examines the impact of a large-scale development program on household well-being in the Hambantota District in the Southern Province of Sri Lanka. The program, the Greater Hambantota Development Program (GHDP), includes international-level constructions such as a port, an airport, a stadium, and a massive administrative complex. The government obtained a large loan from China for the construction of the project. Introducing such a massive development program to Hambantota was one of the biggest promises one candidate made during the presidential election in 2005. The project outcomes and the political motivation of the program are still debated. However, whether or not the project is successful, the job creation and the cash flow circulating in the area can, directly and indirectly, affect domestic well-being, which is the focus of this study. The difference-in-differences (diff-in-diff) method was employed to investigate the impacts. The findings show that the income (earnings from wages, agricultural activities, and non-agricultural activities) of people living in the Hambantota district is lower than the income of people living in the non-treated district after the program was implemented, relative to the period before the intervention. Similarly, spending on food and non-food items has been lower in households of the Hambantota district than in their counterparts. The time people living in the Hambantota district need to reach public places is higher than for people living in the non-treated district after the new city plan and road network were introduced, relative to the period before. Furthermore, the irregular development projects carried out have increased the vulnerability of the people of the area to natural disasters and to harm from wild animals. Introducing large-scale projects suited to a luxurious lifestyle may sometimes fail to meet the needs of the poor.
The GHDP is a good example of such a situation. Therefore, care should be taken when planning projects to uplift living standards in areas where more than 40% of the population depends on agriculture for their livelihood. Today, Sri Lanka is experiencing the consequences of politicians not listening to the views and advice of experts in the field when making their decisions. It is important to have an accurate estimate of the expected returns on loans before investing. Developing large-scale infrastructure by borrowing at high interest rates without proper planning or study is very risky. Therefore, policymakers need to prepare policies that prevent such situations. Project failure is common in most developing countries. They implement many projects to uplift household well-being. Unfortunately, a considerable number of these projects fail. The biggest issue arises when the money spent on the projects is borrowed at high interest rates. The case of Sri Lanka is a good example for them to consider carefully before investing in massive projects financed by large loans.
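
As an illustration of the diff-in-diff logic used in this abstract, the sketch below computes the basic two-group, two-period estimate from group means. All figures are hypothetical placeholders, not values from the study, and the study's actual specification (covariates, regression form) is not described in the abstract.

```python
def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Change in the treated group minus change in the control group."""
    return (treated_after - treated_before) - (control_after - control_before)


# Hypothetical mean household incomes (arbitrary units), for illustration only.
treated_before, treated_after = 100.0, 110.0   # treated district
control_before, control_after = 100.0, 130.0   # non-treated district

effect = diff_in_diff(treated_before, treated_after, control_before, control_after)
print(effect)
```

A negative estimate means the treated district's outcome grew less than the comparison district's over the same period, which is the direction of the income finding reported above.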

PREDICTING THE DETERMINANTS OF POVERTY IN ANAMBRA STATE, NIGERIA

Format: CPS Abstract

Author: Charles Aronu

Co-Authors:

  • Okafor Emeka Sixtus

Recently, the Federal Government of Nigeria released the 2022 Multidimensional Poverty Index (MPI) report in a bid to reduce extreme poverty. It is a way of demonstrating a commitment to the first of the Sustainable Development Goals, which is eradicating poverty in all its dimensions. This study therefore examines the determinants of poverty in Anambra State, Nigeria. The objectives of the study were to ascertain the percentage of the poorest in the state and to determine the factors that affect the poverty rate in the state. The study employed a well-designed questionnaire to obtain data in all 188 communities under the 21 local government areas of the state. The statistical tool used in the study was Random Forest analysis. Random Forest classification was employed to predict the poverty status of the respondents. The response variable for the study was Poverty Status (POVMI), which was obtained using the International Wealth Index (IWI), while the explanatory variables considered for the prediction of the response variable were Age Interval (Age), Satisfaction status of households living in Anambra State (SLR), Perception of respondents on poverty rate over the last 8 years (PPOVT8), Choice of health facility used by household when sick (CHF), Type of fuel the household uses for cooking (SFCl), and Highest educational qualification of respondents (HEQ). The findings of the study revealed that 6% of households are among the poorest, and that the majority of the poorest reside in rural areas (96%). It was found that the highest educational qualification of the largest share of the poorest was primary education (47.3%).
Further findings showed that the importance of the explanatory variables was in the following decreasing order: Type of fuel the household uses for cooking, Satisfaction status of households living in Anambra State, Choice of health facility used by household when sick, Age, Perception of respondents on poverty rate over the last 8 years, and Highest educational qualification of respondents. Hence, the type of fuel the household uses for cooking has the greatest impact on Poverty Status, while the highest educational qualification of respondents has the least impact. This result implies that the poorest in the state do not use clean energy for cooking.

Penalized Mixture Cure Models for Modeling a Time-to-Event Outcome with Long-term Survivors in a High-Dimensional Covariate Space

Format: CPS Abstract

Author: Dr Kellie J. Archer

Co-Authors:

  • Han Fu

Treatment decisions for patients diagnosed with acute myeloid leukemia (AML) are often based on cytogenetics and selected genetic mutations. However, approximately 40% of AML patients are cytogenetically normal (CN). While the European LeukemiaNet (ELN) prognostic risk classification further refines this group of patients into favorable, intermediate, and adverse risk groups, some cytogenetically normal patients will enjoy long-term relapse-free survival despite their ELN classification. In such cases, where an important subset of patients will not experience the event of interest, the assumptions of the Cox proportional hazards (PH) model are violated. Thus, mixture cure models (MCMs) are an appropriate alternative to the Cox PH model when an important cured fraction exists. Specifically, MCMs assume the population consists of two subgroups, those cured and those susceptible to the event of interest; there are therefore two regression components, which permit identification of features associated with cure and/or latency of susceptible patients. Because for CN-AML we are interested in relapse-free survival, here ‘cured’ is synonymous with attaining long-term relapse-free survival, and thus cured patients have a survival probability of 1. However, novel methods are needed to fit multivariable MCMs when the number of covariates exceeds the sample size, as is the case when high-throughput genomic assay data comprise the covariate space. Therefore, to identify prognostically relevant transcripts from high-throughput genomic assays and a multivariable model that can distinguish cured patients from susceptible patients with lower or higher risk of relapse, we developed parametric and semi-parametric regularized MCMs that embed false discovery rate control.
We examined the performance of our regularized MCMs using extensive simulation studies and compared them to a regularized Cox PH model, a regularized Weibull model, and two existing MCM approaches: Cmix and sign consistency in cure rate models (SCinCRM). We then applied our regularized MCMs to a CN-AML dataset. First, we fit univariable MCMs to identify baseline demographic features, clinical features, or selected gene mutations related to the probability of being cured and/or to the latency distribution (time to relapse). We then included gene expression values as candidate covariates in our novel regularized MCM to identify a parsimonious list of transcripts associated with cure or latency. An independent CN-AML dataset was used to validate the transcripts identified by our model. Our regularized MCM identified transcripts associated with cure and latency. Kaplan-Meier curves of cured versus susceptible patients, as well as of susceptible patients with lower versus higher risk of relapse or death, were well separated. In conclusion, our regularized MCMs identified important subsets of genes associated with cure and latency in CN-AML patients. Our results suggest that this group includes distinct transcriptionally defined subgroups with different biological properties, which may be useful for refining current risk stratification systems and for indicating who might be cured with chemotherapy alone versus referred for more aggressive therapies.
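
The two-subgroup structure described in this abstract can be illustrated with the population survival function of a mixture cure model, in which cured subjects have survival probability 1 and susceptible subjects follow a latency distribution. The sketch below assumes a Weibull latency with fixed parameters and no covariates; the parameter names and values are hypothetical, and the regularized, covariate-dependent models of the abstract are not reproduced here.

```python
import math


def mixture_cure_survival(t, cure_prob, shape, scale):
    """Population survival: cure_prob * 1 + (1 - cure_prob) * S_u(t).

    S_u is the survival function of susceptible subjects, taken here
    (as an assumption) to be Weibull with the given shape and scale.
    """
    susceptible_survival = math.exp(-((t / scale) ** shape))
    return cure_prob + (1.0 - cure_prob) * susceptible_survival


# Hypothetical parameters: 30% cured, Weibull latency with shape 1.5, scale 2.0.
for t in [0.0, 1.0, 5.0, 50.0]:
    print(t, mixture_cure_survival(t, cure_prob=0.3, shape=1.5, scale=2.0))
```

The curve starts at 1 and levels off at the cure probability rather than at 0, which is the plateau that a standard Cox PH model does not accommodate.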

Perception of Undergraduate Students on Online Learning of Statistics Courses

Format: CPS Abstract

Author: Miss Ayobami Fadilat Gboyega

Co-Authors:

  • Ayobami Fadilat Gboyega
  • Dorcas Modupe Okewole
  • Oluyemi Adewole Okunlola

Online learning is an essential method of teaching and imparting knowledge to students in this era of modern technologies, and it has come to stay. Online learning turns out to be student-centered: students take part fully in the learning process, and teachers only supervise and guide them. However, students' perception of this learning approach is key to the effectiveness of the whole process. The target population was students taking statistics courses, including both major and non-major students, at a Nigerian university.

Performance Metrics for Sample Selection Bias Correction

Format: CPS Abstract

Author: Ms An-Chiao Liu

Co-Authors:

  • An-Chiao Liu
  • Ton De Waal
  • Sander Scholtus
  • Katrijn Van Deun

When estimating a population parameter from a non-probability sample, that is, a sample without a known sampling mechanism, the estimate may suffer from sample selection bias. To correct selection bias, one of the often-used methods is assigning a set of unit pseudo-weights to the non-probability sample and estimating the target parameter by the weighted sum. However, a tailor-made framework for evaluating the assigned weights is missing in the literature, and the evaluation framework for prediction problems may not be suitable for population parameter estimation. We try to fill this gap by discussing several promising performance metrics, which are inspired by classical calibration and measures of selection bias. A simulation study and real data examples show that some performance metrics have a strong positive relationship with the mean squared error of the estimated population mean. These performance metrics may be helpful for model selection when correcting selection bias by logistic regression or machine learning algorithms.
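
The pseudo-weighting step described in this abstract can be sketched as follows. The sketch assumes that each unit's pseudo-weight is the inverse of an estimated inclusion propensity (for example from a logistic regression, which is not fitted here) and that the population mean is estimated by a normalized weighted sum; the data and propensities are hypothetical.

```python
def weighted_mean(values, weights):
    """Normalized weighted sum: sum(w * y) / sum(w)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical non-probability sample and assumed inclusion propensities.
y = [10.0, 12.0, 20.0, 22.0]
propensity = [0.8, 0.8, 0.2, 0.2]   # units with high y are under-represented

pseudo_weights = [1.0 / p for p in propensity]

naive_estimate = sum(y) / len(y)
adjusted_estimate = weighted_mean(y, pseudo_weights)
print(naive_estimate, adjusted_estimate)
```

In this toy case the adjusted estimate moves toward the under-represented units. Whether a given set of pseudo-weights actually reduces bias is the question the proposed performance metrics are meant to address.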

Performance of Bayesian Priors in Validation of Correlates of Protection for High Efficacy Vaccine Trials

Format: CPS Abstract

Author: Prof. Edith Umeh

Although the use of an intermediate clinical endpoint or surrogate (correlate) of protection (CoP) has increased over the years, the validation of CoPs for high-efficacy vaccine trials has remained a challenge because the data are sparse and conventional statistical methods are not adequate. Be it in the frequentist or the Bayesian world, the meta-analytic approach is a well-accepted method of validation. However, the full joint bivariate models suffer from computational issues, and there is a push for the use of individual-level instead of aggregate data in the validation process. In this quest, the Bayesian approach is emerging as the future of CoP validation, but one recurring criticism of this method concerns its use of prior distributions. To elucidate which makes better sense, non-informative prior (NIP) and weakly informative prior (WIP) distributions are compared in a meta-analytic approach using simulated data. It was found that 1) there are no convergence issues when either of the models is used, 2) WIP models take about 20% longer than NIP models to converge, and 3) the NIP models consistently perform better than the WIP models.

Periodic data and changepoints: New methodology inspired by digital health applications

Format: CPS Abstract

Author: Mr Owen Li

Co-Authors:

  • Rebecca Killick
  • Ben Norwood

Changepoint detection is the study of identifying points in time where the underlying model of the data changes, e.g. changes in mean. These points in time are called changepoints. Traditional changepoint approaches take the time axis to be linear, so that changepoints occur linearly in time: one changepoint happens after another, and they are independent of each other. Time may move forward linearly, but there exist fixed periods, e.g. days or years, which are repeated, resulting in periodic, repeated behaviour and regularly occurring changepoints. Applying traditional linear time search methods to periodic data processes results in suboptimal solutions, and existing circular time approaches address only stochastic periods or local periodicity. Alternative approaches, including artificially creating multivariate time series from a single time series with a periodic structure, destroy the time dynamics and create arbitrary start and end points. In this paper, we propose a computationally efficient changepoint method which treats the time axis as circular. This means we utilise the fact that behaviours within the fixed period are repeated across linear time, so we focus on a single fixed period and assume the start and end of the period come from the same segment. We then extend our periodic changepoint search method to data processes which exhibit periodic behaviour that also changes across time. We model these data processes as having a periodic-global structure. We show both methods perform well on digital health applications.
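
The circular view of time described in this abstract can be illustrated with the small sketch below. It does not implement the authors' search method: it only shows, for changepoint positions that are assumed to be known, how observations can be mapped to a within-period position and how the start and end of the period can be made to share one segment. The function names, the convention that a changepoint marks the first position of a new segment, and the toy data are all assumptions.

```python
from collections import defaultdict


def circular_segment(position, changepoints):
    """Segment index for a within-period position on a circular time axis.

    `changepoints` is a sorted list of within-period positions. Positions
    before the first changepoint wrap around and join the segment that
    starts at the last changepoint.
    """
    if not changepoints:
        return 0
    passed = sum(1 for c in changepoints if position >= c)
    return passed % len(changepoints)


def circular_segment_means(series, period, changepoints):
    """Pool observations from all repeats of the period into circular segments."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for t, x in enumerate(series):
        seg = circular_segment(t % period, changepoints)
        sums[seg] += x
        counts[seg] += 1
    return {seg: sums[seg] / counts[seg] for seg in sorted(sums)}


# Hypothetical series with period 6: high level at positions 2-4, low elsewhere.
series = [1.0, 1.0, 5.0, 5.0, 5.0, 1.0] * 3
means = circular_segment_means(series, period=6, changepoints=[2, 5])
print(means)
```

With two changepoints per period there are two segments rather than three, because positions 5, 0 and 1 form a single segment across the period boundary.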

Pitman-Yor mixtures for BART: Novel nonparametric prior for Bayesian causal inference

Format: CPS Abstract

Author: Dr Andrej Srakar

We develop a novel regularization prior for Bayesian additive regression trees (BART) based on Pitman-Yor Mixture process which improves on earlier BART priors and Hahn et al.'s Bayesian causal forests model. In particular, the model is able to address estimation of heterogeneous effects in the presence of strong confounding. Computational issues are addressed using importance sampling with integrated nested Laplace approximation. We discuss extensions to endogeneity corrections.

Planning and Prediction: Modelling Self-Response to the Canadian Census of Population via Survival Analysis with Competing Risks

Format: CPS Abstract

Author: Mr Craig Hilborn

Co-Authors:

  • Craig Hilborn
  • Kenza Sallier

For the past two censuses of the Canadian population (2016 and 2021), Statistics Canada used a microsimulation model during the data collection period to dynamically forecast the end-of-collection response rates and non-response follow-up costs. The goal of these weekly forecasts was to evaluate proposed collection strategies and ensure the judicious use of resources. One of the critical components of this microsimulation model is the self-response (SR) process, which simulates SR, defined as a dwelling submitting a census questionnaire or a request for the means to complete a questionnaire, without intervention from an agent of Statistics Canada. This presentation will describe the competing risks survival analysis procedures used to generate the requisite SR parameters used in the microsimulation model, as well as their development and validation.

Polymer design by interplay of machine learning, computer simulation and expert knowledge

Format: CPS Abstract

Author: Dr Stephen Wu

Co-Authors:

  • Stephen Wu

There has been rapidly growing demand for polymeric materials from different aspects of modern life because of the highly diverse physical and chemical properties of polymers. Polymer informatics is an interdisciplinary research field spanning polymer science, computer science, information science and machine learning that serves as a platform to exploit existing polymer data for efficient design of functional polymers. In this study, we present two case studies of successful discovery of new functional polymers using a data-driven approach. The design process involves an interplay between expert knowledge, computer simulation, and machine learning. The proposed method is shown to be superior to the conventional design approach.

Predicting Indonesia’s Exports at the Sub-national Level using Nighttime-Light (NTL) Data

Format: CPS Abstract

Author: Ms Realita Eschachasthi

Co-Authors:

  • Purwaningsih Purwaningsih
  • Agung Andiojaya
  • Riana Rizka
  • Titi Kanti Lestari

Export statistics are one of the primary indicators of economic performance. In the digitalization era, the demand for real-time data published by official statistics, including export data, has escalated quickly. This paper explores nighttime-light (NTL) data as a proxy and as a tool to calibrate the export value of Indonesia at the sub-national level. The analysis uses a panel statistical regression from 2019 to 2021 to measure the value of exports by the province of origin and by the province of port loading.

Predicting NEET status for the Moroccan young men and women with Random Forest and C50 classifiers.

Format: CPS Abstract

Author: Miss Salima MANSOURI

This study classifies the NEET status of Moroccan youth, both men and women, using random forest and C50 classifiers. Missing values are imputed using the MICE method (Multivariate Imputation by Chained Equations). The data are balanced using undersampling and SMOTE-NC (Synthetic Minority Over-sampling Technique for Nominal and Continuous features) for C50.

Predicting pollution risk using asymmetric GARCH-DCC models

Format: CPS Abstract

Author: Prof. Giuliana Passamani

Co-Authors:

  • Paola Masotti

Multivariate conditional volatility models can describe, in a parsimonious way, time-varying conditional covariances and correlations of pollutants time series, thus enhancing prediction of pollution risk.

Predicting precipitation using transfer function models of the spatiotemporal variabilities in the arid and sub-humid regions of Southern Africa

Format: CPS Abstract

Author: Mr Lyson Chaka

Co-Authors:

  • Mohamed A. M. Abd Elbasit

The impact of global warming on coastal regions of Southern Africa has contributed to a series of unusual rainfall patterns and floods in the past decade. The region is experiencing severe damage to and loss of infrastructure and productive landscape. Currently, the factors associated with these adverse rainfall patterns, and the relationships among these factors, are not well known. The spatial and temporal variability in monthly rainfall in the arid, semi-arid and sub-humid areas in South Africa is analysed to generate a mechanism for the prediction of precipitation in the regions. Transfer function models are appropriate tools to model precipitation patterns that are assumed to be influenced by other spatiotemporal variations in climatic conditions. These models provide a basis for explaining the relationships between precipitation patterns and other climatic factors for a specified period. Precipitation predictions are useful for planning agricultural activities, making decisions in farming projects and disaster management in the region. Decision-making in agriculture and other sectors of the economy relies mostly on weather patterns and forecasts. We propose a time-series modelling approach that uses dynamic linear regression models to forecast monthly average precipitation in the Southern Africa regions based on shortwave radiation, sea surface temperature anomalies and air temperature. We demonstrate the concept using 1983-2021 data collected from multiple satellites located over the Indian and Atlantic Oceans bordering the target regions.

Prediction of Internet Users in Indonesia Using Google Trends Data

Format: CPS Abstract

Author: Ms Atika Nashirah Hasyyati

Co-Authors:

  • Atika Nashirah Hasyyati
  • Dwiyana Siti Meilany Dalimunthe
  • Iwan Fathi Fauzan
  • Richard Leyshon
  • Filippo Cavallari

Statistics Indonesia produces the percentage of internet users in Indonesia annually, based on the Indonesian National Socioeconomic Survey (Susenas). One of the impacts of the COVID-19 pandemic is limited access to respondents, so the Partnership on Measuring Information and Communication Technology for Development reported the need to promote data innovations as a complement to traditional ICT data. Internet use is one of the most important ICT variables that can describe the gaps in connectivity and technological advancement. Meanwhile, the current number of internet users can only be published annually, so there is a need for more timely estimates. Google Trends data are free to access and available in a timely manner, so they can be used for nowcasting and prediction. This paper aims at using Google Trends as an alternative data source to produce monthly estimates of the percentage of individuals using the internet in Indonesia by province. We also utilise the official data (National Socioeconomic Survey) on the percentage of internet users in Indonesia by province from 2014 to 2021. We use variables based on Google Trends data as predictors (based on Web Search; possible search keywords are Facebook, WhatsApp, etc., including top searches), and then apply machine learning methods to predict the percentage of internet users. Several challenges need to be overcome when gathering data from the Google Trends API. Over 1.6 million data points were collected using the gtrendsR package. After data cleaning, several machine learning methods were compared to produce monthly estimates of internet users. The comparison shows that XGBoost has the highest accuracy.

Predictors of depression and anxiety among urban adults during COVID-19: An online cross-sectional study in Dhaka city, Bangladesh

Format: CPS Abstract

Author: Dr MD SHAHJAHAN

Co-Authors:

  • Md. Mazharul Islam

Application of statistics in Health Science

Probabilistic Vector Machines

Format: CPS Abstract

Author: Prof. Antonio Pedro Duarte Silva

In supervised classification problems, Support Vector Machines (SVMs) are known to be among the most accurate class predictors. However, standard SVMs fail to complement these predictions with reliable estimates of class membership probabilities. Here, a novel algorithm is presented to estimate class probabilities from sequences of weighted SVMs. Numerical experiments show that this algorithm scales better than existing alternatives, is more accurate than competing machine learning approaches, and is more robust than model-based statistical methodologies.

Probabilistic forecast reconciliation for emergency services demand

Format: CPS Abstract

Author: Prof. Rob Hyndman

Co-Authors:

  • Bahman Rostami-Tabar

We use forecast reconciliation methods to produce probabilistic forecasts of emergency services demand for the Welsh Ambulance Service NHS Trust. Forecasts are produced for national, regional and sub-regional levels, and for different types of incidents and different degrees of priority.

Processing survey data with VTL

Format: CPS Abstract

Author: Mr Thomas Dubois

After renovating its data collection system for household surveys based on the concept of active metadata, INSEE has pursued technical investments in post-collection processing. The Validation and Transformation Language (VTL) proposed by the SDMX initiative is used to reconcile data collected through different modes and to start the first processing steps. VTL processing rules are written and interpreted using the Java and JavaScript implementations provided by the Trevas open-source tool.

Producing the Quarterly Gross Domestic Product of Abu Dhabi Emirate – Two Approaches

Format: CPS Abstract

Author: Ms Wadeema Mohamed Alkhoori

The QGDP is one of the most critical short-term economic indicators. It is compiled using a combination of administrative and survey data from the Quarterly Economic Survey (QES). In 2021 the QES was expanded from roughly 120 to 1,000 establishments with the aim of improving the consistency between the pre-benchmarked QGDP and the annual estimates. The QGDP methodology changed from processing 27 activities individually, several of them on an ISIC two-digit or sub-activities level, to processing 17 activities at the 1-digit level without sub-activities based on the new methodology. This paper provides an in-depth overview of both approaches.

EVALUATION OF THE IMPACT OF THE WAR IN UKRAINE ON THE DOMESTIC FINANCIAL SECTOR

Format: CPS Abstract

Author: Prof. Ruslan Motoryn

Co-Authors:

  • Tetiana Motoryna, Kateryna Prykhodko

The article examines the impact of the war on the financial sector of Ukraine and systematizes the mechanisms through which this impact manifests. The consequences of Russia's attack on Ukraine have affected the entire world economy and Europe in particular. The features of the impact of the war on the economy in domestic conditions are revealed. The continuation of the war exacerbates most of the risks. The largest one, credit risk, is already being realized, and the losses from it will grow in the future. Financial institutions are gradually recognizing credit losses and reflecting the impact of negative events on asset quality. The National Bank of Ukraine restored the requirements for credit risk assessment and, in particular, for calculating the number of days past due on loans. Limited demand for loans, especially from households, worsening portfolio quality, and higher provisioning increase profitability risks. The dynamics of the ratio of the volume of loans, bank deposits and other indicators of the financial sector of Ukraine were analyzed. The role of bank lending in combating the consequences of military aggression is revealed, and measures aimed at mitigating its negative impact on the economy are substantiated. An analysis of the dynamics of indicators of non-banking financial institutions was also made. Having only just begun to recover from the negative impact of the pandemic, these institutions have felt all the risks of war. Unlike banks, part of the market did not cope with operational risks: financial institutions stopped working, processes were disrupted and information was lost. Currently, only about two-thirds of the sector submits financial statements. The volume of transactions of non-banking financial institutions has significantly decreased. Demand for insurance and lending has fallen, and the quality of the loan portfolio of credit unions and financial companies is deteriorating.
Measures aimed at improving the state of the financial sector of Ukraine are substantiated. Keywords: financial sector, war, banking, loans, deposits, non-bank financial institutions.

Promoting The MED-HIMS Experience to Improve Data on Refugees

Format: CPS Abstract

Author: Ms Sohair Ahmed

This paper presents the use of household surveys to gather data on refugees, drawing on the experience of the ‘Mediterranean Household International Migration Survey’ (MED-HIMS) and using data collected in the 2013 Egypt Household International Migration Survey (Egypt-HIMS). The survey was implemented by the Central Agency for Public Mobilization and Statistics (CAPMAS) as part of the MED-HIMS programme, which is a joint initiative of the European Commission/Eurostat, the World Bank, UNHCR, UNFPA, ILO, IOM, and the League of Arab States. Following a brief description of the design of the Egypt-HIMS, the paper provides a demographic and socioeconomic profile of the survey households and population according to the migration status of the household. Four types of households are considered: households with current migrants, households with return migrants, households with non-migrants, and households with forced migrants. In the Egypt-HIMS targeted survey, the 1,692 sampled households included 6,813 persons, who were interviewed with the ‘Household Questionnaire’. Of the household population, 4,309 persons (63.4%) were 15 years of age or more, of whom 1,793 forced migrants were randomly selected and successfully interviewed with the ‘Individual Questionnaire for Forced Migrant’. The ‘Household Socio-economic Characteristics Questionnaire’ was also administered to each of the 1,692 households in the final sample. The analysis highlights what characteristics, and with what impacts. Main findings and key indicators are presented on aspects of forced migration, including the causes, consequences and experiences of forced migrants. In Egypt, forced migrants do not live in camps; there are no camps in Egypt, and forced migrants settle in dwellings. Many forced migrants have “self-settled”, that is, intermingled with local people who may be assisting them. Key words: international migration, determinants of forced migration, Egypt-HIMS.

Pursuing Indonesia’s 2030 Economic Goals and Sustainable Manufacturing: Impact of Industrial Revolution 4.0

Format: CPS Abstract

Author: Ms Ayu Paramudita

Manufacturing forms the largest sector of Indonesia’s economy. Owing to its strategic role in the economy, the national initiative Making Indonesia 4.0 was launched in 2018, focusing on industrial transformation in technology, research, innovation, and sustainability. From the business perspective, increasing costs of materials and energy and the high expectations of customers and investors have made advanced adoption of hi-tech and green business practices critical to increasing efficiency and competitiveness. However, what is the impact of industrial revolution 4.0 (4IR) in Indonesia, particularly in the large- and medium-scale manufacturing sector? This study focuses on the effects of innovation and R&D on the economy, productivity, and sustainable manufacturing, examining Indonesia's aspiration to be a global top 10 economy by 2030.

Quality Aspects of Web Data based on the Experiences of the ESSnet Trusted Smart Statistics – Web Intelligence Network

Format: CPS Abstract

Author: Dr Magdalena Six

Co-Authors:

  • Alexander Kowarik

Web data for the production of official statistics

Quality Management in Statistics Portugal – new challenges in a journey of a lifetime

Format: CPS Abstract

Author: Mrs Maria Zilhão Zilhão

Co-Authors:

  • Magda Ribeiro
  • Sofia Rodrigues

Statistics Portugal (SP) is the central national statistical authority of the NSS. Together with other national authorities, it strives, in a spirit of cooperation and community, to grow and improve processes and products. Quality management has been a journey of many steps that has made SP stronger and more trustworthy over time. The paper focuses on the main strengths of this journey from the quality angle, and also on the challenges posed to the new roles of NSOs within the emerging data ecosystem, leading to a path of improvement, assessment and innovation.

Quality challenges in production of official statistics in decentralized systems

Format: CPS Abstract

Author: Mr Saeed Fayyaz

This paper deals with the quality challenges of official statistics. Using administrative data for the production of official statistics also raises quality issues. A session on official statistics and quality assurance is a suitable session for this paper.

Quantifying the contribution of individual records to the reidentification risk of (pseudo)anonymized datasets

Format: CPS Abstract

Author: Dr Fotios Stavropoulos

Co-Authors:

  • Michel Béra
  • Vasiliki Daskalaki
  • Kimon Spiliopoulos
  • Konstantinos Spinakis
  • Gilbert Saporta

A measure of the risk of reidentifying individuals in a dataset and a method to estimate it have been proposed recently. This paper presents various approaches to estimating the contribution of each record to the dataset’s risk. This serves various purposes. The withdrawal of (possibly a few) records with large contribution may reduce the dataset’s risk. Furthermore, the contribution of a record can be considered as a proxy of the risk of identifying the individual the record corresponds to.
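
The abstract does not specify the risk measure or the estimation approaches, so the following is only a minimal sketch of the leave-one-out idea under a stand-in assumption: the dataset risk is taken to be the average of 1 / (equivalence-class size) over records, and a record's contribution is the drop in risk when that record is withdrawn. The function names and the quasi-identifier values are hypothetical.

```python
from collections import Counter


def dataset_risk(records):
    """Proxy risk: mean over records of 1 / (size of the record's
    equivalence class on the quasi-identifiers). This is an assumption
    for illustration, not the measure used in the paper."""
    if not records:
        return 0.0
    sizes = Counter(records)
    return sum(1.0 / sizes[r] for r in records) / len(records)


def record_contributions(records):
    """Leave-one-out contribution: risk of the full dataset minus the risk
    after withdrawing record i. Positive means withdrawal lowers the risk."""
    full = dataset_risk(records)
    return [full - dataset_risk(records[:i] + records[i + 1:])
            for i in range(len(records))]


# Hypothetical quasi-identifier tuples (age band, postcode prefix).
data = [("30-39", "K1"), ("30-39", "K1"), ("60-69", "K2")]
contrib = record_contributions(data)
```

In this toy dataset the unique third record has the largest contribution, so withdrawing it reduces the proxy risk, while withdrawing one of the two identical records would increase it.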

RESPONSE MODEL SELECTION IN SMALL AREA ESTIMATION IN CASE OF NOT MISSING AT RANDOM NONRESPONSE

Format: CPS Abstract

Author: Dr Michael Sverchkov

Co-Authors:

  • Danny Pfeffermann

Sverchkov and Pfeffermann (S-P, 2018, 2019) consider Small Area Estimation under informative probability sampling of areas and within the sampled areas, and not missing at random (NMAR) nonresponse. To account for the nonresponse, S-P assume a given response model and estimate the corresponding response probabilities by application of the Missing Information Principle, which consists of defining the likelihood as if there was complete response and then integrating out the unobserved outcomes from the likelihood employing the relationship between the sample and sample-complement distributions. A key condition for the success of this approach is the specification of the response model. In this presentation we consider likelihood ratio tests and information criteria based on the above likelihood and show how they can be used for the selection of the response model. We illustrate the approach by a small simulation study.
References:
Pfeffermann, D. and Sverchkov, M. (2019). Multivariate small area estimation under nonignorable nonresponse. Statistical Theory and Related Fields, 3, pp. 213–223.
Sverchkov, M. and Pfeffermann, D. (2018). Small area estimation under informative sampling and not missing at random non-response. Journal of the Royal Statistical Society, Series A, 181, Part 4, pp. 981–1008.

Receipt Embedding and Shopping Purpose Segmentation

Format: CPS Abstract

Author: Yinxing Li

Co-Authors:

  • Nobuhiko Terui

Marketing data are expanding in several modes nowadays: the number of variables explaining customer behavior has greatly increased, and automated data collection in stores has led to the recording of customer choice decisions, which generates large-scale samples. Thus, high-dimensional models have recently gained considerable importance in several areas, including marketing. Some distributed representation models based on product embedding, such as Prod2Vec and, for instance, the model of Ruiz, Athey and Blei (2019), involve various marketing variables such as price and customer demographic data, but the role of these variables in forecasting and marketing decisions has never been well discussed. Our study aims not only to propose a model with better forecasting precision but also to reveal how a firm’s marketing and customer demographics affect customer behavior, and to uncover the shopping purpose hidden in the receipt by extending the product embedding approach.

Redesigning of the Commercial Livestock and Poultry Survey in the Philippines

Format: CPS Abstract

Author: Ms Quindale Caraos

Co-Authors:

  • Nikkin Beronilla
  • Willreign Zoren dela Cruz

The Commercial Livestock and Poultry Survey (CLPS) of the Philippine Statistics Authority (PSA) is a major survey conducted by the Livestock and Poultry Statistics Division (LPSD) to measure the performance of the livestock and poultry industry. Specifically, the survey aims to generate primary data on the supply and disposition of animals from commercial farms. The new sampling design of the CLPS provides accepted reliability and accuracy measures for the computed estimates and takes into account implementation considerations at the provincial level. The estimates computed in the pilot survey were the total ending inventory, total production (in MT), and total egg production of livestock and poultry animals.

Redesigning of the Quarterly Municipal Fisheries Survey in the Philippines

Format: CPS Abstract

Author: Ms Quindale Caraos

Co-Authors:

  • Erniel Barrios
  • Nikkin Beronilla
  • Quindale Caraos

The goal of the Quarterly Municipal Fisheries Survey (QMFS) is to determine the total volume and average value of fish unloaded at municipal fish landing facilities each quarter. In the current sampling design, the QMFS uses stratified random sampling of traditional municipal landing centers. Furthermore, rather than the actual catch of the day, the present QMFS methodology relies on the key informant's recall of the monthly catch in a particular quarter, which could introduce recall bias. A survey redesign is therefore required in order to construct a sampling design that reliably estimates the total volume and average value of municipal fisheries production.
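
For readers unfamiliar with the current design, the textbook estimator of a total under stratified simple random sampling can be sketched as follows. This is a generic illustration, not the PSA's production code; the strata, the counts of landing centers and the sampled volumes are hypothetical.

```python
def stratified_total(strata):
    """Estimate a population total under stratified simple random sampling.

    `strata` is a list of (N_h, sample_values) pairs, where N_h is the number
    of units (here, landing centers) in stratum h and sample_values are the
    values observed at the sampled units of that stratum.
    """
    total = 0.0
    for n_units, sample in strata:
        if not sample:
            raise ValueError("each stratum needs at least one sampled unit")
        # Expand the stratum sample mean by the stratum size.
        total += n_units * sum(sample) / len(sample)
    return total


# Hypothetical quarterly volumes (tonnes) unloaded at sampled centers.
strata = [(10, [1.0, 3.0]), (20, [2.0, 4.0, 6.0])]
estimate = stratified_total(strata)
```

Here the two stratum means are 2 and 4 tonnes, so the estimated total is 10 × 2 + 20 × 4 = 100 tonnes. Recall bias in the reported values would carry straight through this estimator, which is the concern the abstract raises.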

Relationship in Food safety knowledge, attitude and practice of kitchen workers of the Addis Ababa School Feeding Program: Application of CFA and SEM

Format: CPS Abstract

Author: Dr Zelalem Destaw

Co-Authors:

  • Samuel Kidane

A large population of school children benefits from the school feeding programs initiated by the Addis Ababa Administration. Unsafe food can cause outbreaks of foodborne diseases, which may, in turn, deplete nutrition in the long run or lead to death. Proper food safety knowledge, attitude, and practices (KAP) among kitchen workers are therefore essential, and this research aimed to investigate them.

Remittances of Egyptian Migrants in the context of sustainable development goals and COVID-19 Crisis: Challenges and Opportunities

Format: CPS Abstract

Author: Dr Waleed Mohamed

In September 2015, the United Nations launched the Sustainable Development Goals (SDGs). Egypt was one of the 193 countries that adopted the SDGs and ratified the related agreements, and the 2030 Agenda for Sustainable Development, with its 17 Goals, 169 targets, and 231 indicators, took effect on 1 January 2016. Unlike the Millennium Development Goals (MDGs), the SDG framework, especially Goal 10, explicitly recognizes the role of international migration in achieving a more just and equitable world. International migration and remittances also have implications for achieving a range of other goals, such as SDG 1 (No Poverty), SDG 2 (Zero Hunger), SDG 3 (Health and Well-Being), SDG 4 (Education), and SDG 8 (Decent Work and Economic Growth). Remittances, in the simplest definition, are money sent back home by migrants, typically representing a share of their earnings in the host country; data on remittances are sourced from balance of payments statistics (which record financial transactions between a country and the rest of the world). Remittances are considered an important and growing source of foreign funds for several developing countries and a contributor to sustainable development. Accordingly, SDG target 10.c commits, by 2030, to reduce to less than 3 percent the transaction costs of migrant remittances and to eliminate remittance corridors with costs higher than 5 percent; achieving this would save remittance families an additional US$20 billion annually. When the coronavirus pandemic spread across the world from late 2019, remittances were affected. Several World Bank reports indicated that COVID-19 not only affected the volume of remittances but could also affect the costs of remitting money along different corridors, which already differed significantly.
Moreover, some of the largest remittance-sending countries, such as the United States, Switzerland, Germany, France and Italy, have been severely affected by the COVID-19 pandemic, and service sector jobs were hit hard from the outset by the health crisis. Migrants working in hotels, restaurants and salons have lost their jobs. Against this background, this paper discusses the monitoring of remittances of Egyptian migrants in the context of the SDGs and the COVID-19 crisis: the size of the flows of Egyptian migrants abroad; the volume of remittances as a percentage of GDP; the countries receiving the most remittances in sub-Saharan Africa, in the Middle East and North Africa, and globally, with a focus on Egypt; and the current situation of migration indicators in the context of monitoring Egypt's SDGs, in order to determine challenges and opportunities.

Remote working and new forms of work: evidence from INAPP-PLUS (Participation, Labour and Unemployment Survey)

Format: CPS Abstract

Author: Dr Paolo Emilio Cardone

Background: When the pandemic hit in the spring of 2020, many private companies and public administrations had to resort to working-from-home arrangements for their employees. While remote working was rather uncommon before the pandemic, it became the prevalent work arrangement for a large fraction of the working population. This shift did not take place homogeneously: the extent to which each firm adopted this strategy depended on the type of industry. Objective: The aim of the analysis is to investigate workers' transition to teleworking, evaluating the impact of demographic and job characteristics on the probability of having worked from home, partially or totally, during both waves of the pandemic. Methods: The data used in this article are from the latest (ninth) Survey on Labour Participation and Unemployment (PLUS), a sample survey on the Italian labour market supply developed and administered by the National Institute for the Analysis of Public Policies (INAPP). This survey contains information about social categories, income, education and employment conditions for 50,000 individuals between 18 and 74 years old. The primary objective of the INAPP-PLUS survey is to provide reliable statistics on phenomena rarely or marginally explored by other surveys on the Italian labour market. In fact, although the Labour Force Survey by the National Institute of Statistics (ISTAT) provides aggregates and official indicators on the labour market, the INAPP-PLUS survey mainly aims to give insights into specific, particularly problematic aspects, such as remote working. Using a logistic regression model, it is possible to estimate the different attitudes among workers more accurately. Conclusion: The research is set in a constantly changing scenario, which implies the extension of smart working to different work contexts.
Main findings show that homeworking is positively related to the level of information technology skills: this result calls for more investment in IT infrastructure as well as for training of adult workers. Working from home also largely depends on the features of the job, even controlling for many other covariates at the individual level.

Reparameterized count and semi-continuous regression models

Format: CPS Abstract

Author: DRS Marcelo Bourguignon Pereira

Co-Authors:

  • Marcelo Bourguignon Pereira

Regression models are typically constructed to model the mean and dispersion/precision of a distribution. However, the density or probability mass function of several distributions is not indexed by the mean and dispersion/precision parameters. In this context, this work provides a collection of regression models considering new parameterizations in terms of the mean and dispersion/precision parameters. The main advantage of our new parametrizations is the straightforward interpretation of the regression coefficients in terms of the expectation and dispersion, as usual in the context of generalized linear models. The maximum likelihood method is used to estimate the model parameters.
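
To make the idea of a mean and precision parameterization concrete, the sketch below uses the negative binomial distribution as a familiar example; it is not necessarily one of the models covered in the paper. The classical (r, p) form is re-indexed by the mean mu and a precision phi, and a log link ties mu to covariates. The coefficient values are hypothetical.

```python
import math


def nb_pmf(y, mu, phi):
    """Negative binomial pmf indexed by mean mu > 0 and precision phi > 0.

    Equivalent to the classical (r, p) form with r = phi and
    p = phi / (phi + mu), so that E[Y] = mu and Var[Y] = mu + mu**2 / phi.
    """
    log_p = (math.lgamma(y + phi) - math.lgamma(phi) - math.lgamma(y + 1)
             + phi * math.log(phi / (phi + mu))
             + y * math.log(mu / (phi + mu)))
    return math.exp(log_p)


def mean_from_covariates(beta, x):
    """Log link: mu = exp(x' beta), so each coefficient acts on log E[Y]."""
    return math.exp(sum(b * xi for b, xi in zip(beta, x)))


# Hypothetical coefficients and covariate vector (intercept plus one covariate).
mu = mean_from_covariates([0.5, 0.2], [1.0, 3.0])
support = range(2000)  # truncation point chosen so the omitted tail is negligible
mean = sum(y * nb_pmf(y, mu, 2.0) for y in support)
var = sum((y - mean) ** 2 * nb_pmf(y, mu, 2.0) for y in support)
```

Because the pmf is written directly in terms of mu, a regression coefficient can be read as an effect on the log of the expected count, which is the interpretability benefit the abstract describes.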

Review of the NEET (neither in employment nor in education or training) rate

Format: CPS Abstract

Author: Mrs Zineb El Ouazzani Touhami

Young persons not engaged in education, employment or training, known by the acronym “NEET”, are increasingly used as a measure of youth marginalisation and disengagement. NEETs are of particular interest to policy-makers. The NEET rate was included as one of the indicators proposed to measure progress towards the Sustainable Development Goals (SDGs): reduce the number of young people who are “Not in Employment, Education or Training” (NEET). The proposed indicator addresses youth inactivity and exclusion in a meaningful way and looks beyond the narrow lens of unemployment. However, the NEET rate is the tree that conceals the forest, owing to the heterogeneity within the NEET group. The acronym hides subcategories that represent different realities. We consider the following categories: the job seekers, who are the short- or long-term unemployed actively looking for a job; the unavailable, who are not actively looking for a job because they cannot, owing to family duties or responsibilities; the discouraged, a group characterized by demotivation and passivity in terms of job search; the NEET by choice, from wealthy families with a strong social background and strong human capital; and the NEET with health problems, who have disabling health conditions. From the above, we clearly see that there is a group who are unwilling to join the labour market. This category inflates the number of NEETs and misleads policy-makers, as most NEETs are presumably considered to be facing difficulties in finding a job. According to the Moroccan labour force survey, in 2021 more than one in four young people aged 15 to 24 (26.0%, or 1.5 million) were not working, not in school and not in training. This rate is 38.8% for women against 13.6% for men. Women make up the majority of NEETs: 73.4% of NEETs are women (i.e. 1.1 million).
Almost three-quarters of NEETs (73.0%, or 1.1 million) are in situations of inactivity other than studies or training, which means that they do not work, are not looking for work and are not about to start work. Young NEET women are more exposed to economic inactivity than young NEET men: the proportion of inactive women among women in a NEET situation stands at 88.5%, compared to 30.1% for NEET men. Moreover, when inactive women aged 15 and over are asked whether they would like to work if the opportunity arose, 90.6% of young inactive NEET women do not wish to work. The main reasons given are the education of children and care work for the household (51.4%) and lack of interest in work (22.4%). Members of this sub-group will refuse to join active life even if the opportunity arises because, from their point of view, despite not being engaged in any activity to produce goods or provide services for pay or profit, they are responsible for raising children and are engaged in one of the most important roles a woman can ever play: being present in their children’s lives, caring for them, loving them, teaching them, and so much more. In this regard, the goal of reducing the NEET rate, especially for women aged 20 to 29 who are unwilling to participate in the labour market and who give more importance and priority to the education of children, cannot be achieved if the indicator continues to be designed and conceived as it is.
This paper is a call to revise the NEET rate by adding other criteria to the concept, such as the willingness and availability of young people to engage in training or in the labour market. This would help focus the efforts of policy-makers on persons who have a relatively strong attachment to the labour market and who were available for work but not seeking work during the reference period, more particularly the discouraged, the pensioners and the infirm. It is also an attempt to propose another indicator, “not in employment, education, training, nor in childcare”, which is achievable, realistic, and reasonable.
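
The difference between the standard NEET rate and an indicator that also excludes young people engaged in childcare can be sketched as below. The abstract does not give an operational definition, so this is only one possible reading; the record fields, the helper names and the sample data are all hypothetical.

```python
def person(employed=False, in_education=False, in_training=False,
           in_childcare=False):
    """Build one hypothetical survey record."""
    return {"employed": employed, "in_education": in_education,
            "in_training": in_training, "in_childcare": in_childcare}


def neet_rates(youth):
    """Return (standard NEET rate, rate excluding NEETs engaged in childcare).

    The second rate is one possible reading of a 'not in employment,
    education, training, nor in childcare' indicator.
    """
    if not youth:
        raise ValueError("no records")
    neet = [p for p in youth
            if not (p["employed"] or p["in_education"] or p["in_training"])]
    neet_no_care = [p for p in neet if not p["in_childcare"]]
    return len(neet) / len(youth), len(neet_no_care) / len(youth)


# Four hypothetical respondents: one employed, one student,
# one full-time carer, one inactive for other reasons.
sample = [person(employed=True), person(in_education=True),
          person(in_childcare=True), person()]
standard, revised = neet_rates(sample)
```

In this toy sample the standard rate is 2 out of 4, while the revised rate counts only the respondent who is neither working, studying, training nor caring for children, giving 1 out of 4.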

Robust testing of paired outcomes in clustered data with informative cluster size

Format: CPS Abstract

Author: Dr Sandipan Dutta

Paired outcomes are common in clustered data where the main aim is to compare the distributions of the outcomes in a pair. In such clustered paired data, informative cluster sizes can occur when the number of pairs in a cluster is correlated with the paired outcomes or the paired differences. There have been some attempts to develop robust rank-based tests for comparing paired outcomes in such complex clustered data. Most of the existing rank tests developed for paired outcomes in clustered data compare the marginal distributions in a pair and ignore any covariate effect on the outcomes. However, when potentially important covariate data are available in observational studies, ignoring these covariate effects can result in flawed inference. Therefore, there is a need to develop a robust approach that can perform hypothesis testing of paired outcomes while adjusting for the effect of important covariates on the outcomes.

Role of project work in Statistics education

Format: CPS Abstract

Author: Dr Peter Kovacs

In a changing world, more and more data, and the conscious use of these data, are required to understand social and economic phenomena. To improve students’ ability to use statistics correctly, it is worth building project work into the assessment of their statistics knowledge. In business education at the bachelor level, we combine project work, traditional exercise solving, and theoretical questions to assess students’ statistical knowledge. In the presentation, our experiences will be shown.

SMB-Gen: A Bayesian Emulator for Surface Mass Balance within a Coupled Climate Ice-Sheet Computer Model

Format: CPS Abstract

Author: Dr Jonathan Owen

Co-Authors:

  • Daniel Williamson
  • Lauren Gregoire

Coupled climate and ice-sheet computer models are used to study atmosphere-ice systems. Surface Mass Balance, an output of the computationally expensive climate model, is an incomplete spatial field with values only in grid cells where there is a positive surface ice fraction, which is itself another complex spatial field returned by the ice-sheet model. We develop Bayesian emulation methodology exploiting a latent Gaussian process model to mitigate these challenges and apply it to the FAMOUS-Glimmer model.

Scoping Study on SDG Goal 5: Achieve Gender Equality and Empower All Women and Girls in the Philippines

Format: CPS Abstract

Author: Ms Dannela Jann Galias

Co-Authors:

  • Ann D. Umadhay

More than a third of the timeline has passed since the launching of the SDGs and the remaining years to achieve the set goals are getting fewer. As a way to scale up the collective actions to achieve the set goals by 2030, representatives of the UN member states called for a Decade of Action to deliver the Global Goals in 2019. One conclusion was that with less than a decade remaining, one of the goals that need attention is women’s empowerment embodied in SDG Goal 5 [1]. The fifth of the 17 SDGs, SDG Goal 5 consists of nine targets and is measured by 14 indicators. It is focused on achieving gender equality and empowering all women and girls, with particular attention on ending gender disparities, eliminating gender-related violence and early and forced marriage, securing equal participation and opportunities for leadership and decision-making, and providing universal access to sexual, health, and reproductive rights. Progress in monitoring SDGs is gauged through the tier classification of indicators. As of April 2022, the global status report states that five of the 14 indicators of SDG 5 are classified as Tier I Indicators (have clear concepts, internationally established methodology and standards, and regularly produced data), while the remaining nine are classified as Tier II indicators (have clear concepts, internationally established methodology, and standards, but data is not regularly produced). In the Philippines, however, only a select few are tagged as Tier I; the others remain at Tier II or Tier III (no established methodology or methodologies are being developed or tested). Only nine of the 14 indicators are monitored under SDG Watch, the official internet-based platform for disseminating the Philippine SDG indicators for specific targets. Far more needs to be done to raise the tier classifications of the indicators and make available the regular collection and reporting of data.
For this reason, this paper presents the scoping of six Tier II and Tier III SDG 5 indicators in the Philippines (SDGs 5.a.2, 5.a.1, 5.b.1, 5.3.2, 5.4.1, and 5.6.1), done through a review of relevant literature and in consultation with national focal agencies and stakeholders. It attempts to shed light on the status of gender welfare in the country in relation to the progress made to SDG Goal 5 in the Philippines, as well as to identify priority indicators for further methodological studies and the generation of estimates for national and global reporting.
[1] United Nations. (2020, August 13). Decade of Action. United Nations Sustainable Development. https://www.un.org/sustainabledevelopment/decade-of-action/

Seasonal Zero Modified Geometric INAR(1) Process

Format: CPS Abstract

Author: Ms Aishwarya Ghodake

Co-Authors:

  • Manik Shankar Awale

Non-negative integer-valued autoregressive (INAR) models have been widely used for modelling count time series data. These models have shown promising applicability in various fields such as health, insurance, and marketing. The number of daily or weekly cases of a disease, the weekly number of insurance claims, and the number of items of a particular product sold per day are examples of count time series that can be modelled using INAR models. Such series often exhibit seasonality. Various models have been proposed for non-seasonal count time series data, but very few for seasonal count time series data. We propose an INAR(1) process with seasonality for dealing with count time series with deflation or inflation of zeros. The proposed model is also capable of capturing underdispersion and overdispersion, which are sometimes caused by deflation or inflation of zeros. We consider forecasting for the seasonal zero-modified geometric integer-valued autoregressive process of order 1, or ZMGINAR(1)s, with geometric marginal distribution. In the context of overdispersed or underdispersed count time series data, we consider the seasonal ZMGINAR(1) and study the k-step-ahead forecasting distribution corresponding to this process in detail using the probability generating function. When an integer-valued time series is overdispersed, a Poisson time series model may not be a good choice. McKenzie (1986) proposed the INAR(1) process with geometric and negative binomial marginal distributions. When the count time series data have some large observations in the tail, the geometric INAR(1) process and the negative binomial INAR(1) process may be suitable alternatives. Coherent forecasting, which is an integral part of count time series analysis, has received very little attention in the context of integer-valued time series analysis. Here, coherent forecasting means that the forecast values must be integers.
So far, very little work on coherent forecasting has been done in the count time series context. Freeland and McCabe (2004) were possibly the first authors to use the concept of k-step-ahead coherent forecasting of X_{n+k} given the available data X_n, X_{n−1}, ..., X_1 of the time series process, by using the median and mode of the k-step-ahead forecasting distribution. Although the mean of a discrete distribution may not be an integer, the median and mode always are. Moreover, the median has optimality properties: it minimizes the expected absolute error. The mode, on the other hand, is the value at which the k-step-ahead forecasting distribution attains its maximum. We consider the seasonal zero-modified geometric integer-valued autoregressive process of order 1, ZMGINAR(1)s, and study its coherent forecasting through an extensive simulation study.
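
Coherent forecasting can be illustrated on the simplest member of the family, the non-seasonal Poisson INAR(1) process with binomial thinning, rather than the seasonal ZMGINAR(1)s model proposed in the abstract. For that simpler process the k-step-ahead distribution given X_n is a Binomial(X_n, alpha^k) variable plus an independent Poisson variable with mean lam (1 − alpha^k) / (1 − alpha). The parameter values below are hypothetical and the sketch assumes 0 <= alpha < 1 and lam > 0.

```python
import math


def binom_pmf(i, n, p):
    """Binomial(n, p) probability of i successes."""
    return math.comb(n, i) * p ** i * (1 - p) ** (n - i)


def poisson_pmf(j, lam):
    """Poisson(lam) probability of j events, computed on the log scale."""
    return math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))


def forecast_pmf(x_n, alpha, lam, k, max_count=200):
    """k-step-ahead pmf of a Poisson INAR(1) process given X_n = x_n,
    truncated at max_count (the omitted tail mass is assumed negligible)."""
    a_k = alpha ** k
    lam_k = lam * (1 - a_k) / (1 - alpha)
    # Convolution of the thinned survivors with the accumulated innovations.
    return [sum(binom_pmf(i, x_n, a_k) * poisson_pmf(j - i, lam_k)
                for i in range(min(j, x_n) + 1))
            for j in range(max_count + 1)]


def coherent_forecasts(pmf):
    """Integer-valued point forecasts: (median, mode) of the forecast pmf."""
    cum, median = 0.0, len(pmf) - 1
    for j, p in enumerate(pmf):
        cum += p
        if cum >= 0.5:
            median = j
            break
    mode = max(range(len(pmf)), key=pmf.__getitem__)
    return median, mode


pmf = forecast_pmf(x_n=5, alpha=0.5, lam=1.0, k=2)
median, mode = coherent_forecasts(pmf)
```

With these values the forecast mean is 5 × 0.25 + 1.5 = 2.75, which is not a possible count, whereas the median and mode returned by `coherent_forecasts` are integers by construction.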

Selection Criteria and Targeting the Poor for Poverty Reduction: The Case of Social Safety Nets in Sri Lanka

Format: CPS Abstract

Author: Dr Diana Dilshanie Deepawansa

Co-Authors:

  • Diana Dilshanie Deepawansa

Reducing poverty and improving the living standards of the poor and vulnerable populations in Sri Lanka have been among the critical agendas of successive governments. The incumbent government has likewise designed and accelerated poverty-targeting programs to reduce poverty. The relevant government agencies play a major role in identifying low-income families, supporting them in multiple ways and assisting them in achieving sustainable development by providing cash transfers, microfinance and various community-based and livelihood development activities. The primary safety net program currently targeting the poor in Sri Lanka is the “Samurdhi” programme. Although consecutive governments have spent vast amounts of money over several decades on social safety net programs, many impoverished people have been excluded and remain poor. Leakage is high because of mistargeting, limited transparency and accountability, political influence on the implementation of programs, and weak beneficiary selection methods. Hence, it is important to redesign the selection criteria for social safety net programs to target the poor effectively. This article explores measures to identify target and potential beneficiaries by assessing deprivations at the household level in multiple dimensions, through a “Multidimensional Deprivation Score Test (MDST)”. The test captures the experiences of the poor in several dimensions at the same time and computes a weighted deprivation score, with the weight of each deprivation derived by a data-driven approach, in order to capture the poorest and most vulnerable people more accurately. The effectiveness of the criteria is supported by empirical evidence from the 2019 Household Income and Expenditure Survey conducted by the Department of Census and Statistics, the national statistical office of Sri Lanka.
The output indicates that shifting to the MDS test to select beneficiaries could improve the targeting and significantly increase the impact of social protection programs on poverty. Keywords: Poverty, Social safety net, Selection Criteria
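
As an illustration of the weighted deprivation score described above, the following minimal sketch computes a score per household and selects those at or above a cutoff. The indicator names, the weights and the cutoff are hypothetical placeholders; the abstract does not state the actual indicators, the data-driven weights or the selection threshold.

```python
def deprivation_score(deprived, weights):
    """Weighted share of deprivations for one household.

    deprived: dict mapping indicator name -> True if the household is deprived.
    weights:  dict mapping indicator name -> non-negative weight (hypothetical values).
    Returns a score in [0, 1]; higher means more deprived.
    """
    total_weight = sum(weights.values())
    deprived_weight = sum(
        w for name, w in weights.items() if deprived.get(name, False)
    )
    return deprived_weight / total_weight


def select_beneficiaries(households, weights, cutoff):
    """Return (household_id, score) pairs with score >= cutoff, most deprived first.

    The cutoff is an assumed policy parameter, not a value from the abstract.
    """
    scored = [
        (hid, deprivation_score(dep, weights)) for hid, dep in households.items()
    ]
    eligible = [(hid, score) for hid, score in scored if score >= cutoff]
    return sorted(eligible, key=lambda pair: pair[1], reverse=True)
```

In practice the weights would come from the data-driven procedure the authors mention, and the cutoff would be calibrated against survey data.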

Self Service Reporting

Format: CPS Abstract

Author: Mrs Hamda Aldhaheri

Co-Authors:

  • Hamda Aldhaheri

To improve the timeliness of publication releases, automated report generation was proposed and implemented. The publication templates do not change; only the content is updated to match the selected report data. Statistics Center Abu Dhabi has developed a one-click report generation process with a user-friendly interface where the user can select the publication frequency and data to generate reports instantly.

Separable expansions for covariance estimation via the partial inner product

Format: CPS Abstract

Author: Mr Tomas Masak

Co-Authors:

  • Soham Sarkar
  • Victor Panaretos

The non-parametric estimation of covariance lies at the heart of functional data analysis, whether for curve or surface-valued data. The case of a two-dimensional domain poses both statistical and computational challenges, which are typically alleviated by assuming separability. However, separability is often questionable, sometimes even demonstrably inadequate. We propose a framework for the analysis of covariance operators of random surfaces that generalises separability, while retaining its major advantages. Our approach is based on the expansion of the covariance into a series of separable terms. The expansion is valid for any covariance over a two-dimensional domain. Leveraging the key notion of the partial inner product, we generalise the power iteration method to general Hilbert spaces and show how the aforementioned expansion can be efficiently constructed in practice at the level of the surface observations. Truncation of the expansion and retention of the leading terms automatically induces a non-parametric estimator of the covariance, whose parsimony is dictated by the truncation level. The resulting estimator can be calculated, stored and manipulated with little computational overhead relative to separability. Consistency and rates of convergence are derived under mild regularity assumptions, illustrating the trade-off between bias and variance regulated by the truncation level. The merits and practical performance of the proposed methodology are demonstrated in a comprehensive simulation study and classification of EEG signals.

Sexual abuse and unwanted pregnancies amongst women and girls in Malawi during the Covid19 pandemic

Format: CPS Abstract

Author: Dr Lana Chikhungu

The prevalence of violence against women and girls in humanitarian crises and among displaced women is well established, but the Covid19 pandemic movement restrictions have shed more light on how vulnerable women and girls are to violence and sexual abuse within the confines of their own homes (WHO, 2021). From the feminist perspective, the inferior position occupied by women in society is a key contributor to the violence inflicted on them (Renzetti, Edleson, & Bergen, 2001; Sharma, 1997). Increased levels of teenage pregnancies were reported during the first wave of the Covid19 pandemic (March to July) compared to the same period in 2019. However, the extent of the impact of the Covid19 pandemic on sexual abuse and unwanted pregnancies among women and girls in Malawi has not been investigated. This study uses data on reported cases of rape and defilement from the Malawi Police and the number of women accessing post abortion care services from DHIS2 to investigate the impact of the Covid19 pandemic on sexual violence and unwanted pregnancies for women and girls in Malawi. DHIS2 is an open-source project that is coordinated by the HISP Centre at the University of Oslo and is used by more than 73 countries worldwide to assess population health. This is the first study to use the Malawi DHIS2 data to analyse trends in post abortion care. The quantitative data are complemented by qualitative data obtained through key informant interviews with professionals from the Malawi Police and district hospitals, following an explanatory mixed methods design (Creswell and Clarke, 2011). Descriptive statistical and graphical analysis is used for the quantitative data, and the Braun and Clarke Thematic Analysis method is employed in the analysis of the data from key informant interviews.
Findings reveal huge regional variations in the percentage change of the reported cases of rape and defilement and the number of women and girls accessing post abortion care services during the Covid19 pandemic compared to the period before. The percentage change in defilement cases ranges from -13% in the Eastern region to 17% in the Northern region, and the percentage change in rape cases was -31% in the Southern region and 6% in the Northern region. The percentage of women accessing post abortion care services declined in most of the regions. The general perspective of key informants was that there was a rise in cases of rape and defilement and in the number of women with unwanted pregnancies during the Covid19 pandemic period. The declines captured by the quantitative data were attributed to underreporting caused by school closures, movement restrictions and staffing challenges; to a reduced number of women terminating their pregnancies despite experiencing unwanted pregnancies; and to a reduced number of women and girls accessing post abortion care services. Increases in reported cases were attributed to consistent awareness of and availability of services for reporting and treatment. This study confirms that sexual abuse towards women and girls increases during periods of restricted movement, but levels are likely to be underreported.

Shrinkage Estimation of Spectral Matrix using various loss functions

Format: CPS Abstract

Author: Mr Prashant Dhamale

Co-Authors:

  • Akanksha Kashikar

Simple, flexible modeling for integer value responses with an application to alcohol consumption analysis

Format: CPS Abstract

Author: Mr Claude Nadeau

Various ways to model an integer valued response variable N exist (e.g., zero inflated Poisson, Negative Binomial, etc.). We tackle integer response modeling by simply viewing N as being the integer part of a positive continuous response T. So observing N=8 is akin to observing that the continuous T is between 8 and 9 (interval censoring). We are then naturally drawn to survival analysis models. Though several models exist (e.g., Weibull, log-Normal, Cox proportional hazards), we chose to use a piecewise constant hazard, which is simple, flexible and able to fit adequately. An R package (ModIvIc) was created to assist in model specification and fitting. Using data from the Canadian Community Health Survey (CCHS), this approach was utilised to model weekly alcohol consumption as a function of covariates such as smoking and age.
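
The interval-censoring idea can be sketched as follows: with survival function S(t) = exp(-H(t)), the probability of observing N = n is S(n) - S(n+1), where H is the cumulative hazard of a piecewise constant hazard. This is a minimal sketch without covariates; the function names, the cut points and the rates are illustrative assumptions and do not reflect the ModIvIc package interface.

```python
import math


def cumulative_hazard(t, cuts, rates):
    """Cumulative hazard H(t) for a piecewise constant hazard.

    cuts[j] is the left end of interval j (cuts[0] must be 0) and rates[j]
    is the hazard on [cuts[j], cuts[j + 1]); the last interval is open-ended.
    """
    total = 0.0
    for j, rate in enumerate(rates):
        lo = cuts[j]
        hi = cuts[j + 1] if j + 1 < len(cuts) else math.inf
        if t <= lo:
            break
        total += rate * (min(t, hi) - lo)
    return total


def prob_count(n, cuts, rates):
    """P(N = n) = P(n <= T < n + 1) = S(n) - S(n + 1)."""
    s_lo = math.exp(-cumulative_hazard(n, cuts, rates))
    s_hi = math.exp(-cumulative_hazard(n + 1, cuts, rates))
    return s_lo - s_hi


def log_likelihood(counts, cuts, rates):
    """Log-likelihood of observed integer counts under interval censoring."""
    return sum(math.log(prob_count(n, cuts, rates)) for n in counts)
```

With a single constant rate the counts follow a geometric distribution; adding cut points lets the hazard change shape, which is where the flexibility comes from. Covariates could enter by scaling the rates, but that choice is not specified in the abstract.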

Simultaneous Nonparametric Inference of M-Regression under Complex Temporal Dynamics

Format: CPS Abstract

Author: Miaoshiqi Liu

Co-Authors:

  • Zhou Zhou

The traditional linear regression model has gained tremendous popularity due to its high interpretability and mathematical rationality. However, its validity is limited by unrealistic assumptions. For example, the model assumes weak exogeneity and independence of the random errors, which are not necessarily satisfied in real life. Additionally, the coefficients are modelled as constant, an assumption that can easily be violated, especially when the data demonstrate a time structure. To retain the advantages yet circumvent the drawbacks, we modify the linear regression framework by using time-varying coefficients. Moreover, we allow the covariates and random errors to be nonstationary time series that are free to be dependent. Under the expanded regression framework, the time-varying coefficients can tell how the covariates affect the response variable over time. Our goal is therefore to obtain reliable estimators of the coefficients and conduct hypothesis tests to verify people's beliefs. This session will first introduce how to obtain M-type estimators via local linear estimation under different loss functions, where a data-driven method is given to select the appropriate bandwidth. In this way, a general theory on M-estimators is established, which applies to the commonly used least squares estimator, quantile estimator and Huber-loss estimator. Secondly, using a Bahadur representation for simplification, the limiting properties of the estimators can be obtained via Gaussian approximation techniques. Lastly, we propose a multiplier bootstrap to facilitate the process of obtaining critical values, which may be of independent interest. As a highlight, we further consider the integrated process of the regression coefficients, which not only improves the convergence rate but also simplifies the hypothesis tests in some instances. We propose methodologies for three different types of hypothesis tests.
The first type, the Exact Function Test, refers to cases where the null hypothesis includes only one specific function. This can be utilized for the classical variable selection procedure by testing whether the coefficients are exactly zero. The second type tackles cases where the null hypothesis specifies a parametric family while the alternative remains nonparametric. This type is often related to model diagnostics, such as checking for constancy or linearity, and is thus named the Lack-of-fit Test or Diagnostic Test. Finally, we introduce a unified framework to conduct a general class of qualitative hypothesis tests, where the null hypothesis also becomes nonparametric. Most shape constraints, including nonnegativity, monotonicity, and convexity, fall into the realm of such Qualitative Tests. As an application, our method is applied to studying the warming trend and time-varying structures of the ENSO effect using global climate data from 1882 to 2005.

Simultaneous Panel Data Regression Modelling of Economic Impact on Human Development: The Case of Sumatra Island Indonesia

Format: CPS Abstract

Author: Mr Muhammad Irsyad Ilham

This research aimed to analyze economic development, including its macroeconomic variables, in Sumatra during 2014-2020 using a simultaneous equation model with panel data. The results showed that a two-way relationship existed between economic development and human development in Sumatra. Government spending and net exports had a significant and positive impact on economic growth. Household consumption and mean years of schooling had a significant and positive impact on the human development index in Sumatra. In brief, the two determinants of economic growth in Sumatra were government spending and net exports. Meanwhile, household consumption and mean years of schooling had an indirect effect on economic growth.

Small Area Estimation of Poverty Using Remote Sensing Data (Case: Expenditure Per Capita Estimation of Very Poor Household in West Java, Indonesia)

Format: CPS Abstract

Author: Ms Novia Permatasari

Co-Authors:

  • Bagaskoro Cahyo Laksono
  • Azka Ubaidillah

This session will present the use of remote sensing data as auxiliary variables in a small area model to estimate poverty, specifically the expenditure per capita of very poor households in West Java, Indonesia.

Small Area Estimation of teleworking indicators

Format: CPS Abstract

Author: Dr Mahamat Hamit-Haggar

Co-Authors:

  • Stanley Yu Su

In a designed survey, sample sizes are usually not sufficient to generate reliable direct estimates for small domains. Valid statistical models can provide small area estimates with greater precision: Small Area Estimation (SAE) uses statistical models that link survey data to auxiliary data available for the entire population to produce reliable indirect estimates. In other words, SAE borrows strength through auxiliary information. In this study, we propose to estimate small area population parameters using the approach of Prasad and Rao (1986, 1990) for the Fay-Herriot model; the technique produces parameter estimates at finer geographic detail. We focus on teleworking, or working remotely, as the parameter of interest. Model performance is evaluated, and small area estimates are generated for almost full coverage of domains defined as Self-contained Labour Areas, for goods-producing and services-providing industries.
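
To make "borrowing strength" concrete, the sketch below computes Fay-Herriot style estimates as a weighted average of each area's direct estimate and a regression (synthetic) prediction, with weight gamma_i = sigma2_v / (sigma2_v + D_i). It assumes a single auxiliary variable plus an intercept and treats the model variance sigma2_v as known; in practice sigma2_v is estimated from the data, and the function and argument names here are illustrative only.

```python
def fay_herriot_estimates(y, x, D, sigma2_v):
    """Shrinkage estimates under an area-level (Fay-Herriot type) model.

    y: direct survey estimates per area
    x: one auxiliary variable per area (an intercept is added internally)
    D: known sampling variances of the direct estimates
    sigma2_v: model variance of the area effects (assumed known here)
    Returns (estimates, (intercept, slope), gammas).
    """
    # Weighted least squares with weights 1 / (sigma2_v + D_i).
    w = [1.0 / (sigma2_v + d) for d in D]
    sw = sum(w)
    sx = sum(wi * xi for wi, xi in zip(w, x))
    sy = sum(wi * yi for wi, yi in zip(w, y))
    sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    slope = (sw * sxy - sx * sy) / (sw * sxx - sx * sx)
    intercept = (sy - slope * sx) / sw

    # Shrink each direct estimate towards its synthetic prediction.
    gammas = [sigma2_v / (sigma2_v + d) for d in D]
    estimates = [
        g * yi + (1.0 - g) * (intercept + slope * xi)
        for g, yi, xi in zip(gammas, y, x)
    ]
    return estimates, (intercept, slope), gammas
```

Areas with large sampling variance D_i get a small gamma_i and lean on the regression prediction, while precisely measured areas stay close to their direct estimates.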

Small Area Population Estimate Model

Format: CPS Abstract

Author: Mr Yacoub Nuseibeh

Introduction: The emirate of Abu Dhabi is characterized by a population in which non-Emirati citizens are the majority. In- and out-migration therefore plays a critical role in determining the patterns captured in population indicators. The emirate is changing at a rapid pace, with many urbanized areas appearing in a short time and changing the distribution of the population across the emirate. This makes it challenging to rely on census data from 10 years ago as a baseline for current population estimates.
Traditional method for population counting: The most precise method to obtain population size and characteristics is the census as a baseline, but this method is costly and time-consuming. The absence and irregular receipt of important administrative data, such as identity data, limit the ability of SCAD to use classical methods for population estimation. This leads to difficulty in estimating the population precisely and consistently. Population distribution was another challenge.
Innovative methodology for population estimate: SCAD created a new estimation approach to provide policymakers and planners with up-to-date population indicators. This approach depends on modernizing the statistical production model by utilizing alternative available administrative data and unlocking big data capabilities to support the classical model usually used in NSOs to estimate the total population of Abu Dhabi, and to allocate it properly at the geographic level. The new approach is the “Small Area Estimation (SAE) model”, which aims to calculate total population estimates by district in Abu Dhabi emirate using utility consumption data. This has massive added value for SCAD and ADGEs, as it greatly improves the population estimates at district level and beyond. Using water consumption plus additional inputs from ADGEs and SCAD population data, it will enable better decision making on demographics and the potential to build more accurate forecasting models.
Modelling approach: SCAD developed an algorithm that combines data science and population statistics. The models estimate the population and its characteristics (nationality, gender) using the following steps:
  1. Determine the amount of utilities consumed per household over a month.
  2. Infer the average consumption per person using the conversion rate factors.
  3. Determine a correction factor (people per household) depending on the household's location, to account for features such as level of income, cadastral data, type of household and citizenship, where available or applicable.
  4. Correct the number of people per household using the correction factors.
  5. Sum the population obtained by district.
  6. Validate against population reference data by district.
  7. Apply correction factors for districts to adjust the total population.
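
The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the record fields (`monthly_use`, `type`, `district`), the per-capita consumption rate and the correction factors are hypothetical, and the abstract does not describe how SCAD actually derives them.

```python
from collections import defaultdict


def estimate_population(households, per_capita_use, household_factors,
                        district_factors=None):
    """Estimate population by district from household utility consumption.

    households: list of dicts with hypothetical keys
        "monthly_use" (steps 1-2), "type" (step 3) and "district" (step 5).
    per_capita_use: assumed average monthly consumption per person.
    household_factors: assumed correction factor per household type (steps 3-4).
    district_factors: optional district-level adjustment factors (step 7).
    """
    totals = defaultdict(float)
    for h in households:
        persons = h["monthly_use"] / per_capita_use          # step 2
        persons *= household_factors.get(h["type"], 1.0)     # steps 3-4
        totals[h["district"]] += persons                     # step 5
    if district_factors:
        for district in totals:
            totals[district] *= district_factors.get(district, 1.0)  # step 7
    return dict(totals)


def district_factors_from_reference(estimates, reference):
    """Steps 6-7: ratio of reference population to estimate, where both exist."""
    return {
        d: reference[d] / estimates[d]
        for d in estimates
        if d in reference and estimates[d] > 0
    }
```

A typical use would be to run `estimate_population` once without district factors, compare the result with reference data through `district_factors_from_reference`, and then rerun with the resulting factors.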

Smoothed functional principal/independent components: computational and theoretical considerations

Format: CPS Abstract

Author: Mr Marc Vidal

Co-Authors:

  • Ana M. Aguilera

We provide a brief overview of smoothed functional principal/independent component analysis and focus on some critical aspects of its computation and theoretical underpinnings. Although these reduction techniques are important tools of functional data analysis (FDA), we show that some questions regarding the computational estimation of the smoothed components and related factors remain to be addressed. Furthermore, we are concerned with functional observations defined on a domain that notably exceeds the capability of the sample covariance function and its eigenelements to be well defined. Although commonly found in neuroimaging studies, these “wide data” (small sample size and large dimension) are often neglected in the FDA setting. Here, apart from showing how one can enhance the computation of these reduction techniques, we discuss some strategies that might enable the analysis of wide functional data across their whole domain while avoiding numerical instabilities. In particular, a procedure inspired by Welch’s method is introduced. We further investigate the performance of these methodologies in some neuroscientific applications.

Some Remarks on Characterization of Bivariate Poisson-Binomial Distribution Based on C.R. Rao’s Damage Model and Related Applications in Data Analys

Format: CPS Abstract

Author: Prof. Makarand Ratnaparkhi

The concept of the damage model was introduced and investigated primarily by Rao (1962). In this model, an observation on a discrete random variable, say X, is subjected to a damage process governed by a certain probability law. The resulting variable, referred to as the undamaged or surviving portion of X, is denoted by Y. The probabilistic interplay between the marginal and conditional distributions associated with X and Y has been the subject of theoretical research by many researchers over the past sixty years, resulting in a number of papers. In particular, the celebrated characterization result for the Poisson distribution with the binomial as the survival distribution [Rao and Rubin (1964)] is considered the starting point for this research topic. In this presentation, three new aspects of the damage model are discussed briefly. First, the bivariate Poisson-binomial distribution is defined and some of its properties are studied, including the problem of its characterization under what is known as the Rao-Rubin (or Rao’s) condition. Then, some of the conceptual problems that arise when using the damage model to analyse real data from a microbiology experiment are discussed. Also, the issue of the limiting version of the damage model, and the related characterization of distributions as the sample size increases, is touched upon. Finally, other possible bivariate distributions generated by an assumed damage model are introduced as a topic for future research.

Spatio Temporal Factor Model for Large Scale Data

Format: CPS Abstract

Author: Mr Tomoya Wakayama

Co-Authors:

  • Tomoya Wakayama
  • Shonosuke Sugasawa

With the proliferation of mobile devices, more and more population data are being collected. There is a growing demand for their use in real-world situations such as traffic planning and evacuation guidance during disasters. In this setting, the computational aspect must be addressed, since the multidimensional data are observed in space-time. We therefore bring this problem to Functional Data Analysis (FDA). FDA is a methodology that treats and analyzes longitudinal data as curves, reducing the number of parameters and making it easier to handle high-dimensional data. Specifically, by assuming a Gaussian process, we avoid the huge number of covariance matrix parameters of the multivariate normal distribution. In addition, these data are time dependent and spatially dependent among districts. To capture these characteristics, a Bayesian factor model is introduced. It models the time series of a small number of common factors and expresses the spatial structure through factor loading matrices. Furthermore, the factor loading matrix is made identifiable and sparse to ensure the interpretability of the model. We also propose a way to select factors. We study the accuracy and interpretability of the proposed method through numerical experiments and data analysis.

Spatio-temporal analysis of the influence of socio-economic factors on the labor market on municipal level

Format: CPS Abstract

Author: Mr Ilya Zalmanov

One of the most important elements of the sustainability of a regional economy is a developed labor market, as well as a competitive level of wages. However, there is significant differentiation in employment levels and incomes between territorial units even in the most developed economies. The importance of this issue is underlined by the UN in the Sustainable Development Goals (goal No. 10 “Reduce inequality within and among countries”). The socio-economic growth of countries or particular regions directly depends on the development of their constituent parts, the municipalities. The study focuses on the municipal level. Based on the use of SAR technologies, the relationships between various socio-economic parameters of municipalities are modeled, taking into account their ambiguous mutual influence distributed in space and time, as well as the influence of "centers of gravity", that is, large settlements (cities and urban agglomerations). The chosen approach made it possible to emphasize the features of the territories and to identify centers of attraction and points of growth, which in turn makes it possible to determine how territories should be properly managed.

Spatiotemporal Evaluation of Socioeconomic Correlates of Overdose Mortality Before & During the COVID-19 Pandemic: Implications for U.S. Health Policy

Format: CPS Abstract

Author: Mr Jay Xu

Co-Authors:

  • Jay Xu
  • Sudipto Banerjee
  • Joseph Friedman

Drug overdose deaths in the U.S. began to sharply surge in 2013, driven by illicitly manufactured synthetic opioids such as fentanyl, in what is regarded to constitute the third wave of the U.S. overdose mortality epidemic. By 2016, fentanyl overtook heroin as the most commonly used drug in overdose deaths in the U.S. The rapid rise in fatal drug overdoses accelerated during the COVID-19 pandemic, with drug overdose deaths increasing by 30% from 2019 to 2020 and topping 100,000 calendar year deaths for the first time in 2021. Here, we perform a spatiotemporal analysis to investigate and characterize the socioeconomic correlates of drug overdose deaths in the U.S. prior to and during the COVID-19 pandemic, analyzing annual county-level drug overdose death counts during the five year period prior to the pandemic (2015-2019) and the first two years of the COVID-19 pandemic (2020 and 2021) from the lower 48 U.S. states and D.C. Using the Bayesian paradigm for statistical inference, we specify a flexible Poisson regression model with a non-parametric global time trend and separable spatiotemporal random effects, using the Integrated Nested Laplace Approximation (INLA) to perform model fitting. We obtain credible intervals for the quantities of interest, discovering that certain socioeconomic factors are more strongly associated with fatal drug overdoses during the pandemic compared to the pre-pandemic period. In public health terms, our findings indicate that the COVID-19 public health crisis steepened the socioeconomic gradient associated with drug overdose deaths, exacerbating the overdose crisis given the large macroeconomic shifts caused by the pandemic. From the vantage point of public policy, strategies to ameliorate the drug overdose crisis should address upstream social and economic factors that can lead to drug use, addiction, and their potentially fatal downstream effects, whose effects may be intensified during public health crises.

Spherical Random Projection

Format: CPS Abstract

Author: Mr Seungwoo Kang

Co-Authors:

  • Hee-Seok Oh

We propose a new method for dimension reduction of high-dimensional spherical data based on the nonlinear projection of sphere-valued data to a randomly-chosen subsphere. The proposed method, spherical random projection, leads to a probabilistic lower-dimensional mapping of spherical data into a subsphere of the original space and is analogous to the well-known concept of random projection on Euclidean space. In this paper, we investigate some properties of spherical random projection, including expectation preservation and distance concentration, from which we derive an analog of the Johnson-Lindenstrauss Lemma for spherical random projection. Clustering model selection is discussed as a statistical application of spherical random projection, and numerical experiments are conducted using both real and simulated data. Promising results from these experiments provide evidence for the usefulness of spherical random projection as a data analysis tool.

Statistical Analysis on Breast Density: A factor for Breast Cancer.

Format: CPS Abstract

Author: Dr Iyabode Oyenuga

Co-Authors:

  • Nureni Olawale Adeboye

Breast cancer is the most common cancer in women worldwide. It has been shown that women with mostly dense breasts have approximately four times the risk of breast cancer of women of the same age and weight with mostly fatty breasts. The breast is made up of glandular and supportive tissue. Glandular tissue is the network that produces and transports milk to the nipple; the supportive tissue is largely fat but also contains fibrocollagenous tissue called glandular stroma. Glandular tissue and glandular stroma appear as a white area on a mammogram, called mammographic density.

Statistical Literacy and Quality: two sides of the same coin?

Format: CPS Abstract

Author: Mr Jose Martins

Co-Authors:

  • Pedro Campos
  • Jose Martins

This paper aims to reflect on the importance of thinking about the promotion of statistical literacy also involving the aspects of quality.  To this end, it is suggested to compare the principles of quality with the various classifications of literacy to try to find the common aspects and thus make a correspondence between these two sides of the same coin.

Statistical approaches to model transfer

Format: CPS Abstract

Author: Prof. Kerrie Mengersen

Co-Authors:

  • Adam Bretherton
  • Brodie Lawson

An acknowledged challenge in statistics is to transfer a model or its outcomes developed in one (source) domain to another (target) domain. The transferability of models, and methods for undertaking such model transfers, have been widely discussed in applied settings such as ecology, genetics and transport, and are a popular topic in both the statistical and computer science literature. In this presentation, we consider this challenge from two statistical perspectives. The first emphasises a focus on model inference via a new Transfer Sequential Monte Carlo (TSMC) method. The second takes an information geometry approach and considers approximate-geodesic paths between two distributions. Both perspectives are illustrated through real-world examples. Work on both methods is joint with teams of co-authors, led by Adam Bretherton and Brodie Lawson, respectively.

Statistical methods of handling ordinal longitudinal responses with incomplete observations

Format: CPS Abstract

Author: Dr Omololu Stephen Aluko

Co-Authors:

  • Omololu Stephen Aluko

The survival rate of human immunodeficiency virus (HIV) positive individuals continues to improve with the use of highly active antiretroviral therapy (HAART), but the prevalence of pulmonary disease has been growing unabated among them. The data were characterized by intermittent missingness due to patients' failure to disclose vital health information and absence on visit days. Handling missing data was a difficult challenge in the dataset. We analyzed the data under the missing-at-random assumption. We compared the effects of marginal and conditional models in the study. Amongst the methods, the ordinal negative binomial model without any form of imputation performs better in simulation studies and real applications than multiple imputation-based generalized estimating equations (MI-GEE) and the other models used.

Statistical quality of internal migration in Egypt: A comparative study between Census and Labour Force Survey 2017

Format: CPS Abstract

Author: Mrs Rawia Wagih Ragab

Co-Authors:

  • Ali Hepishy Kamel Abdelhamid

The quality of a statistical product can be defined as its fitness for purpose with respect to the following five quality dimensions: relevance, accuracy and reliability, timeliness and punctuality, accessibility and clarity, and coherence and comparability. One of the most important quality standards in statistical data is coherence and comparability: the degree to which data derived from different sources or methods are similar, and the degree to which data can be compared over time and domain (for example, geographic level, which is our focus). It also enables us to compare and link databases, and thereby fulfils the fourth Principle of the Code of Practice for Official Statistics, which highlights the need for “Sound methods and assured quality” and includes a number of practices such as quality assurance, quality reporting and quality improvement as well as the use of common standards and concepts. Internal migration is defined as the geographical movement of individuals within a country’s borders. There are several types of internal migration, most notably from one governorate to another and from one region to another within the country’s borders. It also includes rural migration, meaning the movement of people from rural areas to urban areas and vice versa. This subject bears on many other issues, such as population density at source and destination, human resources, highly skilled persons and many others. This paper will present a comparative study of internal migration in Egypt 2017

Statistical review on the pandemic effect on water consumption

Format: CPS Abstract

Author: Mr Saeed Fayyaz

Co-Authors:

  • Fatemeh Gheitasi

This paper includes official statistics on water in the pandemic time. The paper includes an innovation of using different data sources to make an estimate and can be useful for decision makers in crisis management.

Statistics and GIS Journey in PCBS: From data collection to data dissemination

Format: CPS Abstract

Author: Mrs Raghad abulail

Co-Authors:

  • Raghad abulail

In this paper, a lengthy presentation will be made about the effectiveness of using GIS applications in serving statistical data in Palestinian Central Bureau of Statistics projects, and the outputs that were produced as data collection techniques, geo-databases, statistical atlases, interactive maps platforms, and open data sites to provide high-quality data, in an understandable manner to decision-makers.

Status of implementation of administrative population registers among selected African countries

Format: CPS Abstract

Author: Gloria Mathenge

Co-Authors:

  • Gloria Mathenge

The United Nations lists the existence of a central population register as one among eight other preconditions for conducting a register-based census. Over the last 5 years, an increasing number of countries have established population registers for administrative and statistical purposes. In Africa, according to existing literature, no country is known to have achieved a fully functional administrative and/or statistical population register, including allowing for derivation of up-to-date information concerning the size and characteristics of the population. However, several countries, such as South Africa, Rwanda, Kenya and Namibia, have progressed in the development of administrative data systems of this nature, which are locally referred to as national population registers. While there is no evidence of any African country that has undertaken a register-based census, some countries, such as South Africa, have undertaken evaluations to establish the possible use of existing administrative registers for statistical purposes.

Steering the Future State of Philippine Tertiary Education in Statistics in Response to the Needs of the Data Science Industry

Format: CPS Abstract

Author: Prof. Nelia Ereno

Co-Authors:

  • John Titus Jungao

The explosion of data brought about by the fourth industrial revolution has opened new opportunities for businesses to improve profitability and efficiency while driving down costs. As in other countries, the data science and analytics (DSA) industry in the Philippines has begun to develop, and various initiatives have started to bolster competitiveness: self-organized events for data practitioners, government-funded capability building, and the creation of data science curricula across several Philippine universities. This paper contributes three different but connected components: (1) demand assessment, which assesses the current demand for statistics skills in the data science industry in the Philippines; (2) supply evaluation, which evaluates the readiness and alignment of current statistics course offerings in various undergraduate courses in the Philippines; and (3) course design, which proposes an introductory course covering topics that address the statistical skills gap between the academe and industry.

Strategies for the Sustainability of Stat Labs: A Case Study of Laboratory of Interdisciplinary Statistical Analysis, LCWU, Lahore.

Format: CPS Abstract

Author: Dr Asifa Kamal

Co-Authors:

  • Asma Zeb
  • Abeera Shakeel
  • Naila Amjad

How To Sustain And Strengthen The LISA-2020 Stat Labs In Future? - Insights From Experiments Conducted By Some Of The Members Dr Pooja Sengupta

Strengthening Modernization and Innovation – a holistic approach

Format: CPS Abstract

Author: Mr Jorge Magalhaes

Co-Authors:

  • Sofia Rodrigues
  • Francisco Lima
  • Almiro Moreira
  • Paulo Saraiva

To face the challenges of the new ecosystem in which national statistical offices now operate, Statistics Portugal has been developing several initiatives to spur innovation and efficiency. The paper explores two examples: the National Data Infrastructure, which strengthened the appropriation and use of administrative data and other sources and created a single point of access to several types of data, and the StatsLab domain. Their enablers and approaches will also be explored.

Strengthening of the Residents' register and the Voters' register in Berlin

Format: CPS Abstract

Author: Prof. Dr. Ulrike Christa Rockmann

The use of registers for administrative purposes has a long tradition in Germany. Nevertheless, there is a great need for improvement with regard to quality management and the integrative and cross-sectional use of registers, in particular with regard to their use for official statistics. This is especially true against the background of e-government, the desire for fast and citizen-friendly administrative procedures, less cost-intensive surveys, less redundant and more consistent data storage, an exclusively register-based census and more exclusively register-based official statistics. It goes without saying that population figures are of utmost importance for many purposes. Currently, there are two sources for population figures in Germany: the official population statistics, which are the responsibility of the federal NSI and regional NSIs, and the figures from the decentralized population registers, which are the responsibility of the Länder and municipalities. The official population figures are used in Germany for the redistribution of budgetary funds and value added tax between the federal states to ensure equal living conditions throughout the country, for the calculation of indicators such as GDP per person, for the determination of constituency boundaries for German elections, etc. The residents' registers (RR) are used for regional administration and for updating the population figures of the official statistics between two censuses. The RR is the core database for many individual-level administrative processes, such as claims review, needs assessment, school enrollment notification, creation of personalized offers, provision of information to specific target groups (for example, Covid vaccination letters), notification of upcoming elections, or required driver's license exchanges. Furthermore, it is possible to identify family ties by linking the data sets accordingly. The 2011 census was the first census in Germany after a 30-year gap. It was the first census after the unification of the two German states and the first so-called register-based and sample-based census in which the residents' registers provided the central input. All in all, it was no surprise to official statisticians that the RRs overestimated the resident population in some large municipalities, especially those with high turnover and commuting. In particular, the so-called city-states of Hamburg and Berlin showed significant differences: in Berlin, the residents' register contained about 100,000 more citizens than the final census result. This large difference posed problems for the Berlin city government. It also shook its credibility vis-à-vis the other federal states, especially with regard to state fiscal equalization. A lawsuit filed by the city-states of Berlin and Hamburg against the census method used in 2011 was lost in the Federal Constitutional Court in 2018. In December 2018, against the background of the decision of the Federal Constitutional Court, the project to further develop and improve the quality of the Berlin RR was launched. The presentation will discuss the findings and results.

Study of the association between level of caregivers’ stimulation and socio-emotional development of under-five children in Bangladesh

Format: CPS Abstract

Author: Shafayet Khan Shafee

Co-Authors:

  • Dr. Tamanna Howlader

Children are the cornerstone of a country’s long-term development and a crucial component of early childhood development is socio-emotional development. According to the Multiple Indicator Cluster Survey (MICS) 2019, about one-third of children in Bangladesh are not on track for achieving adequate socio-emotional development. The purpose of this study is to investigate whether the level of caregivers’ stimulation is associated with child socio-emotional development using data on 9445 children aged 3-4 years from the Bangladesh MICS 2019.

Study of the change in maximum temperature degrees in summer for selected areas in the Nile Delta during the period 2007-2020

Format: CPS Abstract

Author: Madam Safia Mohamed

Climate change is one of the most important scientific concerns of the last few decades and has attracted the interest of researchers, scientists, planners and politicians. Climate change has resulted in a continuous increase in global warming associated with the greenhouse effect. More attention is needed to understand the potential impacts of global warming that negatively affect ecosystems as well as human life. Temperature can be said to be the climatic element that most affects human activity and health. Every temperature rise of one degree Celsius poses a risk to vulnerable ecosystems, and every rise of more than two degrees substantially multiplies the risk and can lead to the collapse of entire ecosystems, famine, shortages of freshwater resources, and the melting of large masses of ice, which in turn leads to sea level rise, threatening coastal cities (Katrine Baumert, 2006).

Study on the Dynamic Interdependent Structure and Risk Spillover Effect between Sino-US Stock Markets

Format: CPS Abstract

Author: PROF. DR. Menggen Chen

Co-Authors:

  • Menggen Chen
  • Yuanren Zhou

This paper explores the dynamic interdependence structure and risk spillover effect between the Chinese and US stock markets, using multivariate R-vine copula-complex network analysis and an R-vine copula-CoVaR model, with a sample of the CSI 300, the S&P 500, and sub-sector indices from January 3, 2006 to July 3, 2019. The empirical results show that the Energy, Materials, and Financials sectors play leading roles in the interdependent structure of the Chinese and US stock markets, while the Utilities and Real Estate sectors occupy the least important positions. The comprehensive influence of the Chinese stock market is close to that of the US stock market, but the differences in influence among sectors of the US stock market within the overall interdependent structure are smaller. Over time, the interdependent structure of both stock markets changed, the status of the sectors gradually became more equal, and the contributions of the same sector in the two countries to the interdependent structure converged. The degree of interaction between the two stock markets is positively correlated with the degree of market volatility. Further analysis shows that the lower tail interdependence coefficient between the Sino-US stock markets is larger than the upper tail interdependence coefficient, and both display volatility clustering. The spillover risk from the US stock market to the Chinese stock market is higher than that from the Chinese stock market to the US stock market, and the US stock market plays a more important role as the sender of extreme risk in the interdependent structure.

Subjective well-being and poverty in the context of social inequalities in Morocco

Format: CPS Abstract

Author: Mr Khalid Soudi

Co-Authors:

  • Khalid Soudi
  • Youssef Benmimoun

The High Commission for Planning of Morocco has conducted several socio-economic surveys to understand the lived experiences of households based on their perceptions, their social representations and their concerns. In order to contextualize these declarations according to the social realities of households and their social environment, some studies have been carried out to link the actual standard of living to the desired one and to measure two forms of subjective poverty, absolute and relative, with reference to appropriate thresholds. Using Moroccan data, we aim to show that absolute measures of welfare disparities are significantly correlated with perceptions of deprivation and subjective poverty. Our hypothesis is that the measurement of subjective poverty, in its absolute and relative forms, can be linked to appropriate thresholds. This form of poverty is strongly correlated with households' perceptions of their socioeconomic status.

Suicide in Germany

Format: CPS Abstract

Author: Dr Holger Leerhoff

Every year, around 10,000 persons in Germany commit suicide. Comprehensive and reliable statistical data on this phenomenon are an important building block for effective suicide prevention. Based on the official cause-of-death statistics, this article uses data from the relevant 273,000 death cases from the years 1992 to 2016 to investigate four topics of varying complexity.

Survival causal rule ensemble method considering the prognostic factors for estimating heterogeneous treatment effect

Format: CPS Abstract

Author: Mr Ke Wan

Co-Authors:

  • He Guanwenqing
  • Toshio Shimokawa

We propose a novel method based on RuleFit (Friedman and Popescu, 2008) to estimate the heterogeneous treatment effect (HTE) for survival outcomes. In the proposed method, we define the HTE as the log hazard difference between the treatment and control groups and construct the model, accounting for the prognostic effect, within the Cox proportional hazards framework.
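
The definition above can be written out as a hedged sketch. The abstract does not state the model explicitly, so the symbols below ($\lambda_0$ for the baseline hazard, $x$ for covariates, $z$ for the treatment indicator, $g$ for the prognostic effect and $\tau$ for the HTE) are notation assumed here for illustration, not the authors' own:

```latex
\begin{align}
  \lambda(t \mid x, z) &= \lambda_0(t)\,\exp\{\, g(x) + z\,\tau(x) \,\}, \qquad z \in \{0, 1\}, \\
  \tau(x) &= \log \lambda(t \mid x, z = 1) - \log \lambda(t \mid x, z = 0).
\end{align}
```

Under this assumed form, the baseline hazard and the prognostic term $g(x)$ cancel in the difference, so $\tau(x)$ does not depend on $t$. In a RuleFit-style model, $g$ and $\tau$ would plausibly be built from rule-based and linear terms, but that detail is not given in the abstract.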

Sustainable Development Goal 14 "Life Below Water": Scoping Report on the SDG Indicators

Format: CPS Abstract

Author: Miss Roxanne Elumbre

Co-Authors:

  • Sabrina Romasoc
  • Jazzen Asombrado
  • Shushimita Pelayo
  • Bea Gavira
  • Roxanne Elumbre

As nations around the world brace for the worsening impacts of climate change, protection of the environment and all its natural resources has never been more critical. This compelling need to preserve and protect the environment is laid out under the Sustainable Development Goals (SDGs). Established in 2015, the SDGs represent the global roadmap to achieve development for all, while protecting the planet and its resources. Focusing on the protection of marine ecosystems, Sustainable Development Goal 14 highlights the need to “Conserve and sustainably use the oceans, seas, and marine resources for sustainable development.” SDG 14 attempts to address relevant issues affecting the world’s oceans, such as the reduction of marine pollution, minimizing the impact of ocean acidification, and prevention of overfishing and other illegal fishing activities. Specifically, Goal 14 is composed of 10 development targets, with each target measured by a specific SDG indicator. To aid in the measurement of progress towards the achievement of Goal 14 and to increase the knowledge base for the SDGs in the Philippines, this study examined Tier II and III indicators under SDG 14. Employing an extensive review of literature and consultations with different stakeholders, this scoping study analyzed existing global and local methodologies for data collection, compilation, and computation of SDG 14 indicators. Such methodologies were further studied for possible localization within the Philippines. Additionally, data gaps were also identified in order to understand the possible challenges of localizing the indicators.

Sustainable Development Goal 4: Quality Education Scoping Report

Format: CPS Abstract

Author: Mr Jazzen Paul Asombrado

This paper focuses on gathering information on Tier II and Tier III indicators under SDG 4 to give a picture of the current progress in achieving this goal in the Philippines. This scoping study investigates the country's available data sources and methodologies and identifies possible data gaps through literature review and various consultations with the identified stakeholders concerning SDG 4. Lastly, this study endorses priority indicators that are feasible for more extensive methodological research.

Synthesis of Small Area Poverty Models: A MICE Approach

Format: CPS Abstract

Author: Dr Lara Paul Ebal

Co-Authors:

  • Zita VJ. Albacea

This paper applied a methodology to synthesize the regression coefficient estimates of regional small area poverty models across time periods using independently collected data sets. The synthesis was framed as a missing data problem and adopted the multiple imputation by chained equations (MICE) approach, which is flexible enough to handle data sets involving categorical variables. The MICE approach was applied using the 2009 and 2012 poverty models of Region I developed by the Philippine Statistics Authority Small Area Estimation Team.

THE EFFECT OF ACTIVITIES OF ILLEGAL MINING ON COCOA PRODUCTION AND ITS IMPACT ON ECONOMIC DEVELOPMENT

Format: CPS Abstract

Author: Prof. Bashiru I.I. Saeed

Co-Authors:

  • Amidu Abdul Hamid
  • Ebenezer Tawiah Arhin
  • Caleb Nambyn

Although mining in tropical nations makes a considerable contribution to the world's mineral supply, uncontrolled mining operations in protected forests are linked to devastation, habitat loss, and biodiversity loss. In recent months, several stories in both print and electronic media have indicated that "galamsey," or illicit mining, has taken over arable land intended for farming in the cocoa-growing regions. The major goal of this study is to examine how illegal mining affects cocoa output, why cocoa farmers choose to give their land to illegal miners, what the economic repercussions are, and related issues such as environmental concerns.

THE EMBARKED CODING SYSTEM USED IN THE MOROCCAN LABOUR FORCE SURVEY AND ITS ADVANTAGES

Format: CPS Abstract

Author: Mr Mahjoub Aaibid

This article describes the embedded coding system (ECS) used in the Moroccan Labour Force Survey (LFS) to code, in the field, questions requiring the use of classifications, namely the respondent's profession, the branch of activity of the employing enterprise, the highest diploma obtained and the activities carried out by the respondent during the reference week. We try to answer the following questions: Why such a system? How does it work? How is it different from similar systems? And how has it improved the quality of the data collected in this survey? Since the introduction of the CAPI system, and more particularly the ECS, these questions have been coded in the field during the interviews. The coding system is based on three main components: electronic classifications, search modes, and validity and consistency checks. M. Aaibid (1), A. Saoud (2). (1), (2) High Commission for Planning, Rabat, Morocco. m.aaibid@hcp.ma, a.saoud@hcp.ma

THE ROLE OF PEOPLE WITH DISABILITIES IN THE LABOR MARKET

Format: CPS Abstract

Author: Mr Ahmad Risal

Co-Authors:

  • Ahmad Risal
  • Nurul Solikha Nofiani

People with disabilities (PWDs) have the same rights as other people. Nevertheless, they still experience discrimination because they are considered unable to compete. Various stigmas about PWDs make it difficult for them to get jobs. As a result, fewer jobs are available to them, so people with disabilities prefer to open their own businesses or become entrepreneurs. This study aims to analyze the entrepreneurial conditions of PWDs and their role in the labor market in Central Celebes Province, Indonesia.

THE SUSTAINABILITY TRANSITION DEBATE: IMPLICATIONS FOR DATA ON POVERTY, JOB CREATION AND INEQUALITY IN THE GLOBAL SOUTH

Format: CPS Abstract

Author: Dr Sipho Felix Mamba

The call for a sustainability transition has been received with mixed feelings in the Global South. The discourse has not been without political connotations, sparking major debates between countries of the North and South due to the different lenses through which this call is viewed. The notion of ‘environmental protection’ that characterizes the North does not resonate well with the South, which views it as an obstacle to economic progress and a deliberate act of capitalism, and which also questions the sustainability transition framework, which seems to have a Northern influence. The paper argues that although the sustainability transition (and the adopted transition framework) is global, the transition calls for national solutions, and that different countries have different social, economic, political, and cultural realities, which need to be considered. It further argues that equal attention needs to be paid to the growing demand for data to measure, monitor and manage poverty, job creation (or the lack thereof) and inequality. The paper will further argue that in the midst of (and probably after) the COVID-19 pandemic, credible, relevant and timely data will be required on key social and economic indicators for countries in the Global South that are engulfed in economic crises and are probably more concerned about economic recovery than the protection of the environment. This will assist in strengthening the sustainability transition debate, in which the ‘just transition’ may be of less interest in the Global North while growing in priority for the Global South. This paper therefore advocates a Global South perspective on the just transition, and on the emerging data needs, which considers the contextual realities and precariousness of the Global South economies, expressed through high poverty rates, high rates of unemployment and persistent inequalities. It will focus on the landlocked kingdom of eSwatini in southern Africa, which has embedded social issues such as poverty reduction, employment opportunity creation and inequality reduction in its just transition commitments, specifically in the country’s recent National Development Plan, and on the implications of this for the country’s economic progress.

Taking the Pulse of the Consumer Credit Market: Short-term Indicators

Format: CPS Abstract

Author: Ms Saloni Salaria

Co-Authors:

  • Dr. John Dunne

Access to and availability of new data sources create significant statistical opportunities in the financial market. This paper presents an approach under development at the Central Statistics Office Ireland to produce short-term indicators that explore the dynamics within the consumer credit market. The short-term indicators are primarily based on quantifying active contracts, customers, and borrowers in the consumer credit market, and the approach monitors changes in these indicators with respect to different population cohorts and types of credit. Cohorts can be defined by age, gender, employment, location of residence, household structure, income, and any other available attributes. We discuss how these indicators respond to real-world events and the potential value they hold for policy makers. We conclude by presenting selected results from pseudonymized data from January 2021 to illustrate the practical use of the indicators.

Targeted timing of mail invitation: impact on web surveys response rate and response speed

Format: CPS Abstract

Author: PROF. DR. Annamaria Bianchi

Co-Authors:

  • Peter Lynn
  • Alessandra Gaia

Survey researchers using online data collection methods continue to invest in efforts to identify ways of improving response rates. Meanwhile, the tools used to improve response rates and response speed have become more sophisticated, particularly various types of adaptive designs. In this context, panel surveys provide a particularly rich environment for the application of targeted designs, as the wealth of prior information available can be used to identify (likely) subgroup variation in the outcomes of interest and to inform the choice of design features that might provide improved outcomes.

Teaching Statistics for Social Good: conceptions, resources and assessment

Format: CPS Abstract

Author: Prof. Jim Ridgway

The paper presents a conceptual framework to describe the knowledge needed to comprehend and critically evaluate societal issues. It offers an overview of extensive teaching resources based on dynamic visualisation tools and open-source software packages, and offers tools for exploring the alignment of curriculum intentions, classroom teaching practices, and assessment methods. See Ridgway, J. (ed.) (2023). Statistics for Empowerment and Social Engagement: teaching civic statistics to develop informed citizens. Springer.

Tell me who your friends are, and I will tell you who you are: The role of the second most frequent group in cluster labeling

Format: CPS Abstract

Author: Dr Anna Khalemsky

Co-Authors:

  • Yelena Stukalin

The main goal of the present study is to highlight the group of observations with the second most frequent value for a categorical variable in the same cluster in order to learn about the similarities and dissimilarities of different population groups. We aim to identify situations where focusing on the second most frequent group in the cluster improves the interpretation, labeling, and, finally, the classification results.
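
As a minimal sketch of the idea, and not the authors' procedure, the following Python snippet reports the most frequent and second most frequent value of a categorical variable within each cluster. The function name `top_two_per_cluster` and the toy data are hypothetical and used only for illustration.

```python
from collections import Counter, defaultdict


def top_two_per_cluster(cluster_ids, categories):
    """Return {cluster: [(category, share), ...]} for the two most frequent categories."""
    # Group the categorical values by the cluster each observation belongs to.
    groups = defaultdict(list)
    for cid, cat in zip(cluster_ids, categories):
        groups[cid].append(cat)

    summary = {}
    for cid, cats in groups.items():
        counts = Counter(cats)
        # Sort by descending count, then by name so that ties are deterministic.
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], str(kv[0])))
        # Keep at most two entries and convert counts to within-cluster shares.
        summary[cid] = [(cat, n / len(cats)) for cat, n in ranked[:2]]
    return summary


# Toy example (made-up data): two clusters and an employment-status variable.
clusters = [0, 0, 0, 0, 0, 1, 1, 1, 1]
employment = ["employed", "employed", "employed", "student", "student",
              "retired", "retired", "retired", "employed"]

result = top_two_per_cluster(clusters, employment)
for cid in sorted(result):
    print(cid, result[cid])
```

In this toy data, labeling cluster 0 only as "employed" would hide that two of its five members are students; reporting the second entry makes that visible.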

Testing of Hypotheses in Triple-Negative Breast Cancer: Rank order Approach with Minimum Assumptions

Format: CPS Abstract

Author: PROF. DR. Sunil Mathur

Co-Authors:

  • Kai Sun
  • Ethan Burn
  • Ravi Pingali
  • Eric Bernicker
  • Siddarth Ganguly
  • Shreya Mathur
  • Jenny Chang

In breast cancer research, pathologic complete response (pCR) following neoadjuvant therapy has been associated with improved event-free survival (EFS) and overall survival (OS). This association is generally thought to depend on the breast cancer subtype; however, it has not yet been explored much. Our group studied the association between pCR and survival outcomes in triple-negative breast cancer (TNBC). We proposed a new rank-based approach capable of testing the difference in median survival time between the control and treatment groups. Under the null hypothesis and equal sample sizes, the proposed test statistic is distributed as a linear combination of independent chi-square random variables. Using the Monte Carlo method, we computed empirical power, which shows that our test performs better than its competitors under heavy-tailed, light-tailed, and even elliptically asymmetric population distributions. Overall, our findings suggest that adjuvant therapy is associated with improved EFS/OS in patients with TNBC who received neoadjuvant therapy, regardless of pCR status.

The Change in the Patterns of the Egyptian Family Expenditure in Clothing and textiles (2005-2013)

Format: CPS Abstract

Author: Mrs SHAIMAA FAROUK HASSAN HANAFY

Co-Authors:

  • MOSTAFA MAHMOUD ABDELNABY ESMAIL

The income, expenditure and consumption survey is considered one of the most important household surveys conducted by statistical agencies around the world. It provides a large amount of data with which to measure the standard of living of families and individuals, to establish the basis for measuring poverty, and to derive the weights required for compiling the Consumer Price Index, an important indicator of inflation. This paper studies expenditure behaviour for clothing and textiles in Egypt, in rural areas, urban areas and in total, using the household income, expenditure and consumption surveys of 2004/2005, 2008/2009, 2010/2011 and 2012/2013. Pearson curves were fitted to compare their patterns.

The Effect of Covid 19 to Import Contents of Indonesian Export

Format: CPS Abstract

Author: Mrs Purwaningsih Purwaningsih

Co-Authors:

  • Realita Eschachasthi
  • Titi Kanti Lestari

Impact of Covid-19 on international trade.

The Effect of Institutional Changes on Statisticians’ Excellence at a Government Organization in Abu Dhabi

Format: CPS Abstract

Author: Ms Wadeema Mohamed Alkhoori

Co-Authors:

  • Saif Al Ketbi
  • Maha Almubarak
  • Maryam Al Jneibi

Statistics Center Abu Dhabi (SCAD) has undergone structural changes leading to the empowerment of statisticians and increased efficiency throughout the organisation. Prior to this change, statisticians were responsible for all stages of the Generic Statistical Business Process Model (GSBPM). Since the introduction of these changes, statisticians can focus more on processing and analysis. We conducted a survey of statisticians and looked at innovation, teamwork and collaboration, communication, specialisation and adaptability to change as measures of excellence resulting from these changes.