64th ISI World Statistics Congress - Ottawa, Canada

Flexible infrastructure for next generation of statisticians - Eurostat presentation

Author

Mr Mátyás Mészáros

Conference

64th ISI World Statistics Congress - Ottawa, Canada

Format: IPS Abstract

Abstract

Official statistics needs to innovate continuously to stay relevant and gain agility, and new technologies are key enablers for this. New architectures allow rapid change and continuous deployment at scale, and using open-source solutions maximises reuse. Beyond the constantly changing technology landscape, there is a worldwide need for a flexible environment where statisticians and data scientists can use the latest technologies to create new, innovative products. Several initiatives have been launched around the globe to meet this need. One of them is Eurostat's cloud-agnostic data lab project.

The presentation describes the background of the project, including the Interoperable Europe initiative of the European Commission, which aims to improve public-sector interoperability through open-source solutions. At the start of the project, existing solutions such as BinderHub, Renku and Onyxia were reviewed. Several workshops were then held to identify the needs of data scientists and the components missing from the existing solutions. Based on this gap analysis, it was decided not to build the proof of concept from scratch but to base it on Onyxia, developed by the French National Statistical Institute (INSEE). Onyxia was extended with Keycloak so that it could be used with EU Login, the central identity provider for services offered by the EU institutions. Based on user needs, three services were added to those originally provided by Onyxia: Apache Superset for data visualisation, GitLab for internal code storage and versioning, and CKAN as a data catalogue.

This development showed the benefit of open source, as the project could build on existing solutions, and the cooperation with the developers of the different components highlighted the importance of interoperability. Finally, the project results are used in the new cloud-agnostic service offering of the EC Data Platform, which served as the flexible cloud infrastructure for the 2023 EU Big Data Hackathon.