INFOBANCO: a healthcare research platform based on openEHR

Veratech wishes to share the results of the INFOBANCO project, which was developed between April 2022 and June 2023 for the Madrid Health Service. Veratech for Health has had the privilege of participating in this project from its initial conception in the pre-market consultation to its final implementation, promoting the adoption of openEHR and the archetype modeling methodology.

INFOBANCO is the result of a Public Procurement of Innovation project with the aim of constructing a regional data platform for healthcare research. This platform seeks to offer information exploitation services to clinicians, managers, and researchers, enabling the combination of data from multiple sources. It is equipped with governance, collection, transformation, querying, visualization, and data analysis tools to gain knowledge and support decision-making.

The architecture of the INFOBANCO platform can be seen in the following figure:

The innovative idea behind this architecture is to place a clinical data repository (CDR) conforming to the openEHR standard at the core of a research platform and to use it as the source for data transformations (ETL processes) to other standards commonly used in clinical research (OMOP CDM, HL7 FHIR, CDISC ODM, i2b2). The hypothesis of this work was that the openEHR reference model and archetypes provide the most comprehensive set of information (both healthcare and contextual information) to feed the information models used by those other standards.
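As an illustration of this idea, the following minimal Python sketch shows one openEHR-style result row feeding two different target shapes. It is not the project's actual code. The AQL text, its archetype node identifiers, the sample row, and the lookup tables `CONCEPT_MAP` and `PERSON_IDS` are all assumptions made for the example; only the target field names come from the OMOP CDM `condition_occurrence` table and the HL7 FHIR `Condition` resource.

```python
from datetime import datetime

# Illustrative AQL selecting diagnoses; the node ids (at0002, at0077) are
# assumed and would need checking against the real templates.
AQL_PROBLEMS = """
SELECT e/ehr_id/value AS ehr_id,
       d/data[at0001]/items[at0002]/value/value AS problem_name,
       d/data[at0001]/items[at0002]/value/defining_code/code_string AS problem_code,
       d/data[at0001]/items[at0077]/value/value AS onset
FROM EHR e
CONTAINS COMPOSITION c
CONTAINS EVALUATION d[openEHR-EHR-EVALUATION.problem_diagnosis.v1]
"""

# Hypothetical result row, standing in for what a CDR query would return.
SAMPLE_ROW = {
    "ehr_id": "ehr-001",
    "problem_name": "Essential hypertension",
    "problem_code": "59621000",
    "onset": "2021-03-14T00:00:00+01:00",
}

# Hypothetical lookup tables (placeholder values, not a real vocabulary mapping).
CONCEPT_MAP = {"59621000": 320128}
PERSON_IDS = {"ehr-001": 1}


def to_omop_condition(row):
    """Map one result row to an OMOP-style condition_occurrence record."""
    onset = datetime.fromisoformat(row["onset"])
    return {
        "person_id": PERSON_IDS[row["ehr_id"]],
        # 0 is the OMOP convention for an unmapped concept.
        "condition_concept_id": CONCEPT_MAP.get(row["problem_code"], 0),
        "condition_start_date": onset.date().isoformat(),
        "condition_source_value": row["problem_code"],
    }


def to_fhir_condition(row):
    """Map the same result row to a FHIR-style Condition resource."""
    return {
        "resourceType": "Condition",
        "subject": {"reference": f"Patient/{row['ehr_id']}"},
        "code": {
            "coding": [
                {
                    "system": "http://snomed.info/sct",
                    "code": row["problem_code"],
                    "display": row["problem_name"],
                }
            ]
        },
        "onsetDateTime": row["onset"],
    }


omop_record = to_omop_condition(SAMPLE_ROW)
fhir_resource = to_fhir_condition(SAMPLE_ROW)
```

Both mappings read from the same source row, so adding a further output format in this sketch would only require another mapping function, not another extraction from the source systems.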

Components of the INFOBANCO platform:

  • Inputs. Two information systems have been integrated: the Electronic Health Record (EHR) of the Hospital 12 de Octubre and the EHR of the Primary Care Area.
  • Data Lake. A primary repository for integrating raw data, providing a single entry point for processing them. This data lake offers data in multiple layers: raw (data as they are at their source), clean (basic normalization, such as date or number formats), and consumption (classification/organization of data according to their domain).
  • openEHR CDR. The data from the data lake has been normalized following openEHR archetypes and templates. Initially, only the data covered by existing archetypes and required by output formats have been included in the openEHR CDR. This CDR is built using the Better platform.
  • Standard Outputs. ETL processes have been implemented to convert openEHR data to other standard formats. For each output, the relevant data were first selected using AQL, and the data transformations were then implemented with the most suitable technology in each case: Python, Java, or Pentaho.
  • Non-standard Outputs. Some use cases required information that has not yet been included in the openEHR CDR (mainly data not covered by existing archetypes or internal management data from input systems). In those cases (for example, to build a BI dashboard), data can still be accessed directly from the raw data lake.
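The layering of the data lake described in the list above (raw, clean, consumption) can be illustrated with a short Python sketch. This is only an assumed shape for such a pipeline: the field names, the source date format, the decimal-comma convention, and the `DOMAIN_BY_TABLE` mapping are hypothetical and do not come from the project.

```python
from datetime import datetime

# Hypothetical mapping from source table to data domain (consumption layer).
DOMAIN_BY_TABLE = {"lab_results": "laboratory", "vaccines": "vaccinations"}


def to_clean(raw):
    """Raw layer to clean layer: normalize identifiers, dates and numbers only."""
    return {
        "patient_id": raw["patient_id"].strip(),
        # Assumes the source writes dates as DD/MM/YYYY.
        "date": datetime.strptime(raw["date"], "%d/%m/%Y").date().isoformat(),
        # Assumes the source uses a decimal comma.
        "value": float(raw["value"].replace(",", ".")),
        "source_table": raw["source_table"],
    }


def to_consumption(clean_records):
    """Clean layer to consumption layer: group records by domain."""
    by_domain = {}
    for record in clean_records:
        domain = DOMAIN_BY_TABLE.get(record["source_table"], "unclassified")
        by_domain.setdefault(domain, []).append(record)
    return by_domain


# Hypothetical raw records, kept exactly as they would arrive from the source.
raw_layer = [
    {"patient_id": " P001 ", "date": "05/02/2023", "value": "7,4",
     "source_table": "lab_results"},
    {"patient_id": "P002", "date": "17/11/2022", "value": "1",
     "source_table": "billing"},
]

clean_layer = [to_clean(r) for r in raw_layer]
consumption_layer = to_consumption(clean_layer)
```

In this sketch the raw records are never modified, so a consumer that needs data outside the normalized path can still read `raw_layer` directly.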

The project tasks did not include any specific archetype modeling activity; only templates were created, using existing archetypes. Over 35 existing archetypes from the CKM were used to build 21 templates representing Demographic data, Encounters, Health Problems, Medication Administration, Vaccinations, Alerts, Phenotype Report, Genomic Report, Family History, etc.

The development of the platform was completed in July 2023. A first set of 100,000 patients has been loaded into the platform. The intention is to load the 450,000 patients from the Hospital 12 de Octubre in the coming months and, in the future, the 6.5 million patients from the Community of Madrid.

This project has been made possible through the collaboration of the following organizations:

  • Hospital 12 de Octubre, Madrid
  • Primary Care Area, Madrid
  • Veratech for Health
  • NTT Data España
  • RHEA Group
  • Better

Funding and Management:

  • European Union, European Regional Development Fund (ERDF)
  • Ministry of Health of Spain
  • Ministry of Health of the Community of Madrid

For more information:

https://cpisanidadcm.org/infobanco/