Definition of subsets of clinical concepts with the SNOMED CT Expression Constraint Language

The breadth of SNOMED CT, with more than 360,000 clinical concepts, makes it necessary to define subsets of specific concepts for use in concrete scenarios, such as data entry and validation (e.g., to ensure that recorded diagnoses pertain to a specific type of disease, such as liver diseases), clinical protocols, data analysis and mining (e.g., to filter cases where the patient has undergone an organ transplant), development of decision-support systems (e.g., identification of infectious diseases), or clinical research. Such subsets help different systems and organizations interpret and share health data consistently and unambiguously, that is, with high levels of semantic interoperability.

These subsets can be specified in two ways:

  • Extensionally: that is, by providing a list of concepts. For example, the subset of food allergies in SNOMED CT can be specified by listing one by one the concepts that represent allergies to a type of food (in SNOMED CT, there are over 250, of which 170 belong to the National Health System extension):
    • 23291000122108 | garlic allergy |
    • 23391000122104 | pepper allergy |
    • 24231000122107 | dill allergy |
    • 23251000122104 | pear allergy |, etc.

It is worth noting how difficult it is to manage potentially large subsets consisting of thousands of concepts. Additionally, because SNOMED CT releases a new version every month, the subsets must be updated to add new concepts and remove deactivated ones if we want to keep our information model aligned with the most recent version of SNOMED CT.

  • Intensionally: that is, by defining a set of conditions or criteria that a concept must meet to belong to the subset. For example, the subset of food allergies in SNOMED CT can be specified by the condition that the concept must represent an allergic reaction to a type of food. This approach allows the subset to dynamically include any new concepts that meet the defined criteria without having to manually list each one.
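The difference between the two approaches can be sketched in a few lines of Python. This is only a toy illustration: the dictionary layout and the function names are our own assumptions, and only the peach/fruit/food allergy chain comes from SNOMED CT as described in this article (the other parent links are assumed for the example).

```python
FOOD_ALLERGY = "414285001"

# child -> set of parents (is-a links). Only the peach -> fruit -> food chain
# is stated in the article; the other parent links are assumed for illustration.
is_a = {
    "1131000122107": {"91932007"},    # peach allergy -> fruit allergy
    "91932007": {"414285001"},        # fruit allergy -> food allergy
    "23251000122104": {"91932007"},   # pear allergy -> fruit allergy (assumed)
    "23291000122108": {"414285001"},  # garlic allergy -> food allergy (assumed)
}


def is_subtype_of(concept, ancestor, hierarchy):
    """Return True if `ancestor` is reachable from `concept` via is-a links."""
    pending, seen = [concept], set()
    while pending:
        current = pending.pop()
        for parent in hierarchy.get(current, ()):
            if parent == ancestor:
                return True
            if parent not in seen:
                seen.add(parent)
                pending.append(parent)
    return False


def intensional_subset(hierarchy):
    """Recompute the subset from the criterion 'is a subtype of food allergy'."""
    return {c for c in hierarchy if is_subtype_of(c, FOOD_ALLERGY, hierarchy)}


# Extensional definition: a fixed list that has to be maintained by hand.
extensional_subset = {"1131000122107", "23251000122104", "23291000122108"}

before = intensional_subset(is_a)

# A new release adds dill allergy (parent link assumed for illustration).
is_a["24231000122107"] = {"414285001"}
after = intensional_subset(is_a)
```

After the simulated release, the intensional subset picks up the new concept on its own, while the extensional list stays unchanged until someone edits it.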

To define expressions that represent subsets of clinical concepts intensionally and formally, SNOMED CT provides the Expression Constraint Language (ECL) [1]. This language provides a set of operators that are aligned with the logical model of the terminology. It is important to remember that SNOMED CT has two types of relationships: is-a relationships, which define hierarchies of concepts (for example, 1131000122107 | peach allergy | is a subtype of 91932007 | fruit allergy |, which in turn is a subtype of 414285001 | food allergy |, and so on until reaching the root concept of SNOMED CT); and attribute relationships, which link concepts across different hierarchies (for example, 1131000122107 | peach allergy | has an attribute relationship of type 246075003 | causal agent | with the concept 735049002 | peach |). SNOMED CT currently includes 125 types of attribute relationships, among them 246075003 | causal agent |, 42752001 | due to |, 363698007 | finding site |, 127489000 | has active ingredient |, and 410675002 | route of administration |.
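As a rough mental model, both kinds of relationship can be stored as (source, type, destination) triples. The sketch below uses the concepts from the example above; the `Relationship` type and the helper functions are hypothetical names of our own, not part of any SNOMED CT tooling.

```python
from typing import NamedTuple

IS_A = "116680003"          # | is a | relationship type in SNOMED CT
CAUSAL_AGENT = "246075003"  # | causal agent | attribute


class Relationship(NamedTuple):
    source: str
    type: str
    destination: str


relationships = [
    # peach allergy is a fruit allergy
    Relationship("1131000122107", IS_A, "91932007"),
    # fruit allergy is a food allergy
    Relationship("91932007", IS_A, "414285001"),
    # peach allergy has causal agent peach
    Relationship("1131000122107", CAUSAL_AGENT, "735049002"),
]


def targets(concept, rel_type):
    """Destinations of the relationships of `rel_type` that start at `concept`."""
    return {
        r.destination
        for r in relationships
        if r.source == concept and r.type == rel_type
    }


def ancestors(concept):
    """Follow is-a relationships upwards and collect every supertype."""
    found, pending = set(), [concept]
    while pending:
        for parent in targets(pending.pop(), IS_A):
            if parent not in found:
                found.add(parent)
                pending.append(parent)
    return found
```

Here `ancestors("1131000122107")` walks the hierarchy, whereas `targets("1131000122107", CAUSAL_AGENT)` jumps to a concept in a different hierarchy.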

The simplest ECL expressions are those that select a hierarchy. For example, the expression:

< 609328004 | allergy |

defines the subset formed by all the concepts in SNOMED CT that are descendants (subtypes) of the concept 609328004 | allergy | (note the use of the operator “<”, which means descendants or subtypes). To refine this hierarchy and select, for example, only those concepts representing allergies caused by a biological or pharmaceutical product, the attribute relationship 246075003 | causal agent | and the hierarchy of biological/pharmaceutical products come into play, like this:

< 609328004 | allergy |: 246075003 | causal agent | = < 373873005 | biological/pharmaceutical product |

(note the use of the operator “:” to apply the refinement formed by the attribute 246075003 | causal agent | and the hierarchy < 373873005 | biological/pharmaceutical product |). Some concepts that would be obtained by executing this expression would include 23611000122102 | amylase allergy |, 22181000122105 | allergy to hepatitis A and B vaccine |, or 22101000122103 | allergy to measles vaccine |.
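To make the evaluation of such a refinement more tangible, here is a minimal sketch over a hand-made data set. The flattened hierarchy, the placeholder identifiers for the causal agents, and the function names are all assumptions made for illustration; a real engine works on the full SNOMED CT release.

```python
ALLERGY = "609328004"
CAUSAL_AGENT = "246075003"
BIO_PHARMA = "373873005"

# child -> parents. The hierarchy is flattened and the agent identifiers
# ("amylase", "measles-vaccine", "latex") are placeholders, not SNOMED CT ids.
parents = {
    "23611000122102": {ALLERGY},      # amylase allergy
    "22101000122103": {ALLERGY},      # allergy to measles vaccine
    "1003755004": {ALLERGY},          # latex allergy
    "amylase": {BIO_PHARMA},
    "measles-vaccine": {BIO_PHARMA},
    "latex": {"other-substance"},     # outside the product hierarchy
}

# (concept, attribute) -> attribute values
attributes = {
    ("23611000122102", CAUSAL_AGENT): {"amylase"},
    ("22101000122103", CAUSAL_AGENT): {"measles-vaccine"},
    ("1003755004", CAUSAL_AGENT): {"latex"},
}


def descendants(concept):
    """Evaluate '< concept': every strict subtype, at any depth."""
    result, changed = set(), True
    while changed:
        changed = False
        for child, child_parents in parents.items():
            if child not in result and (
                concept in child_parents or child_parents & result
            ):
                result.add(child)
                changed = True
    return result


def refine(focus, attribute, allowed_values):
    """Evaluate 'focus : attribute = allowed_values'."""
    return {
        c for c in focus
        if attributes.get((c, attribute), set()) & allowed_values
    }


# < allergy : causal agent = < biological/pharmaceutical product
result = refine(descendants(ALLERGY), CAUSAL_AGENT, descendants(BIO_PHARMA))
```

In this toy data set, `result` keeps the amylase and measles vaccine allergies and leaves out latex allergy, whose causal agent lies outside the product hierarchy.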

Similarly, by replacing the operator “=” (equal to) with “!=” (not equal to), we can select allergies caused by concepts other than biological/pharmaceutical products:

< 609328004 | allergy |: 246075003 | causal agent | != < 373873005 | biological/pharmaceutical product |

such as 5611000122107 | casein allergy |, 1269425007 | gluten allergy |, or 1003755004 | latex allergy |, among others.
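One subtlety is that, as we read the ECL specification, “!=” is not simply the complement of “=”: a concept matches when it has at least one value of the attribute outside the given set, so a concept with no causal agent at all matches neither expression. The sketch below illustrates this reading with made-up data; the concept names marked as hypothetical and the function names are ours.

```python
# Placeholder stand-in for '< biological/pharmaceutical product'.
BIO_PHARMA_PRODUCTS = {"amylase", "measles vaccine"}

# concept -> values of its causal agent attribute (toy data)
causal_agents = {
    "amylase allergy": {"amylase"},
    "casein allergy": {"casein"},
    "mixed allergy (hypothetical)": {"amylase", "latex"},
    "allergy without causal agent (hypothetical)": set(),
}


def refine_equal(focus, values, allowed):
    """'attribute = allowed': at least one value inside `allowed`."""
    return {c for c in focus if values.get(c, set()) & allowed}


def refine_not_equal(focus, values, excluded):
    """'attribute != excluded': at least one value outside `excluded`."""
    return {c for c in focus if values.get(c, set()) - excluded}


focus = set(causal_agents)
equal = refine_equal(focus, causal_agents, BIO_PHARMA_PRODUCTS)
not_equal = refine_not_equal(focus, causal_agents, BIO_PHARMA_PRODUCTS)
```

Note that the hypothetical mixed allergy appears in both results, and the concept without a causal agent appears in neither.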

The ECL language also allows the use of logical operators (“AND”, “OR” and “MINUS”) to combine expressions. For example, we can select allergies to vegetables, allergies to meats, and allergies to shellfish using the logical operator “OR” (union of sets):

< 16067251000119104 | allergy to vegetables | OR < 703931001 | allergy to meat | OR < 712842007 | allergy to shellfish |
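Since every ECL expression evaluates to a set of concepts, the three logical operators map directly onto set operations. The following sketch assumes the three sub-expressions have already been evaluated; the member names are invented placeholders.

```python
import operator

# ECL logical operator -> Python set operation
LOGICAL_OPERATORS = {
    "OR": operator.or_,      # union
    "AND": operator.and_,    # intersection
    "MINUS": operator.sub,   # difference
}


def combine(op, left, right):
    """Combine two already-evaluated subsets with an ECL logical operator."""
    return LOGICAL_OPERATORS[op](left, right)


# Placeholder results of the three '<' sub-expressions (invented members).
vegetable_allergies = {"carrot allergy", "celery allergy"}
meat_allergies = {"pork allergy"}
shellfish_allergies = {"shrimp allergy", "crab allergy"}

union = combine(
    "OR", combine("OR", vegetable_allergies, meat_allergies), shellfish_allergies
)
without_shellfish = combine("MINUS", union, shellfish_allergies)
only_meat = combine("AND", union, meat_allergies)
```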

ECL is not limited to simple subsets like those presented above. Its rich set of operators, together with the filters added in its more recent versions, makes it powerful enough to specify subsets as complex as our needs require. For example, ECL can select the descendants of a concept ("<"), optionally including the concept itself ("<<"), or its ancestors (">"), again optionally including the concept itself (">>"). It is also possible to select the children of a concept ("<!"), optionally including the concept itself ("<<!"), or the members of a refset ("^"). ECL allows traversing an attribute relationship in reverse order ("R") and setting the number of specific attribute relationships that concepts must have to satisfy the expression using cardinality ("[x..y]"). Furthermore, ECL allows adding filters that the concepts or their descriptions must meet to be included in the subset ("{{ }}"). For concepts, these filters can be applied to their definition status (primitive/defined), their module, their creation date, or their status (active/inactive). For descriptions, filters can be applied to their language, type (fully specified name/synonym), and acceptability (preferred/acceptable).
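The hierarchy operators "<", "<<", "<!", "<<!", ">" and ">>" can be pictured as different walks over the same is-a graph. The sketch below is a simplified model with a three-level toy hierarchy (the placement of pear allergy is assumed); the function and table names are our own.

```python
# parent -> children (toy hierarchy; pear allergy placement is assumed)
children = {
    "food allergy": {"fruit allergy"},
    "fruit allergy": {"peach allergy", "pear allergy"},
}


def child_of(concept):
    """Direct subtypes of `concept`."""
    return set(children.get(concept, ()))


def parent_of(concept):
    """Direct supertypes of `concept`."""
    return {p for p, kids in children.items() if concept in kids}


def closure(step, concept):
    """Apply `step` repeatedly and collect everything that is reached."""
    found, pending = set(), [concept]
    while pending:
        for nxt in step(pending.pop()):
            if nxt not in found:
                found.add(nxt)
                pending.append(nxt)
    return found


HIERARCHY_OPERATORS = {
    "<": lambda c: closure(child_of, c),            # descendants
    "<<": lambda c: closure(child_of, c) | {c},     # descendants or self
    "<!": child_of,                                 # children
    "<<!": lambda c: child_of(c) | {c},             # children or self
    ">": lambda c: closure(parent_of, c),           # ancestors
    ">>": lambda c: closure(parent_of, c) | {c},    # ancestors or self
}


def evaluate(op, concept):
    """Evaluate a single hierarchy operator applied to one concept."""
    return HIERARCHY_OPERATORS[op](concept)
```

For instance, `evaluate("<!", "food allergy")` returns only the direct child, while `evaluate("<", "food allergy")` also returns the two leaf allergies.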

For example, the following expression defines the subset of viruses that are the causal agent of diseases whose descriptions contain the words "influenza" or "human" and that have at least two finding sites belonging neither to the respiratory system nor to the digestive system:

< 49872002 |virus|: R 246075003 |causal agent| = (< 64572001 |disease| {{ term = (wild:"*influenza*" wild:"*human*")}} : [2..*] 363698007 |finding site| != (< 20139000 |structure of the respiratory system| OR < 86762007 |structure of the digestive system|))

This subset contains viruses such as 725894000 | Influenza virus |, 9482002 | human papillomavirus |, or 19965007 | human herpes simplex virus |, among others.
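The way the term filter, the cardinality and the reverse flag cooperate in this expression can be sketched as three small steps. Everything below is a simplified model with invented concepts; it assumes that cardinality counts the finding sites lying outside the excluded hierarchies, treats term matching as case-insensitive, and ignores role groups.

```python
from fnmatch import fnmatchcase

CAUSAL_AGENT, FINDING_SITE = "causal agent", "finding site"

# (source, attribute, destination) triples; all concepts are invented.
triples = [
    ("disease A", CAUSAL_AGENT, "virus X"),
    ("disease A", FINDING_SITE, "skin"),
    ("disease A", FINDING_SITE, "eye"),
    ("disease B", CAUSAL_AGENT, "virus Y"),
    ("disease B", FINDING_SITE, "lung"),
    ("disease B", FINDING_SITE, "skin"),
]
terms = {"disease A": ["Human disease A"], "disease B": ["Influenza B"]}
diseases = {"disease A", "disease B"}
viruses = {"virus X", "virus Y"}
# Stand-ins for the respiratory and digestive structure hierarchies.
excluded_sites = {"lung", "stomach"}


def term_filter(concepts, patterns):
    """{{ term = (wild:... wild:...) }}: any description matches any pattern."""
    return {
        c for c in concepts
        if any(
            fnmatchcase(term.lower(), pattern)
            for term in terms.get(c, [])
            for pattern in patterns
        )
    }


def at_least(concepts, attribute, excluded, minimum):
    """[minimum..*] attribute != excluded: count the values outside `excluded`."""
    def count(concept):
        return len({
            d for s, a, d in triples
            if s == concept and a == attribute and d not in excluded
        })
    return {c for c in concepts if count(c) >= minimum}


def reverse(attribute, sources):
    """R attribute = sources: destinations reached from any concept in `sources`."""
    return {d for s, a, d in triples if a == attribute and s in sources}


named = term_filter(diseases, ["*influenza*", "*human*"])
matching = at_least(named, FINDING_SITE, excluded_sites, 2)
result = viruses & reverse(CAUSAL_AGENT, matching)
```

In this toy data, only disease A has two finding sites outside the excluded structures, so the reverse step returns its causal agent alone.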

Since the inception of the ECL language nearly 10 years ago, Veratech has closely followed its evolution and utility. This has led us to develop our own execution engine [2] with advanced features such as expression simplification. This engine has been included in the SNOMED CT viewer of the Ministry of Health [3], strengthening our ability to offer innovative and specialized solutions in the area of semantic interoperability of health information in general and SNOMED CT in particular.

References

[1] https://confluence.ihtsdotools.org/display/DOCECL/

[2] Giménez-Solano, Vicente Miguel; Maldonado Segura, José Alberto; Boscá, Diego; Salas-García, Santiago; Robles Viejo, Montserrat (2021). Definition and validation of SNOMED CT subsets using the expression constraint language. Journal of Biomedical Informatics. https://doi.org/10.1016/j.jbi.2021.103747

[3] https://snomedsns.es/ecl