Post

Dataset Specializations vs. Biomedical Concept Variants

Reply to Kerstin's LinkedIn Post on BC Variants.

Dataset Specializations vs. Biomedical Concept Variants

Dataset Specializations vs. Biomedical Concept Variants

Overview

As Biomedical Concepts (BCs) gain adoption across the clinical research standards ecosystem, two related terms, CDISC Dataset Specializations (DSSs) and Biomedical Concept Variants, risk being used interchangeably. While they share surface-level similarities as mechanisms for adapting a core concept to a specific context, they reflect fundamentally different design philosophies, governance models, and use cases. Conflating them risks obscuring important architectural decisions that affect how organizations implement and manage standards-based metadata.

This blog post defines each approach, articulates the key differences between them, and explains why the distinction matters for standards development and implementation. For those new to BCs, check out the Additional Reading section at the end of this blog post for links to background material.

CDISC Dataset Specializations

CDISC’s approach to Biomedical Concepts (BC) is built on a two-layered architecture that deliberately separates the meaning of a concept from its implementation in a particular standard.

The first layer is the Conceptual Layer, which defines standards-agnostic BCs aligned with NCI terminology. These BCs provide the semantic core. They describe what is being measured or collected (e.g., Systolic Blood Pressure) independent of any particular data model. BCs are pre-coordinated, meaning their full meaning is encoded in the concept itself, and they are designed to be immutable. Each BC includes a set of elements that further define the meaning and intention of the concept. A given BC should carry the same semantic meaning regardless of where or how it is ultimately represented in a dataset.

The second layer is the Implementation Layer, where Dataset Specializations (DSSs) reside. A DSS is a pre-configured, opinionated implementation of a BC within a specific standard. For example, a BC could have an SDTM Dataset Specialization and a CDASH Dataset Specialization. A single BC may have zero to many DSSs, each tailored to the structural and operational requirements of its target standard. DSSs include explicit relationships to the NCI Thesaurus (NCIt) terminology, such as codelists, value lists, and terms. They provide focused Value Level Metadata (VLM) that serves as a building block for Define-XML generation. The CDISC 360i program uses BCs and their related DSSs to help automate the generation of define.xml files and eCRFs.

The design principles governing DSSs emphasize several important characteristics. DSSs are curated and pre-configured by CDISC, meaning they represent the organization’s authoritative interpretation of how a BC should be expressed in a given standard. They can be applied directly for straightforward implementation or further refined to meet the needs of a specific therapeutic area or individual study. When a therapeutic area requires modifications, this may result in a new DSS for that therapeutic area, which can then be further tailored at the study level. DSSs are defined at a level of granularity that makes sense to bundle together when you submit data for review.

Critically, CDISC content is curated, not duplicated. There is one authoritative concept, and DSSs are implementations of that concept, not alternative representations of it. This distinction is central to understanding how DSSs differ from Variants.

Biomedical Concept Variants

The BC Variant model, as articulated by certain sponsor organizations, takes a different approach rooted in metadata registry management. Variants are designed to accommodate the reality that organizations, therapeutic teams, and external standards bodies may need to maintain alternative or evolving representations of the same underlying concept.

In one sponsor framework, a Variant is created when a therapeutic team or sub-team produces a tailored copy of a global concept with one or more attribute values that differ from the global definition. Concepts have attributes with varying levels of mutability: some cannot be changed, some require an exemption, and some can be freely modified. Any modification that triggers a change to a controlled attribute results in the creation of a new Variant.

In this framework, study specifications resolve Variants through a hierarchical lookup. A study assigned to a sub-indication first looks for a matching Variant at that level, then moves upward through indication, therapeutic area, and finally the global level. This creates a structured inheritance model in which more specific contexts can override more general ones.

In another sponsor framework, the Variant model extends the ISO 11179 metadata registry standard. In this framework, Variants are grouped under a common facade, a shared semantic meaning, and are differentiated into two kinds based on their relationship to the owning Registration Authority. A Version represents an updated iteration of a concept owned by the same Registration Authority, while an Alternative represents a concept with the same meaning but owned by a different Registration Authority. For example, if CDISC created a new representation of a concept originally owned by another standards body, it would be classified as an Alternative rather than a Version.

In this framework, configurations combine different sets of Variants by pointing to selected administered items, allowing organizations to compose tailored metadata sets from the available Variant pool.

Core Distinction

CDISC’s Dataset Specialization model is designed for a curated ecosystem where there is one authoritative concept, and each implementation is a deliberate expression of that concept in a specific standard. DSSs are curated, pre-configured implementations of a single authoritative BC within a specific CDISC standard (e.g., SDTM, CDASH). CDISC maintains one concept and creates standard-specific implementations of it. Because DSSs are not duplicates of the BC but rather structured implementations of it, the semantic integrity of the underlying concept is preserved even as it is expressed across multiple standards. Each DSS is a distinct expression of the same BC, governed centrally by CDISC.

The Variant model, by contrast, is designed for registry environments where multiple organizations or teams may legitimately need their own representation of the same concept. This is particularly valuable in large, federated enterprises where therapeutic areas evolve their terminology over time, or where concepts must bridge multiple external standards bodies with different naming conventions. The trade-off is increased complexity in governance and version management. Variants are managed copies of a concept that accommodate organizational, temporal, or naming differences. Variants may differ by version (updated by the same owner) or by alternative (created by a different owner).

Key Differences

  • Purpose: DSSs implement a concept in a data standard. Variants maintain alternative representations across organizational contexts.
  • Relationship to Core Concept: A DSS has a one-to-many relationship to a BC. A single BC has zero to many DSSs. For example, each standard may have a different DSS (SDTM, CDASH, etc.). A variant has a one-to-many relationship with a BC. A single semantic meaning may have multiple Variants differing by owner, version, or context
  • Governance: BCs and DSSs are centrally curated by CDISC with no duplication. Variants are distributed across therapeutic teams or Registration Authorities and involve controlled duplication by design.
  • Axis of Variation: DSSs vary by target standard, therapeutic area, or more generally by implementation context. Variants vary by ownership, version history, or organizational scope.
  • Resolution Model: DSSs are selected directly by matching the target standard or implementation context. Variants are resolved through hierarchical inheritance or configuration-based composition.
  • Duplication: Each DSS is a distinct implementation of a single concept, with no duplication. Variants have controlled duplication by design. Variants are copies with attribute-level differences.
  • Use Case: DSSs support consistent and automated standards implementations. Variants support enterprise metadata registry management across diverse teams and evolving organizational metadata needs.
  • Technical: DSSs behave like concrete subclasses of an abstract BC, inheriting semantic identity and providing implementation-specific structure. Variants behave like managed clones or configured instances of the same concept, differing in attributes rather than in type. DSSs and BC Variants operate at different levels of the type-versus-instance distinction.

Why It Matters

DSSs and BC Variants address related but distinct problems. DSSs are curated implementations of a single authoritative concept across different standards, governed centrally by CDISC. Variants are managed representations of concepts across organizational boundaries and over time, governed by the Registration Authorities or teams that own them.

When these concepts are conflated, organizations risk adopting a Variant-oriented mindset within what should be a curated workflow, leading to unnecessary proliferation of concept representations. Or conversely, applying a centrally curated model where the flexibility of Variants is genuinely needed. Maintaining clarity on which model is in play ensures more intentional and effective metadata architecture decisions across both standards development and enterprise implementation.

Acknowledgments

Thanks to Linda Lander, Lex Jansen, Anthony Chow, Bess LeRoy, and Peter Van Reusel for reviewing this post and providing useful feedback.

Additional Reading

Understanding BCs and how to use them can be challenging. For those interested in additional background information, check out our BC Starter Package in GitHub. There’s a BC Overview Training slide deck that might be of particular interest, as it explains BCs and includes examples.

This post is licensed under CC BY 4.0 by the author.