🧬 CNAG Biomedical Informatics

Building trustworthy computational infrastructure for biomedical discovery.

Open-source biomedical informatics infrastructure for interoperable genomic, phenotypic, clinical, and AI-enabled research.

We are a biomedical informatics research and software engineering initiative affiliated with the Biomedical Genomics Group at CNAG.

We develop open-source computational infrastructure that enables researchers to work with complex genomic, phenotypic, and clinical data. Our work spans biomedical interoperability, reproducible analysis, semantic technologies, and trustworthy AI, with the goal of making biomedical discovery more accessible, reproducible, and scalable.

Rather than building isolated applications, we develop reusable infrastructure that connects data, standards, analytical workflows, and AI into a coherent research ecosystem.


🎯 Mission

Our mission is simple:

Build computational infrastructure that enables biomedical discoveries.

We focus on:

  • 🔄 Biomedical data interoperability
  • 🧬 Translational informatics
  • 🤖 Trustworthy AI for biomedical research
  • 🏗️ Sustainable open infrastructure
  • 📊 Reproducible computational workflows

We build practical software that helps researchers and clinicians spend less time integrating data and more time answering biological and clinical questions.


🧭 Philosophy

Our work draws on more than two decades of experience in computational biology and bioinformatics, spanning successive sequencing technologies, biomedical standards, and computational systems.

We prioritize:

  • 🔄 Interoperability over isolated silos
  • 🧪 Reproducibility through reusable workflows
  • 🏗️ Sustainable infrastructure instead of short-lived prototypes
  • 📚 Open science and accessible software
  • 🤖 Pragmatic, trustworthy AI grounded in structured biomedical data

We believe the future of biomedical AI depends on reliable infrastructure, interoperable standards, transparent data models, and sustainable software ecosystems.


🌐 Research Platform

Our software ecosystem is modular by design. Each project solves a specific problem while contributing to a unified biomedical research platform.

🔄 Phenotypic & Clinical Interoperability

Convert-Pheno

Interconversion between biomedical and phenotypic data standards.

  • Phenopackets interoperability
  • REDCap, OMOP-CDM, and CDISC-ODM support
  • CLI, API, and Web UI
  • Dockerized deployment

beacon2-cbi-tools

GA4GH Beacon v2 interoperability, validation, and ingestion tooling.

OMOP CSV Validator

Validation workflows for OMOP-CDM CSV datasets.


🧬 Phenotype Analysis

Pheno-Ranker

Semantic comparison and ranking of interoperable phenotypic data.

  • Semantic similarity workflows
  • Cross-format interoperability
  • Interactive Web UI


🧾 Metadata & Identifier Systems

ClarID-Tools

Schema-driven biomedical identifier generation and validation.


⚙️ Genomic Processing

CBIcall

Configuration-driven genomic variant-calling workflows.


🚀 AI Research Platform

Biomedical Research Navigator (under active development)

A secure AI-enabled research platform integrating multimodal biomedical data, semantic interoperability, reusable analytical workflows, local and cloud LLMs, and the broader CNAG Biomedical Informatics ecosystem.

The Navigator is designed to become the integration layer for our software ecosystem rather than another standalone application.


🧩 Standards & Technologies

Biomedical Standards

  • GA4GH Beacon v2
  • GA4GH Phenopackets
  • OMOP-CDM
  • REDCap
  • CDISC-ODM
  • openEHR
  • JSON Schema

Technologies

  • Python
  • Perl
  • JavaScript / React
  • R
  • Docker
  • MongoDB
  • Snakemake
  • REST APIs
  • Local LLMs
  • MCP-compatible workflows

🌍 European Research Projects

We actively collaborate within the ELIXIR community and participate in European initiatives focused on interoperable biomedical data, federated analytics, translational informatics, and precision medicine.

3TR

https://3tr-imi.eu/

HEREDITARY

https://hereditary-project.eu/

PRECISESADS

Foundation cohort for our long-term work on AI-assisted biomedical research infrastructure for immune-mediated diseases.


📚 Selected Publications

ClarID

https://doi.org/10.1186/s13326-026-00349-6

Beacon v2 Reference Implementation

https://doi.org/10.1093/bioinformatics/btac568

OMOP CDM to Beacon v2 Interoperability

https://doi.org/10.1101/2024.12.25.24319606

Convert-Pheno

https://doi.org/10.1016/j.jbi.2023.104558

Pheno-Ranker

https://doi.org/10.1186/s12859-024-05993-2


🔭 Vision

We are building more than individual software tools.

Our long-term goal is to create an open, modular research platform where interoperable biomedical data, computational methods, and trustworthy AI work together to accelerate biomedical discovery.

If our software enables researchers to answer questions they could not answer before, then we have achieved our goal.


🤝 Collaboration & Open Science

We welcome:

  • Scientific collaborations
  • Interoperability initiatives
  • Standards-related projects
  • Open-source contributions
  • Bug reports and feature requests


Building sustainable biomedical informatics infrastructure for interoperable and AI-assisted research.

Pinned

  1. convert-pheno-ui (Public)

    The web UI for Convert-Pheno, a software toolkit for the interconversion of standard data models for phenotypic data

    FreeMarker 5 1

  2. pheno-ranker-ui (Public)

    The web UI (R-Shiny application) for Pheno-Ranker, a tool designed for performing semantic similarity analysis on phenotypic data structured in JSON format, such as Beacon v2 Models or Phenopackets v2

    R 2

  3. beacon2-cbi-tools (Public)

    Beacon v2 - CNAG Biomedical Informatics - Tools (Data ingestion tools)

    Perl 3

  4. clarid-tools (Public)

    ClarID: A Human-Readable and Compact Identifier Specification for Biomedical Metadata Integration

    Perl 2

  5. cbicall (Public)

    CBIcall is a configuration-driven framework for reproducible variant calling in large sequencing cohorts, enabling standardized pipelines from FASTQ to analysis-ready VCFs across heterogeneous comp…

    Python 2
