Data Engineering · January 10, 2025

Why Your Healthcare Data Platform Is Still Fragmented (And How to Fix It)

Most healthcare organizations are drowning in data but starving for insights. Here's how to build a modern platform that actually works.

I've modernized data platforms for telehealth organizations, payers, and health systems. The starting point is always the same: fragmented systems, legacy ETL pipelines, and teams frustrated by slow analytics and inconsistent data.

The path forward isn't another data warehouse or hiring more data engineers. It's about rethinking your architecture from first principles.

The Fragmentation Problem

Most healthcare organizations have:

  • Multiple EHRs across different facilities or acquisitions
  • Business systems (Salesforce, ADP, billing platforms) that don't talk to each other
  • Claims data in one place, clinical data in another, quality measures somewhere else
  • Legacy ETL jobs that break when source systems change
  • No single source of truth for longitudinal patient records

Sound familiar? This isn't a technology problem. It's an architecture problem.

The Lakehouse + Medallion Approach

Here's what actually works: combine the flexibility of a data lake with the structure of a data warehouse using a Lakehouse architecture with Medallion layers.

Bronze Layer (Raw Data)

  • Ingest everything as-is from source systems
  • No transformations, just schema validation
  • High-velocity ingestion with Kafka, Kinesis, or Pub/Sub
  • Store in Parquet or Delta format for performance
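The Bronze step is deliberately dumb: validate the shape, land the record untouched. A minimal sketch of that validate-then-land logic in Python — the field names in `EXPECTED_FIELDS` and the hourly partition layout are illustrative, and a real pipeline would write Parquet or Delta via pyarrow or Spark rather than JSON lines:

```python
import json
from datetime import datetime, timezone
from pathlib import Path

# Illustrative schema for a hypothetical encounter feed: field -> required type
EXPECTED_FIELDS = {"patient_id": str, "event_type": str, "event_ts": str}

def validate(record: dict) -> bool:
    """Schema validation only -- Bronze applies no transformations."""
    return all(isinstance(record.get(f), t) for f, t in EXPECTED_FIELDS.items())

def land_bronze(records: list[dict], root: Path) -> Path:
    """Append valid records as-is to an hourly-partitioned Bronze path."""
    hour = datetime.now(timezone.utc).strftime("%Y-%m-%d/%H")
    part = root / "bronze" / "encounters" / hour
    part.mkdir(parents=True, exist_ok=True)
    out = part / "part-0000.jsonl"
    with out.open("a") as f:
        for r in records:
            if validate(r):  # reject malformed rows, keep the rest raw
                f.write(json.dumps(r) + "\n")
    return out
```

The point is the contract, not the format: nothing downstream of Bronze ever has to guess what a record looks like.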

Silver Layer (Cleaned & Standardized)

  • Transform to FHIR-based patient models
  • Apply data quality rules and deduplication
  • Create longitudinal patient records across systems
  • Use dbt for declarative transformations
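In practice the Silver transformations live in dbt SQL, but the mapping logic is easy to sketch in Python. The raw field names (`mrn`, `fname`, ...) are illustrative, and the dedupe here is a stand-in for real patient match/merge logic:

```python
def to_fhir_patient(row: dict) -> dict:
    """Map one raw EHR row to a minimal FHIR R4 Patient resource."""
    return {
        "resourceType": "Patient",
        "identifier": [{"system": "urn:example:mrn", "value": row["mrn"]}],
        "name": [{"family": row["lname"], "given": [row["fname"]]}],
        "birthDate": row["dob"],
    }

def dedupe(patients: list[dict]) -> list[dict]:
    """Keep one record per MRN -- real systems use probabilistic matching."""
    seen, out = set(), []
    for p in patients:
        key = p["identifier"][0]["value"]
        if key not in seen:
            seen.add(key)
            out.append(p)
    return out
```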

Gold Layer (Business Logic)

  • Population health cohorts
  • HEDIS measure calculations
  • Quality metrics (PHQ-9, GAD-7, A1c, BP)
  • Risk stratification models
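Gold logic is where clinical definitions live. As one sketch: building a cohort of patients whose most recent PHQ-9 score suggests at least moderate depression. The input shape is illustrative, not a fixed schema; 10 is the commonly used PHQ-9 cutoff for moderate severity:

```python
def phq9_cohort(observations: list[dict], threshold: int = 10) -> set:
    """Patients whose most recent PHQ-9 score meets the threshold."""
    latest = {}
    for o in observations:
        if o["code"] != "phq9":
            continue
        pid = o["patient_id"]
        # keep only the most recent observation per patient
        if pid not in latest or o["ts"] > latest[pid]["ts"]:
            latest[pid] = o
    return {pid for pid, o in latest.items() if o["value"] >= threshold}
```

The same most-recent-value-per-patient pattern covers A1c, blood pressure, and GAD-7 cohorts with only the code and threshold swapped.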

This isn't theoretical. We built this for the Defense Health Agency's telehealth operations on AWS GovCloud — processing patient encounters, claims, and clinical program data hourly.

Why FHIR Matters (Even If You're Not Exchanging Data)

FHIR isn't just for interoperability. It's the best framework for building a unified patient data model across fragmented systems.

Here's why:

  1. Semantic interoperability — FHIR resources (Patient, Encounter, Observation, Condition) provide a common vocabulary across EHRs
  2. Longitudinal patient records — Bundle all patient data (demographics, encounters, labs, meds, diagnoses) in a standardized format
  3. Extension support — Add custom fields for payer-specific or clinical program data without breaking the standard
  4. Future-proofing — When CMS mandates FHIR API access, you're already compliant

We mapped EHR data, claims, and clinical programs to FHIR Patient, Encounter, Claim, and Observation resources. This gave us a single, queryable patient view that worked across all source systems.
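That "single, queryable patient view" can be expressed as a FHIR Bundle: gather everything the platform holds for one patient, regardless of source system. A minimal sketch, assuming each resource carries a `subject` reference of the form `Patient/<id>` (as Encounter and Observation do in FHIR R4):

```python
def patient_bundle(patient_id: str, resources: list[dict]) -> dict:
    """Assemble a FHIR collection Bundle of all resources for one patient."""
    entries = [
        r for r in resources
        if r.get("subject", {}).get("reference") == f"Patient/{patient_id}"
    ]
    return {
        "resourceType": "Bundle",
        "type": "collection",
        "entry": [{"resource": r} for r in entries],
    }
```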

Real-Time vs. Batch: When Each Makes Sense

Not everything needs to be real-time. Here's when to use each:

Real-time streaming (Kafka, Pub/Sub, Kinesis)

  • Patient encounter events (admissions, discharges, transfers)
  • Clinical alerts and notifications
  • Real-time dashboards for operational monitoring

Batch processing (Airflow, dbt, Glue)

  • HEDIS measure calculations (quarterly or annual)
  • Historical claims data loads
  • Population health cohort analysis
  • Quality metric aggregations

We run hourly batch pipelines for most workloads. Real-time adds complexity and cost — only use it when latency truly matters.
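The hourly batch pattern reduces to incremental loads driven by a watermark: each run picks up only rows newer than the last processed timestamp, then persists the new high-water mark. A sketch (the `updated_at` field is illustrative; in production this lives in an Airflow task or dbt incremental model):

```python
def incremental_batch(rows: list[dict], watermark: str) -> tuple[list[dict], str]:
    """Select rows newer than the watermark; return them plus the next watermark."""
    new = [r for r in rows if r["updated_at"] > watermark]
    # if nothing is new, carry the old watermark forward unchanged
    next_wm = max((r["updated_at"] for r in new), default=watermark)
    return new, next_wm
```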

The Right Tech Stack

Here's what we've used successfully across multiple healthcare platforms:

Cloud platforms:

  • GCP (BigQuery, Pub/Sub, Dataflow) for most use cases
  • AWS GovCloud (Redshift, Kinesis, Glue, Step Functions) for FedRAMP compliance
  • Databricks for lakehouse architecture + ML workflows

Orchestration & transformation:

  • Airflow for workflow orchestration
  • dbt for SQL-based transformations
  • Great Expectations or dbt tests for data quality

BI & analytics:

  • Looker for self-service analytics
  • Semantic layer (dbt metrics or LookML) for consistent business logic

Governance: Not Optional

Data governance isn't a compliance checkbox. It's how you ensure your platform is actually trusted and used.

Key components:

  • Data lineage — Track data flow from source systems to analytics (we use dbt docs + metadata APIs)
  • Quality monitoring — Automated checks on completeness, accuracy, and freshness
  • Access controls — Row-level security for multi-tenant environments
  • Audit logging — Who accessed what data, when, and why (HIPAA requirement)
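The audit-logging requirement is simpler than it sounds: every access to patient data records who, what, when, and why. A miniature version as a Python decorator — the log shape and the in-memory list are illustrative, since in practice the trail goes to an append-only store:

```python
import functools
from datetime import datetime, timezone

AUDIT_LOG: list[dict] = []  # stand-in for an append-only audit store

def audited(purpose: str):
    """Record who accessed which patient's data, when, and why."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(user: str, patient_id: str, *args, **kwargs):
            AUDIT_LOG.append({
                "user": user,
                "patient_id": patient_id,
                "action": fn.__name__,
                "purpose": purpose,
                "at": datetime.now(timezone.utc).isoformat(),
            })
            return fn(user, patient_id, *args, **kwargs)
        return inner
    return wrap

@audited(purpose="care-coordination")
def read_chart(user: str, patient_id: str) -> dict:
    return {"patient_id": patient_id}  # stand-in for a real query
```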

The 90-Day Roadmap

Here's how to get from fragmented chaos to a working modern platform in 90 days:

Weeks 1-2: Assessment

  • Map all source systems and data flows
  • Identify critical use cases (population health, quality measures, risk adjustment)
  • Choose cloud platform and tech stack

Weeks 3-6: Foundation

  • Set up Bronze layer ingestion for 2-3 critical sources
  • Build Silver layer FHIR transformations for Patient + Encounter
  • Implement basic data quality checks
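"Basic data quality checks" at this stage can be as simple as a completeness score per table — the kind of check dbt tests or Great Expectations formalize later. A sketch (field names illustrative):

```python
def completeness(rows: list[dict], required: list[str]) -> float:
    """Share of rows with every required field populated."""
    if not rows:
        return 0.0
    ok = sum(all(r.get(f) not in (None, "") for f in required) for r in rows)
    return ok / len(rows)
```

Alert when the score drops below a threshold you pick per source; even this crude signal catches most upstream schema breaks.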

Weeks 7-10: Gold Layer & Analytics

  • Build 1-2 critical Gold layer use cases (e.g., active patient cohorts)
  • Set up Looker dashboards
  • Implement governance and access controls

Weeks 11-12: Production Hardening

  • Add monitoring and alerting
  • Document data models and lineage
  • Train users and hand off to internal teams

The Bottom Line

Fragmented data platforms aren't a technology problem. They're an architecture problem.

Stop adding more ETL jobs. Stop building more one-off integrations. Build a proper Lakehouse with Medallion layers, use FHIR for patient data models, and invest in governance.

The organizations that succeed are the ones that treat data platforms as a strategic enabler, not an IT project.

Need help modernizing your healthcare data platform? Let's discuss your roadmap.