I've modernized data platforms for telehealth organizations, payers, and health systems. The starting point is always the same: fragmented systems, legacy ETL pipelines, and teams frustrated by slow analytics and inconsistent data.
The path forward isn't another data warehouse or hiring more data engineers. It's about rethinking your architecture from first principles.
The Fragmentation Problem
Most healthcare organizations have:
- Multiple EHRs across different facilities or acquisitions
- Business systems (Salesforce, ADP, billing platforms) that don't talk to each other
- Claims data in one place, clinical data in another, quality measures somewhere else
- Legacy ETL jobs that break when source systems change
- No single source of truth for longitudinal patient records
Sound familiar? This isn't a technology problem. It's an architecture problem.
The Lakehouse + Medallion Approach
Here's what actually works: combine the flexibility of a data lake with the structure of a data warehouse using a Lakehouse architecture with Medallion layers.
Bronze Layer (Raw Data)
- Ingest everything as-is from source systems
- No transformations, just schema validation
- High-velocity ingestion with Kafka, Kinesis, or Pub/Sub
- Store in Parquet or Delta format for performance
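As a minimal sketch of the Bronze-layer contract, here is what "no transformations, just schema validation" can look like in Python. The field names (`patient_id`, `encounter_id`, `source_system`) and metadata columns are illustrative assumptions, not a prescribed schema:

```python
from datetime import datetime, timezone

# Hypothetical required top-level fields for an encounter feed;
# real schemas come from each source system's contract.
REQUIRED_FIELDS = {"patient_id", "encounter_id", "source_system"}

def validate_and_tag(record: dict) -> dict:
    """Bronze-layer step: verify the schema and attach ingest
    metadata, but leave the payload itself untransformed."""
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        raise ValueError(f"record missing fields: {sorted(missing)}")
    return {
        **record,  # raw payload preserved as-is
        "_ingested_at": datetime.now(timezone.utc).isoformat(),
        "_layer": "bronze",
    }
```

The key design choice: records that fail validation are rejected (or routed to a dead-letter queue), never silently patched, so the Bronze layer stays a faithful copy of the source.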
Silver Layer (Cleaned & Standardized)
- Transform to FHIR-based patient models
- Apply data quality rules and deduplication
- Create longitudinal patient records across systems
- Use dbt for declarative transformations
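A sketch of the Silver-layer idea, assuming hypothetical raw column names (`mrn`, `last_name`, `dob`, etc.): map each source row to a minimal FHIR-shaped Patient, then deduplicate across systems. Real implementations use probabilistic patient matching (an MPI), not the naive key below:

```python
def to_fhir_patient(row: dict) -> dict:
    """Map a raw EHR row (illustrative column names) to a minimal
    FHIR-shaped Patient resource."""
    return {
        "resourceType": "Patient",
        "identifier": [{"system": row["source_system"], "value": row["mrn"]}],
        "name": [{"family": row["last_name"], "given": [row["first_name"]]}],
        "birthDate": row["dob"],
    }

def dedupe(patients: list[dict]) -> list[dict]:
    """Naive dedup on (birthDate, family name) for illustration only;
    production matching is probabilistic."""
    seen, out = set(), []
    for p in patients:
        key = (p["birthDate"], p["name"][0]["family"].lower())
        if key not in seen:
            seen.add(key)
            out.append(p)
    return out
```

In practice these transformations live in dbt models; the Python version just makes the mapping and dedup logic explicit.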
Gold Layer (Business Logic)
- Population health cohorts
- HEDIS measure calculations
- Quality metrics (PHQ-9, GAD-7, A1c, BP)
- Risk stratification models
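To make the Gold layer concrete, here is a hedged sketch of a risk-stratification cohort built from Silver-layer observations. The thresholds (PHQ-9 >= 10, A1c > 9.0) and the `metric`/`date` field names are illustrative assumptions, not clinical guidance:

```python
def gold_risk_cohort(observations: list[dict]) -> set[str]:
    """Flag patients whose latest PHQ-9 score is >= 10 or whose
    latest A1c is > 9.0 (illustrative thresholds)."""
    # Keep only the most recent observation per (patient, metric).
    latest: dict[tuple[str, str], dict] = {}
    for obs in observations:
        key = (obs["patient_id"], obs["metric"])
        if key not in latest or obs["date"] > latest[key]["date"]:
            latest[key] = obs
    flagged = set()
    for (pid, metric), obs in latest.items():
        if metric == "phq9" and obs["value"] >= 10:
            flagged.add(pid)
        elif metric == "a1c" and obs["value"] > 9.0:
            flagged.add(pid)
    return flagged
```

Using only the latest value per metric matters: a patient whose PHQ-9 improved from 12 to 4 should drop out of the cohort, which is exactly the behavior the "latest observation" step enforces.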
This isn't theoretical. We built this for the Defense Health Agency's telehealth operations on AWS GovCloud — processing patient encounters, claims, and clinical program data hourly.
Why FHIR Matters (Even If You're Not Exchanging Data)
FHIR isn't just for interoperability. It's the best framework for building a unified patient data model across fragmented systems.
Here's why:
- Semantic interoperability — FHIR resources (Patient, Encounter, Observation, Condition) provide a common vocabulary across EHRs
- Longitudinal patient records — Bundle all patient data (demographics, encounters, labs, meds, diagnoses) in a standardized format
- Extension support — Add custom fields for payer-specific or clinical program data without breaking the standard
- Future-proofing — When CMS mandates FHIR API access, you're already compliant
We mapped EHR data, claims, and clinical programs to FHIR Patient, Encounter, Claim, and Observation resources. This gave us a single, queryable patient view that worked across all source systems.
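A sketch of how that single patient view can be assembled: group mapped resources into one FHIR collection-type Bundle per patient. Patient resources reference themselves; clinical resources point at the patient via `subject` (or `patient` for Claim). The helper names are ours, not a standard API:

```python
from collections import defaultdict

def patient_ref(resource: dict) -> str:
    """Extract the patient reference from a FHIR resource."""
    if resource["resourceType"] == "Patient":
        return f"Patient/{resource['id']}"
    ref = resource.get("subject") or resource.get("patient")
    return ref["reference"]

def longitudinal_bundles(resources: list[dict]) -> dict[str, dict]:
    """Group mapped resources into one FHIR collection Bundle
    per patient -- the 'longitudinal record' in miniature."""
    grouped = defaultdict(list)
    for r in resources:
        grouped[patient_ref(r)].append(r)
    return {
        pid: {
            "resourceType": "Bundle",
            "type": "collection",
            "entry": [{"resource": r} for r in rs],
        }
        for pid, rs in grouped.items()
    }
```

Because every source system's data is normalized to the same resource types first, the grouping step stays trivial no matter how many EHRs feed it.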
Real-Time vs. Batch: When Each Makes Sense
Not everything needs to be real-time. Here's when to use each:
Real-time streaming (Kafka, Pub/Sub, Kinesis)
- Patient encounter events (admissions, discharges, transfers)
- Clinical alerts and notifications
- Real-time dashboards for operational monitoring
Batch processing (Airflow, dbt, Glue)
- HEDIS measure calculations (quarterly or annual)
- Historical claims data loads
- Population health cohort analysis
- Quality metric aggregations
We run hourly batch pipelines for most workloads. Real-time adds complexity and cost — only use it when latency truly matters.
The Right Tech Stack
Here's what we've used successfully across multiple healthcare platforms:
Cloud platforms:
- GCP (BigQuery, Pub/Sub, Dataflow) for most use cases
- AWS GovCloud (Redshift, Kinesis, Glue, Step Functions) for FedRAMP compliance
- Databricks for lakehouse architecture + ML workflows
Orchestration & transformation:
- Airflow for workflow orchestration
- dbt for SQL-based transformations
- Great Expectations or dbt tests for data quality
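In the spirit of dbt tests and Great Expectations, here is a minimal Python sketch of the two checks we lean on most, completeness and freshness. The `loaded_at` column name and the two-hour threshold are assumptions for illustration:

```python
from datetime import datetime, timedelta, timezone

def check_quality(rows: list[dict], required: set[str],
                  max_age_hours: int = 2) -> list[str]:
    """Return a list of failure messages: completeness (no null
    required fields) and freshness (newest row recent enough)."""
    failures = []
    for i, row in enumerate(rows):
        missing = required - {k for k, v in row.items() if v is not None}
        if missing:
            failures.append(f"row {i} missing {sorted(missing)}")
    newest = max(datetime.fromisoformat(r["loaded_at"]) for r in rows)
    if datetime.now(timezone.utc) - newest > timedelta(hours=max_age_hours):
        failures.append("freshness check failed")
    return failures
```

In dbt, the same checks are declared as `not_null` tests and source freshness config; the point is that they run automatically on every pipeline execution, not ad hoc.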
BI & analytics:
- Looker for self-service analytics
- Semantic layer (dbt metrics or LookML) for consistent business logic
Governance: Not Optional
Data governance isn't a compliance checkbox. It's how you ensure your platform is actually trusted and used.
Key components:
- Data lineage — Track data flow from source systems to analytics (we use dbt docs + metadata APIs)
- Quality monitoring — Automated checks on completeness, accuracy, and freshness
- Access controls — Row-level security for multi-tenant environments
- Audit logging — Who accessed what data, when, and why (HIPAA requirement)
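The audit-logging requirement can be sketched as a decorator that records who accessed which patient's data, when, and for what purpose. This is a minimal illustration, not a compliant implementation; function and field names are ours:

```python
import functools
import json
import logging
from datetime import datetime, timezone

audit_log = logging.getLogger("phi_audit")

def audited(purpose: str):
    """Wrap a data-access function so every call emits a structured
    audit event: user, patient, action, purpose, timestamp."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(user_id: str, patient_id: str, *args, **kwargs):
            audit_log.info(json.dumps({
                "user": user_id,
                "patient": patient_id,
                "action": fn.__name__,
                "purpose": purpose,
                "at": datetime.now(timezone.utc).isoformat(),
            }))
            return fn(user_id, patient_id, *args, **kwargs)
        return inner
    return wrap

@audited(purpose="treatment")
def get_patient_record(user_id: str, patient_id: str) -> dict:
    return {"patient_id": patient_id}  # stand-in for the real lookup
```

Centralizing the audit event at the access layer, rather than trusting each consumer to log, is what makes "who accessed what, when, and why" answerable after the fact.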
The 90-Day Roadmap
Here's how to get from fragmented chaos to a working modern platform in 90 days:
Weeks 1-2: Assessment
- Map all source systems and data flows
- Identify critical use cases (population health, quality measures, risk adjustment)
- Choose cloud platform and tech stack
Weeks 3-6: Foundation
- Set up Bronze layer ingestion for 2-3 critical sources
- Build Silver layer FHIR transformations for Patient + Encounter
- Implement basic data quality checks
Weeks 7-10: Gold Layer & Analytics
- Build 1-2 critical Gold layer use cases (e.g., active patient cohorts)
- Set up Looker dashboards
- Implement governance and access controls
Weeks 11-12: Production Hardening
- Add monitoring and alerting
- Document data models and lineage
- Train users and hand off to internal teams
The Bottom Line
Fragmented data platforms aren't a technology problem. They're an architecture problem.
Stop adding more ETL jobs. Stop building more one-off integrations. Build a proper Lakehouse with Medallion layers, use FHIR for patient data models, and invest in governance.
The organizations that succeed are the ones that treat data platforms as a strategic enabler, not an IT project.
Related Resources
Data Platform Modernization →
Explore our service for modernizing healthcare data platforms with Lakehouse architecture and FHIR standards.
FHIR Data Platform →
See how we built a modern telehealth data platform using FHIR standards and Lakehouse architecture.
FedRAMP Data Platform →
Learn about our HIPAA and FedRAMP compliant platform built on AWS GovCloud for Defense Health Agency.
Building Healthcare AI Agents →
Discover how modern data platforms enable production-ready AI systems with RAG architecture.
Need help modernizing your healthcare data platform? Let's discuss your roadmap.