The Scienza Health real-world dataset is a proprietary clinical data lake of 12.3 million senior care patients across 14,613 facilities, with 27 billion clinical records spanning 10+ years longitudinally. It includes a 3-million-patient neurodegenerative cohort and 2,500+ speech biomarkers per encounter, and powers Proactive Decision Orders through real-time cohort benchmarking on the GIA® platform.
27 Billion Clinical Records.
One Powerful Data Lake.
Real-world evidence from 12.3 million senior care patients across 14,613 facilities — the proprietary dataset behind GIA® AI Co-Clinician.
Key Facts
- Patients: 12.3M
- Facilities: 14,613
- Records: 27B
- Years: 10+
- Biomarkers: 2,500+
- Brain-health cohort: 3M
This content is intended for informational purposes and does not constitute medical advice. Editorially reviewed by David Kaiser, CEO of Scienza Health, for accuracy in post-acute care operations.
For partners evaluating the data layer behind the GIA® platform, Proactive Decision Orders, and clinical intelligence. Review our clinical governance framework, peer-reviewed research, and EHR integrations.
Every clinical signal, captured.
Medications, diagnoses, structured assessments, daily care outcomes — longitudinally linked and queryable in sub-second time.
3 Million Neurodegenerative Patients.
One of the largest real-world brain-health cohorts in healthcare — the foundation for speech biomarker validation.
Voice biomarker data — the differentiator.
Scienza Health is the only senior-care data partner with proprietary voice biomarker capture integrated directly into the longitudinal record. Each patient encounter contributes 2,500+ acoustic and linguistic features — the unstructured signal traditional EHR datasets cannot replicate.
- Speech-derived markers from natural patient conversation, captured at point of care
- Validated against peer-reviewed clinical endpoints (academic medical centers, MIT, Mayo, NIH consortium) — see research
- AUC 0.97 for Parkinson’s detection from conversational speech (peer-reviewed)
- Continuously expanding: every screening encounter adds to the longitudinal cohort
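For readers less familiar with the AUC figure quoted above, here is a minimal sketch of what the metric measures: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The function name and the scores are made up for illustration; they are not Scienza Health data and do not reproduce the published result.

```python
def auc(scores_pos, scores_neg):
    """Pairwise AUC: share of (positive, negative) pairs where the
    positive case scores higher; ties count as half a win."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Synthetic model scores, purely illustrative.
toy_pos = [0.9, 0.8, 0.4]   # patients with the condition
toy_neg = [0.7, 0.3, 0.2]   # patients without the condition
toy_auc = auc(toy_pos, toy_neg)  # 8 of 9 pairs are ordered correctly
```

An AUC of 1.0 means every positive outranks every negative; 0.5 is what random scoring would give.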
Where the signal comes from.
Structured clinical data
- EHR (PointClickCare-integrated)
- MDS — Section C (BIMS), G (functional), N (medications)
- Medication administration records
- Diagnoses (ICD-10), problem lists
- Care plans, ADL/IADL functional measures
- Daily clinical events — transfers, falls, behavioral incidents, vitals
Voice + multimodal data
- 2,500+ speech biomarkers per encounter
- Acoustic features — prosody, pitch, timing, articulation
- Linguistic features — lexical complexity, semantic coherence
- Computer vision signals (consented video encounters)
- Outcome linkage — speech features tied to clinical trajectory
1.3 billion outcomes are not all equal.
Pharma and payor research questions hinge on the right outcomes, captured with the right structure.
- Disease progression — functional decline, BIMS score change, ADL/IADL trajectories
- Adverse events — falls, behavioral incidents, medication-related events, transfers
- Hospital utilization — avoidable transfers, readmissions, length of stay
- Mortality and discharge outcomes
- Treatment response — medication initiation/titration tied to clinical trajectory
How the data powers Proactive Decision Orders.
The dataset isn’t a backend asset that sits in cold storage. It is the engine. Every new patient encounter is benchmarked, in real time, against the millions of like-cohort patients who came before — same demographics, same diagnosis history, same functional baseline, same trajectory shape. Like-cohort outcomes condition the model. The result: highly probable clinical orders before the physician walks in.
- New patient encounter → voice + structured data captured
- Cohort match → like-patients selected from 12.3M longitudinal records
- Outcome distribution → probability surface for next-best clinical actions
- Highest-probability orders surfaced to clinician with reasoning
- Clinician reviews and approves — every action gates through human judgment
- Action and outcome feed back — the loop sharpens
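The loop above can be sketched in a few lines. This is an illustrative simplification under stated assumptions, not the GIA® implementation: the record fields, the exact-match rule, and the function names (`match_cohort`, `action_distribution`, `rank_orders`) are all hypothetical, and a production system would presumably match on richer features such as trajectory shape.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PatientRecord:
    # Hypothetical matching keys; real cohort features are not public.
    age_band: str
    diagnosis: str
    functional_band: str
    next_action: str = ""  # order eventually placed (historical records only)


def match_cohort(new_patient, history):
    """Select historical patients that share the new patient's keys."""
    key = (new_patient.age_band, new_patient.diagnosis, new_patient.functional_band)
    return [r for r in history
            if (r.age_band, r.diagnosis, r.functional_band) == key]


def action_distribution(cohort):
    """Empirical probability of each next action within the cohort."""
    counts = Counter(r.next_action for r in cohort)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {action: n / total for action, n in counts.items()}


def rank_orders(new_patient, history, top_k=3):
    """Highest-probability candidate orders, for clinician review only."""
    dist = action_distribution(match_cohort(new_patient, history))
    return sorted(dist.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


# Toy history, invented for illustration.
history = [
    PatientRecord("80-89", "G20", "moderate", "PT referral"),
    PatientRecord("80-89", "G20", "moderate", "PT referral"),
    PatientRecord("80-89", "G20", "moderate", "medication review"),
    PatientRecord("70-79", "F03", "mild", "cognitive screening"),
]
ranked = rank_orders(PatientRecord("80-89", "G20", "moderate"), history)
```

The sketch stops at ranking: nothing is acted on until a clinician reviews and approves, which mirrors the human-in-the-loop step in the list.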
Without longitudinal scale, cohort matching produces noise. Without outcome linkage, probability surfaces are flat. We have both. That is why Proactive Decision Orders work — and why they are difficult to copy.
Without data, there is no AI.
We have the data.
- 10+ years longitudinal — trajectory, not snapshots
- 14,613 facilities across diverse settings — built-in generalizability
- 3M+ neurodegenerative patients — statistical power where it matters
- Continuously updated via native PointClickCare integration — daily, not batches
- Voice biomarker capture at point of care — uncopyable from claims data alone
Real-world evidence, every angle.
Pharma & biotech
Identify trial-eligible cohorts, generate post-market evidence, validate drug-target hypotheses against the largest senior-care neurodegenerative cohort in production today.
Talk to RWE team →
Payors & Medicare Advantage
Risk stratification at the patient level, HCC documentation completeness, avoidable-utilization signal years before claims data surfaces it.
Talk to plan-partnerships team →
Health systems & SNF operators
Benchmark outcomes against the cohort. Power your screening, documentation, and quality programs with the data layer behind GIA® AI Co-Clinician.
Talk to clinical-partnerships team →
Research collaborators
Academic medical centers, NIH-funded consortia, brain-health foundations — partner with us on joint studies, data access, and longitudinal research.
Talk to research team →
Enterprise-grade. Research-ready.
- HIPAA compliant. Fully de-identified.
- 5-Layer Governance. AES-256. Human-in-the-Loop.
- Continuously updated — new patient encounters and clinical events flow daily.
- 95% compression: 8TB raw → 400GB queryable. Sub-second query speed via AWS Athena.
- Python, R, and SQL compatible. 37 dimension + 23 fact tables.
- Demographically diverse cohorts; detailed breakdowns available under partnership.
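As a rough idea of what querying a dimension-and-fact layout from Python might look like, here is a small self-contained sketch. It uses an in-memory SQLite database as a stand-in for the actual query engine, and the table and column names (`dim_patient`, `fact_clinical_event`, `age_band`, `event_type`) are assumptions made for illustration; the real schema is not described on this page.

```python
import sqlite3

# In-memory stand-in; the production data is queried via AWS Athena.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_patient (
        patient_key INTEGER PRIMARY KEY,
        age_band    TEXT
    );
    CREATE TABLE fact_clinical_event (
        event_id    INTEGER PRIMARY KEY,
        patient_key INTEGER REFERENCES dim_patient(patient_key),
        event_type  TEXT
    );
    -- Synthetic, de-identified toy rows.
    INSERT INTO dim_patient VALUES (1, '70-79'), (2, '80-89'), (3, '80-89');
    INSERT INTO fact_clinical_event VALUES
        (1, 1, 'fall'), (2, 2, 'fall'), (3, 3, 'fall'), (4, 3, 'transfer');
""")

# Join the fact table to a dimension and aggregate: falls per age band.
falls_by_age_band = conn.execute("""
    SELECT d.age_band, COUNT(*) AS falls
    FROM fact_clinical_event AS f
    JOIN dim_patient AS d ON d.patient_key = f.patient_key
    WHERE f.event_type = 'fall'
    GROUP BY d.age_band
    ORDER BY d.age_band
""").fetchall()
conn.close()
```

The same join-and-aggregate pattern carries over to R or plain SQL clients; only the connection layer changes.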
See the data behind GIA®.
Decision-grade evidence in 90 days. For pharma, payors, health systems, and research collaborators.
20-minute conversation. No NDA required to start.