Multi-agent clinical-data platform

Turn clinical records into queryable, source-backed data.

Salutera uses specialist agents for NLP, Vision, Speech, Reasoning, and Provenance to read notes, PDFs, scans, pathology reports, imaging text, and transcripts. The platform structures every extraction into clinical ontologies and links each answer back to the exact source document, page, and line.

On-prem · VPC · air-gapped · zero data egress

Validated across

16 Diseases · 7 categories
781 Variables · 10 groups
112K Files · 16,000 patients
// Salutera Workspace · Live · Verified

// Clinical query

“Show me NSCLC patients with PD-L1 ≥ 50% on first-line immunotherapy in 2024.”

1,243 patients matched · 6 sites · 0.42s

// Specialist agents · in parallel

NLP · NOTES
VISION · IMAGES
SPEECH · AUDIO
REASONING · COHORT
PROVENANCE · SOURCES

// Cohort extraction matrix

FHIR · OMOP · CSV

Patient · TNM · Biomarker · Line · Last visit
P-1042 · T3N1M0 · ALK+ · 1L · 2024-07-30
P-1043 · T1N0M0 · PD-L1 ≥50% · imm · 2024-09-04

1,243 records matched · showing 2 · every cell source-traceable

// Provenance · P-1043 · biomarker

Grounded

“Tumor immunohistochemistry showed PD-L1 expression 62% (Tumor Proportion Score), consistent with high-expressor classification.”

oncology_consult_240904.pdf · pg 3 · line 27
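
A grounded claim like the one above can be modeled as a value plus a citation. The sketch below is illustrative only: the `Citation` and `GroundedClaim` names, their fields, and the `is_grounded` check are assumptions made for this example, not Salutera's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    """Where a claim was found (hypothetical fields)."""
    document: str
    page: int
    line: int


@dataclass(frozen=True)
class GroundedClaim:
    """An extracted value together with the passage that supports it."""
    patient_id: str
    variable: str
    value: str
    quote: str
    citation: Citation


def is_grounded(claim: GroundedClaim, pages: dict) -> bool:
    """Return True if the quote appears on the cited page, starting at the cited line.

    `pages` maps a 1-based page number to that page's list of text lines.
    """
    lines = pages.get(claim.citation.page, [])
    start = claim.citation.line - 1  # citations use 1-based line numbers
    if not 0 <= start < len(lines):
        return False
    # A quote may wrap across lines, so search from the cited line onward.
    return claim.quote in " ".join(lines[start:])


claim = GroundedClaim(
    patient_id="P-1043",
    variable="biomarker",
    value="PD-L1 TPS 62%",
    quote="PD-L1 expression 62%",
    citation=Citation("oncology_consult_240904.pdf", page=3, line=27),
)
# Toy page text: 26 empty lines, then the cited sentence starting on line 27.
pages = {3: [""] * 26 + [
    "Tumor immunohistochemistry showed PD-L1 expression 62% (Tumor Proportion Score),",
    "consistent with high-expressor classification.",
]}
print(is_grounded(claim, pages))  # True
```

A check of this kind is what lets a reviewer reject any claim whose quoted text is not actually present at the cited location.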

The product in action

A patient's full clinical story.

Each event below was extracted from a different source document — biopsy report, pathology consult, oncology note, imaging report — and grounded against SNOMED-CT, LOINC, RxNorm. One row per encounter; one citation per claim; one audit entry per extraction.

Patient timeline · P-1043 · extracted from 11 source records
  1. 2023-11 · Dx · NSCLC · biopsy-confirmed adenocarcinoma · stage T2N0M0
  2. 2023-12 · Biomarker · EGFR wild-type · ALK negative · PD-L1 TPS 62%
  3. 2024-01 · Tx-1L · Pembrolizumab monotherapy initiated · 200mg q3w
  4. 2024-04 · Response · Partial response · target lesion -38%
  5. 2024-07 · Tx-cont · Continued pembrolizumab · no new toxicities
  6. 2024-09 · Imaging · Stable disease · CT chest + PET
Audit log · last 60s · tamper-evident · SIEM-ready
  • 12:04:18 · extract · agent.nlp · discharge_summary_240821.pdf → 27 variables · grounded
  • 12:04:18 · extract · agent.cv · ct_chest_240821.dcm → 4 measurements · grounded
  • 12:04:19 · fuse · agent.mm · encounter:240821-103 → 1 row · ontology-validated
  • 12:04:21 · query · user:scarter@… · cohort:nsclc-pdl1-1L → 1,243 matches · cited
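
One common way to make an audit log tamper-evident is a hash chain, where each record stores the hash of the record before it. The sketch below shows that general technique under stated assumptions: the record layout and the `append_entry` and `verify` helpers are hypothetical, and the source does not say which mechanism Salutera uses.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first record


def _digest(prev_hash: str, entry: dict) -> str:
    """SHA-256 over the previous hash plus a canonical JSON form of the entry."""
    payload = json.dumps(entry, sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(log: list, entry: dict) -> dict:
    """Append `entry` to `log`, chaining it to the previous record's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    record = {"entry": entry, "prev_hash": prev_hash, "hash": _digest(prev_hash, entry)}
    log.append(record)
    return record


def verify(log: list) -> bool:
    """Recompute every hash; an edited or reordered record breaks the chain."""
    prev_hash = GENESIS
    for record in log:
        if record["prev_hash"] != prev_hash or record["hash"] != _digest(prev_hash, record["entry"]):
            return False
        prev_hash = record["hash"]
    return True


log = []
append_entry(log, {"time": "12:04:18", "action": "extract", "actor": "agent.nlp",
                   "target": "discharge_summary_240821.pdf"})
append_entry(log, {"time": "12:04:19", "action": "fuse", "actor": "agent.mm",
                   "target": "encounter:240821-103"})
print(verify(log))  # True

log[0]["entry"]["target"] = "edited.pdf"  # simulate tampering with the first record
print(verify(log))  # False
```

Note that a plain chain like this does not detect truncation of the newest records; production designs usually anchor the latest hash somewhere external for that reason.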
[OUTPUT // SCHEMA]

Unified Patient Record

Outputs are instantly available as structured JSON arrays, FHIR resources, or tabular exports. Every variable is cryptographically tied to its source PDF coordinates.

fhir_export_bundle.json · active
{
"resourceType": "Bundle",
"id": "fused-patient-p1043",
"entry": [ { "resource": { "id": "T3N1M0", "confidence": 0.991 } } ]
}
2.4s · Avg Extraction
99.1% · Accuracy
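
Tying a variable to its source coordinates can be pictured as a fingerprint computed over the document's bytes together with the page and bounding box. This is a minimal sketch of that idea, not the platform's actual scheme: the `source_fingerprint` and `with_provenance` helpers, the field names, and the bounding-box values are all assumptions.

```python
import hashlib
import json


def source_fingerprint(document: bytes, page: int, bbox: tuple) -> str:
    """SHA-256 over the document digest plus the page number and bounding box."""
    doc_digest = hashlib.sha256(document).hexdigest()
    location = json.dumps({"doc": doc_digest, "page": page, "bbox": list(bbox)}, sort_keys=True)
    return hashlib.sha256(location.encode("utf-8")).hexdigest()


def with_provenance(variable: dict, document: bytes, page: int, bbox: tuple) -> dict:
    """Return a copy of `variable` that carries its source location and fingerprint."""
    source = {"page": page, "bbox": list(bbox),
              "fingerprint": source_fingerprint(document, page, bbox)}
    return {**variable, "source": source}


# Toy inputs: stand-in bytes for a PDF and a made-up bounding box in page units.
pdf_bytes = b"%PDF-1.7 toy document bytes"
record = with_provenance({"name": "Stage_TNM", "value": "T3N1M0", "confidence": 0.991},
                         pdf_bytes, page=3, bbox=(72.0, 410.5, 320.0, 424.0))
print(json.dumps(record, indent=2))
```

If either the document bytes or the cited coordinates change, the recomputed fingerprint no longer matches the stored one, which is what makes the link checkable.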

The data problem

Information is trapped.

Locked in PDFs, notes, and scans.

Clinical data sits in PDFs, handwritten notes, faxed scans, and fragmented systems. Teams waste weeks manually searching, abstracting, validating, and rechecking the same records a third time.

Manual extraction fails

Keyword search misses context. Manual abstraction is painfully slow, inconsistent across abstractors, and nearly impossible to scale across large disease cohorts.

AI without citations is dangerous

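An answer that cannot be checked against the record is a clinical risk. Salutera reads what those records actually say and cites the exact bounding box in the source PDF for every claim it returns. Because every output must be grounded in a cited passage, an unsupported claim can be caught and rejected instead of passing as fact.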

How it works

Structure. Search. Reason. Cite.

Eight pipeline stages. Four specialist extraction agents (NLP · Computer Vision · Speech · Multimodal). An extensible reasoning layer on top. Every cell, every claim, traces back to its source document.

// Extraction · Four specialist agents work every record in parallel
// RECORDS IN
Input Documents
PDF · DICOM · HL7 · Audio · Web Scraped
01 · Router
Intelligent Dispatcher
Dispatches segments to optimal downstream specialized AI agents.
// PARALLEL PROCESSING AGENTS
NLP
Reading agent
notes · summaries · pathology
CV
Looking agent
imaging · ECG · scanned forms
SPK
Listening agent
dictations · consultations
MM
Fusing agent
multimodal records
02 · Stage
Ontology Mapping
LOINC · SNOMED-CT · RxNorm standards mapping.
03 · Stage
Vector Embeddings
Encodes semantic context for multi-agent reasoning.
04 · Stage
Classification
High-precision categorization & diagnostic validation.
// STRUCTURED OUTPUT
Mega-Structured Dataset

Instantly available formats:

  • Relational Tables & JSON
  • FHIR R4 Resources
  • OMOP CDM Mappings
100% Traceable provenance per cell
// Reasoning · Agents that work the structured output
// AGENT_01

Cohort comparison

Apples-to-apples across sites.

// AGENT_02

Eligibility screening

Trial criteria, per patient.

// AGENT_03

Signal detection

Adverse-event scanning.

// AGENT_04

Decision support

Cited answers per question.

// AGENT_05

Formulation & CMC

Pharma R&D precedent.

// AGENT_06

Custom agents

Bring your own rules.

The Processing Core

8-Stage Clinical Pipeline

End-to-end parallel multi-agent processing, clinical ontology mapping, and de-identified mega-structure outputs designed to scale with the cluster you give it.

01

Data Anonymization

Local compliance de-identification (HIPAA/GDPR).

02

High-Perf Storage

Scalable cluster indexing raw multimodal formats.

03

Intelligent Routing

Dispatches content to optimal downstream agents.

04

Parallel Processing

NLP, CV, and Speech agents process each record simultaneously; the multimodal agent fuses their claims.

05

Ontology Mapping

Maps terminology to LOINC, SNOMED-CT, and RxNorm.

06

Vector Embeddings

Encodes semantic relationships for reasoning models.

07

Variable Extraction

Pulls exact patient properties with absolute traceability.

08

Mega-Structure Output

Generates typed tables, FHIR bundles, and knowledge graphs.
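
Stages 03 and 04 (routing, then parallel processing) can be sketched with a thread pool. Everything here is an assumption for illustration: the extension-to-agent table, the fallback to the multimodal agent for unknown types, and the stub `run_agent` function stand in for whatever the platform does internally.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import PurePath

# Hypothetical mapping from file extension to specialist agent.
ROUTES = {".pdf": "nlp", ".txt": "nlp", ".dcm": "cv", ".png": "cv", ".wav": "speech"}


def route(filename: str) -> str:
    """Pick an agent from the file extension; unknown types go to the multimodal agent."""
    return ROUTES.get(PurePath(filename).suffix.lower(), "mm")


def run_agent(agent: str, filename: str) -> dict:
    """Stand-in for a real extraction agent; returns a stub result."""
    return {"agent": agent, "file": filename, "status": "grounded"}


def process(filenames: list) -> list:
    """Route each file, then run the agents concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(lambda name: run_agent(route(name), name), filenames))


results = process(["discharge_summary_240821.pdf", "ct_chest_240821.dcm", "dictation.wav"])
for result in results:
    print(result["agent"], result["file"])
# nlp discharge_summary_240821.pdf
# cv ct_chest_240821.dcm
# speech dictation.wav
```

Threads suit I/O-bound stubs like these; CPU-heavy or GPU-backed agents would more likely run as separate processes or services.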

Semantic Clinical Query

Search across records. Returns cited patient cohorts, not document links.

Longitudinal Timeline

Aligns diagnoses, biomarkers, and visits sequentially per patient.

Registry Extraction

Pre-fills oncology (NCDB) and cardiovascular (STS) fields directly.

Traceable Auditing

Every cell includes a one-click provenance jump to the source offset.

Zero-Trust Local Engine

Strips PII locally to meet HIPAA privacy requirements.

Standardized Ontologies

Grounds messy free-text into LOINC, SNOMED-CT, RxNorm schemas.
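
At its simplest, grounding free text in an ontology is a normalized lookup from a surface term to a coded concept. The sketch below assumes a tiny in-memory vocabulary with placeholder codes (the `EXAMPLE-…` values are not real SNOMED-CT, RxNorm, or LOINC codes), and `map_term` is a hypothetical helper; real mapping also has to handle synonyms, abbreviations, and context.

```python
from typing import Optional

# Toy vocabulary with placeholder codes; a real deployment would load the
# licensed terminologies instead.
VOCABULARY = {
    "non-small cell lung cancer": ("SNOMED-CT", "EXAMPLE-0001"),
    "nsclc": ("SNOMED-CT", "EXAMPLE-0001"),
    "pembrolizumab": ("RxNorm", "EXAMPLE-0002"),
    "psa": ("LOINC", "EXAMPLE-0003"),
}


def normalize(term: str) -> str:
    """Lowercase and collapse whitespace so trivial variants match."""
    return " ".join(term.lower().split())


def map_term(term: str) -> Optional[dict]:
    """Return the coded concept for `term`, or None when it is not in the vocabulary."""
    hit = VOCABULARY.get(normalize(term))
    if hit is None:
        return None
    system, code = hit
    return {"text": term, "system": system, "code": code}


print(map_term("  NSCLC "))
print(map_term("Non-Small  Cell Lung Cancer"))
print(map_term("unmapped phrase"))  # None
```

Returning `None` for unmapped terms, rather than guessing, keeps unresolved text visible for review.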

Powered by Salutera

Unified Clinical Data Platform

Connect vastly fragmented medical systems and unstructured data formats directly to our secure, high-precision AI reasoning core.

STEP 01

Mega-structure each institution's clinical data into AI-ready datasets

Mega-structures clinical data from fragmented sources with 99% accuracy

System · Data Type · Details
HIS · EHR Integration · Cerner, Epic, MEDITECH, and more.
EHR/EMR · FHIR Data · FHIR R4 resources.
DUR · Unstructured Data · Data in different formats: PDF, imaging, DICOM, etc.
PACS/RIS · Unstructured Data · Data in a particular schema, e.g. tables.
LIS/LABS · Unstructured Data · PDF, .txt, imaging, DICOM, etc.
CRM · Structured Data · Data in schema-based forms such as tables.
STEP 02

Real-time AI decision support and scalable insights

Innovative Salutera Algorithms

Genomics · Vitals · History · Imaging · Labs · Pathology · AI

Multimodal AI/ML cross-references medical variables across billions of data points for precision health.

STEP 03

Fully secured and easy to use for non-IT professionals

Web and handheld apps

Our platform helps doctors identify the most appropriate approved treatments. Fully secure and user-friendly.

What users can do

Ask clinical questions. Build cohorts.

MODULE_01 // NL_QUERY

Ask clinical questions

Semantic search across the corpus. Returns matching patients, not just documents.

salutera query engine
PD-L1 >= 50% AND NSCLC
Matched patients · 1,847
Indexed 112,711 files · verified
MODULE_02 // COHORT

Build precise cohorts

Inclusion + exclusion logic in clinician language with clear audit trails.

Cohort Builder · LIVE
PD-L1 high & 1L regimen = 1.2K Pts
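
Inclusion and exclusion logic with an audit trail can be sketched as named rules applied to each patient. The `Criterion` class, the field names, the toy patients, and the ALK exclusion rule below are all hypothetical, chosen only to show the pattern.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Criterion:
    """A named rule, so the audit trail can say which rule decided what."""
    label: str
    test: Callable[[dict], bool]


def build_cohort(patients, include, exclude):
    """Keep patients that pass every inclusion rule and match no exclusion rule."""
    cohort, audit = [], []
    for patient in patients:
        failed = [c.label for c in include if not c.test(patient)]
        excluded_by = [c.label for c in exclude if c.test(patient)]
        if not failed and not excluded_by:
            cohort.append(patient["id"])
        audit.append({"patient": patient["id"],
                      "failed_inclusion": failed,
                      "matched_exclusion": excluded_by})
    return cohort, audit


# Toy records; field names and values are made up for this example.
patients = [
    {"id": "toy-1", "pd_l1_tps": 62, "line": "1L", "alk_positive": False},
    {"id": "toy-2", "pd_l1_tps": 10, "line": "1L", "alk_positive": False},
    {"id": "toy-3", "pd_l1_tps": 80, "line": "1L", "alk_positive": True},
]
include = [
    Criterion("PD-L1 high", lambda p: p["pd_l1_tps"] >= 50),
    Criterion("1L regimen", lambda p: p["line"] == "1L"),
]
exclude = [Criterion("ALK positive", lambda p: p["alk_positive"])]

cohort, audit = build_cohort(patients, include, exclude)
print(cohort)  # ['toy-1']
for row in audit:
    print(row)
```

Recording the failing rule labels for every patient, not only the matches, is what makes the cohort decision reviewable afterwards.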
MODULE_03 // ABSTRACT

Extract variables

Pre-fill NCDB, STS, NSQIP registries directly from underlying records.

Field · Value · Source
Stage_TNM · T2N0M0 · consult.pdf
Gleason · 7 (3+4) · path_rpt.pdf
PSA_ng/mL · 6.8 · labs_q3.pdf
MODULE_04 // TIMELINE

Create timelines

Diagnosis, biomarker, and treatment assembled into one longitudinal view.

Patient Timeline
Dx · Nov '23
Bx · Jan '24
Tx · Mar '24
Sep '24
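
Assembling a longitudinal view amounts to grouping extracted events by patient and ordering them by date. A minimal sketch, assuming each event is a dictionary with a `YYYY-MM` date string; the `build_timelines` helper and the event fields are hypothetical.

```python
from collections import defaultdict


def build_timelines(events):
    """Group events by patient and sort each group by its YYYY-MM date string."""
    timelines = defaultdict(list)
    for event in events:
        timelines[event["patient"]].append(event)
    for items in timelines.values():
        # ISO year-month strings sort chronologically as plain strings.
        items.sort(key=lambda e: e["date"])
    return dict(timelines)


# Events arrive in extraction order, not chronological order.
events = [
    {"patient": "P-1043", "date": "2024-01", "kind": "Tx-1L",
     "text": "Pembrolizumab monotherapy initiated"},
    {"patient": "P-1043", "date": "2023-11", "kind": "Dx",
     "text": "NSCLC, biopsy-confirmed adenocarcinoma"},
    {"patient": "P-1043", "date": "2023-12", "kind": "Biomarker",
     "text": "PD-L1 TPS 62%"},
]

for event in build_timelines(events)["P-1043"]:
    print(event["date"], event["kind"], event["text"])
# 2023-11 Dx NSCLC, biopsy-confirmed adenocarcinoma
# 2023-12 Biomarker PD-L1 TPS 62%
# 2024-01 Tx-1L Pembrolizumab monotherapy initiated
```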
MODULE_05 // PROVENANCE

Trace every claim

One-click jump to the source document, page, and exact passage.

consult_240904.pdf · pg. 4 · lines 27–31
Citation coverage · 100%
MODULE_06 // EXPORT

Export structured data

Typed tables, FHIR bundles, OMOP CDM mappings for downstream analytics.

{ "patient": "P-1043", … }
FHIR R4 · OMOP · CSV
Benchmarks & Evidence

Validation & Extraction Accuracy

Evaluated under strict exact-match criteria. Pinned model manifests keep behavior stable across mid-sized and government air-gapped deployments.

Histopathology details
Evaluation Set
112,711
Unstructured Records
99.12%

Colorectal Cancer Registry

Histopathology Reports PACS DICOM MRI
MRI Scan
99.03% · Gleason

Prostate Cancer

PSA, Gleason score, margins

COPD (Pulmonary)

98.89%

Asthma Registry

98.12%
Cohort Metrics · 16K SYNTHETIC
Overall Accuracy · 95.79% ± 5.69%

Evidence & validation details

Methodology & Framework

Structuring across 16 diseases · 7 categories on a synthetic patient corpus modeled on CDC- and NIH-sourced statistics.

// Headline benchmark on synthetic cohort

overall accuracy · retrieval time · variables · files processed

16,000 synthetic patients

Synthea + Qwen-2.5 enrichment. Evaluated strictly under exact-match accuracy metrics across 16 diseases including oncology, respiratory, immunology, and neurology.
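
Exact-match accuracy counts an extracted variable as correct only when it equals the reference value exactly. The sketch below shows one way such a score, and a mean with spread across diseases, could be computed. The helper names and toy values are assumptions, and treating the spread as a population standard deviation across diseases is a guess at how a "±" figure might be derived, not the documented method.

```python
from statistics import mean, pstdev


def exact_match_accuracy(predicted: dict, expected: dict) -> float:
    """Fraction of expected variables whose predicted value matches exactly."""
    if not expected:
        raise ValueError("expected must contain at least one variable")
    hits = sum(1 for name, value in expected.items() if predicted.get(name) == value)
    return hits / len(expected)


def summarize(per_disease: dict) -> tuple:
    """Mean accuracy and population standard deviation across diseases."""
    values = list(per_disease.values())
    return mean(values), pstdev(values)


# Toy example: the Gleason pattern order differs, so it does not count as a match.
expected = {"Stage_TNM": "T2N0M0", "Gleason": "7 (3+4)", "PSA_ng/mL": "6.8"}
predicted = {"Stage_TNM": "T2N0M0", "Gleason": "7 (4+3)", "PSA_ng/mL": "6.8"}
accuracy = exact_match_accuracy(predicted, expected)
print(f"{accuracy:.2%}")  # 66.67%

overall, spread = summarize({"disease-a": 1.0, "disease-b": 0.5})
print(f"{overall:.2%} ± {spread:.2%}")  # 75.00% ± 25.00%
```

Exact match is deliberately strict: a near miss such as a reordered Gleason pattern scores zero, which is why it is a conservative metric for clinical variables.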

Deployment footprint

Runs on commodity infrastructure. Scales linearly with the cluster you give it.

On-prem · VPC · Air-gapped
Zero data egress

// Grounded ontology domains & categories

Immunizations · Codes · Names · Medications · Symptoms · Conditions · Observations · Care plans · Procedures · Devices

Security & deployment

Every claim should survive review.

Records stay in your perimeter

Deploy on-prem, in your VPC, or air-gapped. When Salutera runs in your environment, patient files never touch our infrastructure. Zero egress.

Tenant isolation

Per-customer compute, storage, and agents. No shared inference or cross-training.

BYO KMS + Encryption

AES-256 at rest, TLS 1.3 in transit. Customer-controlled KMS natively supported.

Audit Log Per Claim

Every extraction, query, and export is logged with operator, timestamp, and scope. Easily exportable to your enterprise SIEM.

Model Governance

Pinned model manifests and staged updates. Customer sign-off is strictly required for tenant-level model changes.

HIPAA BAA available · GDPR DPA available · SOC 2 in progress · Full security posture
Salutera

Pilot offer

Bring us your hardest clinical dataset.

Pick the dataset that's been blocking your team. We'll structure 1,000 records inside your perimeter — or ours — in seven business days. You decide if it's good enough.