Before data enters an electronic health record (EHR), it travels through a noisy, lossy human system. Understanding this network is foundational to understanding healthcare AI.
In signal processing, noise is an undesirable disturbance that obscures a true signal. In healthcare, noise is more complicated:
Noise is created by systems, people, language, workflow, and interpretation.
During my mother’s six-year treatment journey, I slowly realized that healthcare is not a pipeline. It is a noisy, multi-agent, semi-coordinated network.
This network shapes everything downstream—diagnoses, notes, billing codes, population models, and AI performance.
The first insight: every node in the healthcare system transforms information.
Nodes include:
• patient
• family member
• interpreter
• triage nurse
• specialist nurse
• physician
• surgeon
• resident
• scheduler
• EHR template
Each node has:
• different incentives
• different training
• different language comfort
• different time pressure
• different ways of filtering information
A patient’s story is never simply “captured.” It is **transformed through constraints**.
In engineering, noise sometimes cancels out across repeated samples. In healthcare, noise tends to accumulate (a sketch contrasting the two follows this list):
• semantic simplification
• dropped details
• timeline shifts
• altered phrasing
• incorrect assumptions
• emotional context removed
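A minimal sketch of that contrast, with invented numbers: zero-mean measurement noise averages away over repeated samples, while a lossy handoff drops details that are never re-sampled, so the loss compounds.

```python
import random

random.seed(0)

# Engineering case: zero-mean noise cancels out when repeated samples
# of the same true signal are averaged.
true_signal = 10.0
samples = [true_signal + random.gauss(0, 2.0) for _ in range(1000)]
print(sum(samples) / len(samples))  # ~10.0: the noise averages away

# Healthcare case: each handoff is a lossy transformation, not an added
# zero-mean error. A dropped detail is gone for good, so loss compounds.
story = {"onset timeline", "metaphorical pain description", "fatigue",
         "sleep change", "appetite change", "fear"}
p_keep = 0.8  # assumed chance that any one detail survives a handoff
for _ in range(4):  # patient -> interpreter -> nurse -> physician -> note
    story = {d for d in story if random.random() < p_keep}
print(story)  # typically far smaller than the original set
```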
In multilingual settings, this effect magnifies. My mother often described symptoms metaphorically—interpreters removed metaphors. Nurses removed uncertainty. Physicians reframed the simplified version into medical language.
The final note often bore only a faint resemblance to the original story.
Three bottlenecks consistently created the most noise:
• triage
• interpreter translation
• template-based documentation
These bottlenecks act as compression points. Once context is lost, it is never recovered; the data processing inequality, sketched below, makes this precise.
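One way to formalize that irreversibility, assuming the Markov framing introduced just below, is the data processing inequality. Let X be the patient's original story, Y the triage or interpreter summary, and Z the final note:

```latex
% For any Markov chain X -> Y -> Z (story -> summary -> note),
% the data processing inequality bounds the recoverable information:
I(X; Z) \le I(X; Y)
% No downstream processing of Y, however careful, can restore
% information about X that the compression step already discarded.
```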
If I map healthcare information flow mathematically, it resembles:
a stochastic, multi-agent Markov process with visit-dependent transition probabilities.
Each clinician visit is a random draw from (a toy simulation follows this list):
• clinician style
• documentation level
• interpreter accuracy
• patient clarity
• system constraints
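Here is a toy simulation of that framing; the agents, transition matrices, and probabilities are invented for illustration, not drawn from any dataset. The story's fidelity is a Markov state, and each agent is a transition matrix that can only degrade it.

```python
import random

random.seed(1)

# Fidelity of the patient's story, modeled as a Markov state.
STATES = ["full", "partial", "minimal"]

# Rows: current state; columns: next state. Matrices are upper-triangular
# because fidelity only degrades: lost context is never recovered.
TRANSITIONS = {
    "interpreter":  [[0.6, 0.3, 0.1], [0.0, 0.7, 0.3], [0.0, 0.0, 1.0]],
    "triage_nurse": [[0.5, 0.4, 0.1], [0.0, 0.6, 0.4], [0.0, 0.0, 1.0]],
    "ehr_template": [[0.4, 0.4, 0.2], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]],
}

def one_visit() -> str:
    """Each visit draws a random agent pipeline, then walks the chain."""
    pipeline = random.choices(list(TRANSITIONS), k=random.randint(2, 4))
    state = 0  # start at "full" fidelity
    for agent in pipeline:
        row = TRANSITIONS[agent][state]
        state = random.choices(range(3), weights=row)[0]
    return STATES[state]

print([one_visit() for _ in range(8)])
# Same underlying story, different documentation fidelity on every draw:
# the variance is structural, produced by the random pipeline itself.
```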
This explains why:
• EHR notes vary wildly
• diagnosis quality varies
• symptom timelines change
• data looks “messy”
The “mess” is structural, not accidental.
Many health data science papers treat EHR data as:
imperfect but usable signals
But my lived experience shows:
EHR data is a final snapshot of a long, noisy, lossy process.
ML models built on EHR data must account for (one possible mitigation is sketched after this list):
• upstream noise
• cross-cultural variation
• uncertainty removal
• interpreter bias
• workflow constraints
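What “accounting for upstream noise” looks like in practice is an open question; one hedged sketch, using synthetic data and an assumed provenance flag, is to down-weight records that passed through known compression points during training.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-in for EHR-derived features and labels.
X = rng.normal(size=(500, 4))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.5, 500) > 0).astype(int)

# Hypothetical provenance flag: did the record pass through an
# interpreter + template pipeline (a known compression point)?
through_compression = rng.random(500) < 0.3

# Down-weight records with noisier provenance rather than trusting
# every note equally. The 0.5 weight is an illustrative assumption.
reliability = np.where(through_compression, 0.5, 1.0)

model = LogisticRegression()
model.fit(X, y, sample_weight=reliability)
print(model.score(X, y))
```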
This is why I am drawn to studying:
• multilingual health representation
• clinical documentation behavior
• information propagation in workflows
• upstream data quality
Because building better models requires understanding the network that creates the data.