Yingzi Ye

Healthcare as a Noisy Network

Before data enters an electronic health record (EHR), it travels through a noisy, lossy human system. Understanding this network is foundational to understanding healthcare AI.

In mathematics, noise is an undesirable disturbance that obscures a true signal. In healthcare, noise is more complicated:

Noise is created by systems, people, language, workflow, and interpretation.

During my mother’s six-year treatment journey, I slowly realized that healthcare is not a pipeline. It is a noisy, multi-agent, semi-coordinated network.

This network shapes everything downstream—diagnoses, notes, billing codes, population models, and AI performance.

1. The network is made of people with constraints

The first lesson I learned: every node in the healthcare system transforms information.

Nodes include:

• patient
• family member
• interpreter
• triage nurse
• specialist nurse
• physician
• surgeon
• resident
• scheduler
• EHR template

Each node has:

• different incentives
• different training
• different language comfort
• different time pressure
• different ways of filtering information

A patient’s story is never simply “captured.” It is **transformed through constraints**.

2. Noise accumulates rather than cancels

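In engineering, independent, zero-mean noise tends to average out across repeated measurements of the same signal. In healthcare, noise tends to accumulate, because each node works from the previous node’s version of the story rather than from the original: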

• semantic simplification
• dropped details
• timeline shifts
• altered phrasing
• incorrect assumptions
• emotional context removed

In multilingual settings, this effect is magnified. My mother often described her symptoms metaphorically; interpreters removed the metaphors. Nurses removed the uncertainty. Physicians then reframed the simplified version in medical language.

The final note often had only a faint resemblance to the original story.
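To make the contrast with engineering concrete, here is a toy simulation. It assumes two hypothetical models. In the engineering case, several independent, zero-mean readings of the same true value are averaged. In the healthcare case, a list of story details passes through a chain of handoffs, and each handoff independently drops each surviving detail with a fixed probability (`drop_prob = 0.2`, an arbitrary number chosen for illustration). The function names are hypothetical, and the model ignores every distortion other than dropped details.

```python
import math
import random
import statistics


def mean_abs_error_of_average(n_readings: int, sigma: float = 1.0,
                              trials: int = 2000, seed: int = 0) -> float:
    """Engineering case: average n independent, zero-mean noisy readings of the
    same true value (taken as 0) and report the mean absolute error."""
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        readings = [rng.gauss(0.0, sigma) for _ in range(n_readings)]
        errors.append(abs(statistics.fmean(readings)))
    return statistics.fmean(errors)


def retained_fraction_after_relay(n_handoffs: int, n_details: int = 10,
                                  drop_prob: float = 0.2,
                                  trials: int = 2000, seed: int = 0) -> float:
    """Healthcare case (hypothetical model): each handoff sees only the previous
    node's version and independently drops each surviving detail with
    probability drop_prob. Returns the mean fraction of details that survive."""
    rng = random.Random(seed)
    fractions = []
    for _ in range(trials):
        kept = n_details
        for _ in range(n_handoffs):
            # A detail dropped here cannot reappear at a later handoff.
            kept = sum(1 for _ in range(kept) if rng.random() >= drop_prob)
        fractions.append(kept / n_details)
    return statistics.fmean(fractions)


if __name__ == "__main__":
    for n in (1, 4, 16):
        # Expected |mean| of n unit-variance Gaussian readings: sqrt(2/pi) / sqrt(n)
        theory = math.sqrt(2 / math.pi) / math.sqrt(n)
        print(f"{n:>2} readings averaged -> mean |error| "
              f"{mean_abs_error_of_average(n):.3f} (theory {theory:.3f})")
    for k in (1, 2, 3, 4):
        print(f"{k} handoffs -> details retained "
              f"{retained_fraction_after_relay(k):.2f} (theory {(1 - 0.2) ** k:.2f})")
```

Averaging helps because every reading goes back to the same source. In a relay, each node sees only the previous node’s version, so losses compound geometrically, roughly as (1 - drop_prob)^k after k handoffs, and extra handoffs can only remove detail.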

3. Bottlenecks amplify information distortion

Three bottlenecks consistently created the most noise:

• triage
• interpreter translation
• template-based documentation

These bottlenecks act as lossy compression points: once context is discarded there, it is never recovered downstream.
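One way to see why a compression point cannot be undone is to measure information with Shannon entropy. The sketch below assumes four invented symptom descriptions (not real patient data), treated as equally likely, and a hypothetical template field that maps them onto two coded values. For any deterministic mapping f, H(f(X)) ≤ H(X), and whenever several inputs share one output, no later reader can tell which input produced it.

```python
import math
from collections import Counter

# Invented, illustrative descriptions, treated as equally likely.
# Hypothetical template: a coded field with fewer options than there are distinct stories.
TEMPLATE = {
    "burning like fire in the chest after meals": "chest pain",
    "a stone pressing on my heart at night": "chest pain",
    "sharp stabbing pain when breathing in": "chest pain",
    "dull ache that comes and goes for weeks": "pain, unspecified",
}


def entropy_bits(values) -> float:
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def preimages(template: dict, coded_value: str) -> list:
    """All original descriptions that the template maps to `coded_value`."""
    return [text for text, code in template.items() if code == coded_value]


if __name__ == "__main__":
    before = entropy_bits(list(TEMPLATE.keys()))   # four distinct stories
    after = entropy_bits(list(TEMPLATE.values()))  # what the coded field keeps
    print(f"entropy before template: {before:.2f} bits")
    print(f"entropy after template:  {after:.2f} bits")
    print("stories behind 'chest pain':", preimages(TEMPLATE, "chest pain"))
```

In this toy case the recorded field carries about 0.81 bits where the original descriptions carried 2, and the value "chest pain" has three possible stories behind it; nothing in the coded record says which one it was.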

4. Information flows like a stochastic process

If I map healthcare information flow mathematically, it resembles:

a stochastic, multi-agent Markov process with variable transitions.

Each clinical visit is effectively a random draw from:

• clinician style
• documentation level
• interpreter accuracy
• patient clarity
• system constraints

This explains why:

• EHR notes vary wildly
• diagnosis quality varies
• symptom timelines change
• data looks “messy”

The “mess” is structural, not accidental.
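The Markov framing can be sketched as a small simulation. This is a toy model under stated assumptions, not a fitted model of any real workflow: a single symptom moves through three hypothetical fidelity states (`detailed`, `simplified`, `lost`). At each visit, two of the factors listed above (interpreter accuracy and documentation level) are drawn at random from made-up ranges, and they set that visit’s transition probabilities. The chain is Markov because the next state depends only on the current state and the current visit’s draw; `lost` is absorbing, matching the claim that lost context is not recovered. All function names and numbers are hypothetical.

```python
import random
from collections import Counter

# Hypothetical fidelity levels for one symptom as it appears in the record.
STATES = ("detailed", "simplified", "lost")


def draw_visit(rng: random.Random) -> tuple:
    """Draw one visit's conditions. Both factors and their ranges are made up."""
    interpreter_accuracy = rng.uniform(0.6, 1.0)
    documentation_level = rng.uniform(0.3, 1.0)  # 1.0 = thorough, unhurried note
    return interpreter_accuracy, documentation_level


def transition_probs(state: str, interpreter_accuracy: float,
                     documentation_level: float) -> dict:
    """This visit's transition probabilities out of `state` (hypothetical model)."""
    keep = interpreter_accuracy * documentation_level  # chance the symptom passes unchanged
    if state == "detailed":
        return {"detailed": keep,
                "simplified": (1 - keep) * 0.7,
                "lost": (1 - keep) * 0.3}
    if state == "simplified":
        # No path back to "detailed": simplification is not undone later.
        return {"simplified": keep + (1 - keep) * 0.5,
                "lost": (1 - keep) * 0.5}
    return {"lost": 1.0}  # absorbing state: lost context is never recovered


def simulate_journey(n_visits: int, rng: random.Random) -> str:
    """Follow one symptom across n_visits visits, each with freshly drawn conditions."""
    state = "detailed"
    for _ in range(n_visits):
        probs = transition_probs(state, *draw_visit(rng))
        state = rng.choices(list(probs), weights=list(probs.values()))[0]
    return state


if __name__ == "__main__":
    rng = random.Random(42)
    n_journeys = 10_000
    outcomes = Counter(simulate_journey(6, rng) for _ in range(n_journeys))
    for state in STATES:
        print(f"{state:>10}: {outcomes[state] / n_journeys:.1%} of journeys after 6 visits")
```

Running many journeys from the same starting story yields a spread of final states. That is the sense in which the variation is structural: it comes from the random transitions themselves, not from any single careless step.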

5. Why this matters for my research

Many health data science papers treat EHR data as:

imperfect but usable signals

But my lived experience shows:

EHR data is a final snapshot of a long, noisy, lossy process.

ML models built on EHR data must account for:

• upstream noise
• cross-cultural variation
• uncertainty removal
• interpreter bias
• workflow constraints

This is why I am drawn to studying:

• multilingual health representation
• clinical documentation behavior
• information propagation in workflows
• upstream data quality

Because building better models requires understanding the network that creates the data.