Clinical documentation variability is not noise—it is structure. And structure determines how a patient is represented in data.
During the years I accompanied my mother through treatment, I noticed something that felt small at first but grew into a central research question for me: the same patient looks completely different depending on who writes the note.
It is the same human body, the same symptoms, the same story—yet the EHR record changes shape when the narrator changes. Some clinicians write in dense detail; others write minimal summaries. Some rely on shorthand; others rely on structured templates. To a data scientist, these are inconsistencies. To a caregiver, they are unsettling. To a patient, they can be dangerous.
I began to categorize what I saw:
Type A: The Over-Documenter
Writes long notes, includes history, context, conversation details, minor findings,
and personal observations. Their documentation reads like a narrative—rich but inconsistent
in structure.
Type B: The Template User
Fills in structured fields, short sentences, checkbox-heavy notes. This clinician is efficient,
predictable, and aligned with EHR structure—but sometimes lacks nuance.
Type C: The Minimalist
Writes two or three lines. Assumes continuity from previous notes. Leaves interpretation up to
the next clinician. The record is fast to read but dangerously underspecified.
These three types create three different versions of the same patient. In my mother's case, this meant:
• her pain was described in three different vocabularies
• her history appeared long in one note, short in another
• severity levels shifted depending on phrasing
• some notes included her fears; some removed them
• some notes captured our questions; others ignored them
She had not changed. The documentation style had.
When I later studied applied mathematics and data science, I learned to think about variability in terms of signal vs. noise. But clinical documentation variability is neither purely signal nor purely noise—it is a product of workflow, training, culture, and cognitive load.
• Busy clinicians write shorter notes.
• Specialists document differently from primary care.
• Nurses emphasize function; physicians emphasize interpretation.
• New clinicians follow templates; experienced ones write freely.
These patterns are not errors. They are behaviors embedded in the system.
Most clinical decisions rely on the EHR, not on the patient’s memory, the doctor’s memory, or even the exam-room conversation.
EHR documentation becomes the patient. And inconsistent documentation becomes inconsistent patients.
For a multilingual caregiver, the mismatch was obvious. For a model trained on this data, the mismatch becomes invisible—and catastrophic.
If one clinician documents “mild tenderness” and another documents “moderate chronic pain,” the patient appears different in quantitative models. If one clinician includes symptoms in detail and another leaves them out, severity distributions shift artificially.
Clinical variability becomes statistical variability. Statistical variability becomes prediction error.
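To make that chain concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the three note texts, the `SEVERITY_SCORES` mapping, and the 1-to-2 ordinal scale are assumptions I made up for illustration, not data from any real record or any standard coding scheme. It only shows how one patient, documented three ways, can yield three different values for a simple phrase-based severity feature.

```python
from statistics import mean

# Hypothetical mapping from note phrasing to an ordinal severity score.
# The phrases come from the example above; the numeric scale is an assumption.
SEVERITY_SCORES = {
    "mild tenderness": 1,
    "moderate chronic pain": 2,
}


def extract_severity(note):
    """Return the score of the first known phrase found in the note, or None."""
    text = note.lower()
    for phrase, score in SEVERITY_SCORES.items():
        if phrase in text:
            return score
    return None  # the note never states a severity we recognize


# Three invented notes about the same patient on the same day.
notes = {
    "over_documenter": "Long discussion of history. Reports moderate chronic pain in the lower back.",
    "template_user": "Pain: mild tenderness. Follow-up: 2 weeks.",
    "minimalist": "Stable. Continue current plan.",
}

scores = {style: extract_severity(text) for style, text in notes.items()}
observed = [s for s in scores.values() if s is not None]
missing_share = sum(s is None for s in scores.values()) / len(scores)

print(scores)
print("mean severity over notes that mention it:", mean(observed))
print("share of notes with no severity recorded:", missing_share)
```

In this toy setup the patient's severity is 2, 1, or missing depending only on who wrote the note. Any downstream average or model input inherits that spread, even though nothing about the patient changed.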
This is why I am drawn to:
• documentation behavior
• representation structure
• workflow-shaped data variation
• clinical narrative compression
When I look at EHR data, I no longer see a dataset. I see the choices, pressures, and constraints of the people who created it.
The more time I spent caring for my mother, the more I understood something essential:
A patient is consistent. Their records are not.
This realization is why I focus on upstream documentation and not only downstream models. The same patient becoming three different records is not a trivial discrepancy. It is an information problem that affects care, equity, and outcomes.