Why multilingual patients are often represented inaccurately in EHRs—and how meaning shifts long before data becomes data.
As a caregiver-interpreter for my mother over six years, I encountered a persistent and deeply structural issue: the subtle—but clinically significant—ways that symptoms change meaning as they move from Chinese into English.
This is not simply about vocabulary. It is about semantic drift—a shift in meaning caused by translation, cultural framing, and interpreter behavior. And in clinical care, semantic drift becomes data drift.
Chinese contains dozens of nuanced pain descriptions:
• 刺痛 – sharp, stabbing, needle-like
• 隐隐作痛 – dull, lingering, hard to localize
• 胀痛 – pressure-like, swelling pain
• 酸痛 – soreness combined with heaviness
• 麻木 – numbness, tingling, altered sensation
• 灼热感 – burning, heat-like discomfort
In English, these are often collapsed into:
• “sharp pain”
• “dull pain”
• “pressure”
• “numbness”
The problem is not only linguistic. It is representational. **A symptom that can be described in 20 distinct ways in Chinese is recorded under one of 4 English categories.**
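The collapse described above can be made concrete with a small sketch. It uses the six descriptors listed earlier. The English category assigned to each one is an illustrative assumption, not a clinical standard or an observed interpreter behavior. The names `COLLAPSE`, `invert`, and `ambiguous_categories` are hypothetical. The point is only that a many-to-one mapping cannot be reversed: once the English term is recorded, the original descriptor is no longer recoverable.

```python
from collections import defaultdict

# Descriptors come from the list above. The target category for each
# is an assumption made for illustration only.
COLLAPSE = {
    "刺痛": "sharp pain",
    "隐隐作痛": "dull pain",
    "胀痛": "pressure",
    "酸痛": "dull pain",
    "麻木": "numbness",
    "灼热感": "sharp pain",
}


def invert(mapping):
    """Group source descriptors by the English category they collapse into."""
    groups = defaultdict(list)
    for source, target in mapping.items():
        groups[target].append(source)
    return dict(groups)


def ambiguous_categories(mapping):
    """Return the categories that more than one source descriptor maps to."""
    return {
        category: sources
        for category, sources in invert(mapping).items()
        if len(sources) > 1
    }


groups = invert(COLLAPSE)
ambiguous = ambiguous_categories(COLLAPSE)

print(len(COLLAPSE), "descriptors ->", len(groups), "categories")
for category, sources in ambiguous.items():
    print(f"'{category}' could have been any of: {', '.join(sources)}")
```

Under these assumed mappings, six descriptors land in four categories, and two of those categories each hide more than one original meaning.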
Interpreter behavior varies:
• Some simplify for efficiency.
• Some filter through their own assumptions.
• Some adjust wording to match what they think the doctor expects.
• Some compress culturally specific expressions into generic English.
My mother frequently described sensations in ways that made perfect sense in Chinese but were nearly impossible to map cleanly into English. I often had to interrupt the interpreter and re-explain:
“She didn’t mean ‘sharp pain’—she meant a pressure-like swelling that comes and goes.”
In the EHR, what ends up recorded is not my mother’s experience—it is the interpreter’s interpretation of her experience.
This creates a second-layer representation problem:
Patient → Interpreter → Clinician → EHR → Data
At each step, information is compressed, filtered, or altered.
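One way to picture this chain is as a series of lossy transformations applied to a single record. The sketch below is a toy model, not a description of any real EHR system. The stage functions, the field names `quality` and `pattern`, and the fixed vocabulary are all assumptions made for illustration. The starting record reuses the example above of a pressure-like swelling that comes and goes.

```python
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, Optional[str]]

# What the patient meant, taken from the example in the text.
patient_report: Record = {
    "quality": "pressure-like swelling",
    "pattern": "comes and goes",
}


def interpreter(record: Record) -> Record:
    # Assumed behavior: substitutes a common English term and omits timing.
    return {"quality": "sharp pain", "pattern": None}


def clinician(record: Record) -> Record:
    # Assumed behavior: the note records exactly what was heard.
    return dict(record)


def ehr(record: Record) -> Record:
    # Assumed behavior: a structured field with a small fixed vocabulary.
    vocabulary = {
        "sharp pain": "sharp",
        "dull pain": "dull",
        "pressure": "pressure",
        "numbness": "numbness",
    }
    return {
        "quality": vocabulary.get(record["quality"], "other"),
        "pattern": record["pattern"],
    }


STAGES: List[Tuple[str, Callable[[Record], Record]]] = [
    ("interpreter", interpreter),
    ("clinician", clinician),
    ("ehr", ehr),
]


def trace(record: Record, stages) -> Tuple[Record, List[Tuple[str, List[str]]]]:
    """Run the record through each stage and log which fields changed."""
    log = []
    current = record
    for name, stage in stages:
        result = stage(current)
        changed = sorted(k for k in current if current[k] != result.get(k))
        log.append((name, changed))
        current = result
    return current, log


final, log = trace(patient_report, STAGES)
for name, changed in log:
    print(f"{name}: changed {changed if changed else 'nothing'}")
print("stored:", final)
```

In this toy model the clinician stage changes nothing, yet the stored record still bears no resemblance to what the patient said, because the loss happened one step earlier and every later stage faithfully preserved it.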
Chinese patients often express symptoms through metaphor:
• “像有东西卡住一样” (feels like something is stuck)
• “像被绳子勒住” (like being cinched tight by a rope)
• “像有风在里面转” (a sensation of wind moving inside)
• “闷闷的” (a suffocating, heavy discomfort)
These metaphors carry diagnostic relevance in Chinese medical culture. But translated literally, they can sound nonsensical. So interpreters simplify:
• “She feels pressure.”
• “She feels discomfort.”
• “She feels tightness.”
Entire layers of sensory meaning are stripped away.
What begins as a linguistic shift becomes a structural shift in data:
• A Chinese descriptor becomes an English approximation.
• The approximation becomes the clinician’s note.
• The clinician’s note becomes the EHR field.
• The EHR field becomes model input.
• The model learns trends that never existed.
This is why multilingual data inequity is not a “language problem.” It is a data representation problem.
I am especially interested in:
• multilingual symptom representation
• semantic drift across documentation layers
• interpreter-driven data distortion
• EHR note variability for non-English speakers
• equity issues in real-world evidence
When multilingual patients appear in datasets as “inconsistent,” “noisy,” or “low-fidelity,” these labels are not describing the patient. They are describing the system’s inability to represent them accurately.
Understanding semantic drift is essential to building equitable health data systems, because meaning, and therefore care, can be lost long before data is ever analyzed.