Yingzi Ye

Why Information Structure Matters More Than the Model

Models are visible, prestigious, and publishable. But the quiet decisions about how information is structured often matter more than the choice of algorithm.

In applied mathematics and data science, we spend a lot of time thinking about:

• model choice
• optimization methods
• regularization
• performance metrics

But my experience in healthcare—especially as a caregiver for my mother—keeps pulling me back to a simpler, more fundamental question:

What exactly are we modeling?

If the underlying information is poorly structured, no model choice can fix that.

1. When the record is the reality

In modern healthcare, the EHR does not just record care. It shapes it.

• If a symptom is not documented, it often does not “exist” for future clinicians.
• If a timeline is documented incorrectly, the history is rewritten.
• If pain is described vaguely, its severity is downgraded in practice.

By the time data scientists see EHR data, it has already passed through:

• multiple people
• multiple translations
• multiple workflows
• multiple simplification steps

A model trained on this data is learning from a layered, transformed, partial reality.

2. Structure is where inequity hides

Many inequities in health data are not just about sample size or missing values. They are about:

• how fields are defined
• which options exist in a dropdown
• which languages are supported
• which symptoms are easy to document
• which ones require extra work

Multilingual and low-resource patients often appear as:

• “incomplete records”
• “high missingness”
• “irregular follow-up”

But these labels often reflect structural barriers:

• interpreter quality
• clinic time constraints
• cultural mismatch
• poor interface design

The structure of the information system encodes inequality before any model is trained.
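One way to make this visible before any modeling is to tabulate missingness against a workflow variable instead of treating it as a property of the patient. The following is a minimal sketch, not a method taken from any real system: the records, the `interpreter_needed` flag, and the `pain_score` field are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical toy records; field names and values are invented for illustration.
records = [
    {"interpreter_needed": False, "pain_score": 6},
    {"interpreter_needed": False, "pain_score": 3},
    {"interpreter_needed": False, "pain_score": None},
    {"interpreter_needed": True, "pain_score": None},
    {"interpreter_needed": True, "pain_score": None},
    {"interpreter_needed": True, "pain_score": 7},
]


def missing_rate_by_group(rows, group_key, field):
    """Return the fraction of rows where `field` is missing, per value of `group_key`."""
    totals = defaultdict(int)
    missing = defaultdict(int)
    for row in rows:
        group = row[group_key]
        totals[group] += 1
        if row.get(field) is None:
            missing[group] += 1
    return {group: missing[group] / totals[group] for group in totals}


rates = missing_rate_by_group(records, "interpreter_needed", "pain_score")
for group, rate in sorted(rates.items()):
    print(f"interpreter_needed={group}: {rate:.0%} of pain scores missing")
```

If the rates differ sharply between groups, as they do in this toy data by construction, the gap is a prompt to look at the documentation workflow, not evidence about the patients themselves.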

3. What “better models” cannot do

A more sophisticated model can:

• fit patterns more flexibly
• handle nonlinearity
• work with high-dimensional input

But it cannot:

• recover details never documented
• infer nuance that translation erased
• reconstruct timelines that were mis-entered
• correct for systematically biased workflows without careful design

In other words:

Models amplify the structure they are given. They do not repair it.

4. Information structure as a design space

This is why I am drawn to questions such as:

• How should symptom fields be structured to reduce ambiguity?
• How can we better capture uncertainty instead of forcing false precision?
• How might we represent multilingual history in a way that preserves nuance?
• Where should free text vs. structured fields be used?

These are design questions, not just engineering ones.
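To make two of these questions concrete, here is one possible shape for a symptom entry, written as a rough sketch and not as a proposal for any real EHR schema. Every name in it (`SymptomEntry`, `onset_earliest`, `patient_words`, and so on) is hypothetical, and the example values are invented. The two ideas it illustrates are recording onset as a date range instead of a single forced date, and keeping the patient's own words alongside any translation.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class SymptomEntry:
    """Hypothetical record shape; all field names are assumptions for illustration."""

    symptom: str
    onset_earliest: date          # earliest plausible onset
    onset_latest: date            # latest plausible onset
    patient_words: str            # the description as the patient gave it
    patient_language: str         # language code of patient_words, e.g. "zh"
    translation: Optional[str] = None
    via_interpreter: bool = False

    def __post_init__(self):
        # Reject ranges that run backwards rather than silently reordering them.
        if self.onset_earliest > self.onset_latest:
            raise ValueError("onset_earliest must not be after onset_latest")

    @property
    def onset_uncertainty_days(self) -> int:
        """Width of the onset window in days; 0 means the date is known exactly."""
        return (self.onset_latest - self.onset_earliest).days


# Invented example: the patient recalls onset only as "early to mid March".
entry = SymptomEntry(
    symptom="abdominal pain",
    onset_earliest=date(2024, 3, 1),
    onset_latest=date(2024, 3, 15),
    patient_words="肚子一阵一阵地痛",
    patient_language="zh",
    translation="stomach hurts in waves",
    via_interpreter=True,
)
print(entry.onset_uncertainty_days)
```

The design choice here is that uncertainty and provenance are first-class fields. A downstream analyst can see that the onset window is two weeks wide and that the English text is a translation, instead of inheriting a single date and a paraphrase that look more certain than they are.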

In my tutoring, designing flashcards taught me that representation can transform understanding. In healthcare, I see the same pattern, with much higher stakes.

5. A research identity centered on structure

My emerging research identity is less about inventing the most complex model and more about:

• understanding documentation variability
• tracing information flow through clinical workflows
• studying multilingual data inequity
• designing clearer, more humane information structures

I want to work in the space where:

• clinical practice
• information design
• data science
• equity

meet.

Because the structure of information determines:

• what gets measured
• who gets represented
• which questions can be asked
• which answers seem valid

And in that sense, information structure is not just a technical detail. It is a quiet, powerful form of decision-making.
