Language and Numbers: Evaluation for Patient Engagement
How do you evaluate whether a message to a patient actually works? It depends entirely on what you mean by "works." Here's the same message, seen through three different lenses.
[Interactive: One message, three ways to evaluate it. The same patient message, seen through different lenses. What you measure determines what you see.]
We built this because we keep seeing the same gap in our work.
A team sends a message like this to thousands of patients. The dashboard says it worked: delivered, opened, positive sentiment, appropriate reading level, green across the board. But completion rates don't move. Patients no-show. Procedures get cancelled and never rescheduled. And nobody can explain the discrepancy between what the metrics reported and what actually happened.
The message was grammatically clean, clinically accurate, and on-brand. It was also written for someone who had done this before. For Maria, a first-time patient who primarily speaks Spanish, it created more questions than it answered. She opened the message, didn't fully understand it, and ate breakfast the next morning because she didn't realize the clear liquid diet started when she woke up. Her colonoscopy was cancelled due to inadequate prep. She didn't reschedule.
Clinicians are usually good at catching this kind of problem before it ships. They can feel when a message won't land, even if they can't always name exactly why. That's the language side of evaluation: you're assessing whether the output actually does what it needs to do. Not just whether it's accurate, but whether it works for this particular person in this particular moment.
The numbers side looks completely different. The dashboard saw a successful interaction. Delivered, opened in four minutes, positive sentiment. By every metric the system tracks, this message performed. If you're only looking at the numbers, you'd ship this to every patient in the system and move on.
Cole-Lewis, Ezeanochie, and Turgiss have a useful framework for understanding why these two pictures diverge. They distinguish between "Little e" engagement, which is interaction with the digital tool itself, and "Big E" engagement, which is whether the patient's behavior actually changed. The numbers were measuring Little e. Nobody was measuring Big E. And the language evaluation, the qualitative read that could have caught the problem, never made it into the system.
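To make the distinction concrete, here's a minimal Python sketch over a hypothetical event log. The field and function names are ours, chosen for illustration; they aren't from the Cole-Lewis framework itself.

```python
# A minimal sketch of the Little e / Big E split, assuming a hypothetical
# event log. Field and function names are illustrative, not taken from
# the Cole-Lewis et al. framework.

from dataclasses import dataclass

@dataclass
class PatientRecord:
    message_delivered: bool    # Little e: the tool reached the patient
    message_opened: bool       # Little e: the patient interacted with the tool
    prep_completed: bool       # Big E: the target behavior happened
    procedure_completed: bool  # Big E: the outcome that matters

def little_e_rate(records: list[PatientRecord]) -> float:
    """Open rate among delivered messages: engagement with the tool."""
    delivered = [r for r in records if r.message_delivered]
    return sum(r.message_opened for r in delivered) / max(len(delivered), 1)

def big_e_rate(records: list[PatientRecord]) -> float:
    """Completed procedures among opened messages: engagement with the behavior."""
    opened = [r for r in records if r.message_opened]
    return sum(r.procedure_completed for r in opened) / max(len(opened), 1)

# Maria's record: a Little e success and a Big E failure.
maria = PatientRecord(message_delivered=True, message_opened=True,
                      prep_completed=False, procedure_completed=False)
```

A dashboard that reports only little_e_rate scores Maria's message a success. Her cancelled procedure only shows up in big_e_rate, and only if someone is tracking it.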
This is where most teams get stuck. They're strong on one side and blind on the other. Either they're tracking metrics obsessively but can't explain what's actually happening on the ground, or they're relying on good clinical instincts but can't defend their conclusions when it matters.
Language without numbers gives you diagnosis with no treatment. Numbers without language gives you measurement of the wrong thing.
AI makes this sharper than it used to be. When a human writes a bad patient message, you can coach them. When a system produces bad output at scale, you need infrastructure that catches it. That means holding both at the same time: the qualitative judgment that tells you whether something lands, and the operational measurement that tells you how often it's failing and whether your fix actually worked.
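Here's a rough sketch of what one piece of that infrastructure could look like: a pre-send gate that encodes a few of the checks a clinician would make by instinct. Everything in it is an assumption for illustration, from the patient fields to the thresholds; the rules are crude stand-ins for criteria a clinical team would actually define.

```python
# A minimal sketch of a pre-send evaluation gate, under assumed inputs:
# a patient profile, the message text, and its language code. The rules
# are crude, illustrative stand-ins; none of this is a validated
# readability or comprehension model.

from dataclasses import dataclass, field

@dataclass
class Patient:
    preferred_language: str   # e.g. "es" for Maria
    first_procedure: bool     # first-timers need more explicit instructions

@dataclass
class GateResult:
    passed: bool
    reasons: list[str] = field(default_factory=list)

def pre_send_gate(message: str, language: str, patient: Patient) -> GateResult:
    reasons = []
    if language != patient.preferred_language:
        reasons.append("message language does not match patient preference")
    if patient.first_procedure and "start" not in message.lower():
        # Crude proxy for "does the message say when the prep starts?"
        reasons.append("no explicit start time for a first-time patient")
    sentences = [s for s in message.replace("!", ".").split(".") if s.strip()]
    avg_words = sum(len(s.split()) for s in sentences) / max(len(sentences), 1)
    if avg_words > 20:
        # Long sentences as a rough flag for reading-level problems.
        reasons.append(f"average sentence length is {avg_words:.0f} words")
    return GateResult(passed=not reasons, reasons=reasons)
```

The specific rules matter less than the routing: messages that fail the gate go to a human instead of out the door, and the failures get counted, which is how you know whether a fix worked and when quality starts to drift.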
The teams that get this right aren't the ones choosing between better content and better data. They're the ones that can hold both: evaluate whether the output actually works, then build the system that produces good output reliably, with the measurement to know when it starts to drift.
If you want a feel for what that evaluation process is like in practice, walk through the diagnostic below. It applies three steps from the framework we use with clients to the same scenario.
[Interactive: Try it yourself. Three steps from our evaluation framework, applied to Maria's message. Each step asks you to think with both the language and the numbers. Think first, then see what we see.]