
Scripts & Voice User Experience improvements for an Intelligent Agent in a call center
At CVS, I worked on an Intelligent Assistant (IA) that helps Aetna insurance members find providers. The IA was launched in June. I joined the company in July. The IA currently has:
● 30.9% containment rate
● 122 intents
● 10,000 utterances
I worked in post-production, on a Training & Tuning team. My workflow consisted of:
● Research: to resolve issues identified by the Analytics team, or by me, I studied Watson’s logic and analytics reports, reviewed call transcripts and audio, identified STT (speech-to-text) issues, and requested extra analytics reports as needed.
● Implementation: once I knew what was causing the issue, I implemented the fix in Watson, tested it in Watson, then in a custom testing tool, then in Dev and QA.
● Releases: I coordinated overnight Production releases on a biweekly basis. Fifteen minutes after the release, while the Product Owner was testing in real time, I fetched transcripts, compared them with the audio, wrote down observations, and reported right back.
PROCESS OVERVIEW
A Story is an issue, as defined in the Agile Development Framework. Stories are assigned to me to resolve. Some stories need Research & Implementation. Others need just Implementation.
Stories come from 2 places:
From the Analytics team: they have identified an issue, or they have noticed a pattern that they think is causing an issue and want it examined.
From me: while researching or implementing a Story that came from Analytics, I discover another, new problem. I then open a separate Story to resolve it.
There are 2 types of Stories:
1. Script implementation only: the solution is a Script from a Voice Designer. I implement the script and create or update dialog nodes.
2. Something’s broken: something is not working (an intent, entity, dialog node, or conditional code). I fix it.
Once an issue is identified, a Story is created in the RTC (Rational Team Concert) system, which is similar to Jira.
This example was a story about how to best handle callers who ask for variations of a "general doctor" utterance.
"GENERAL"
Situation:
Decide how to best direct callers who ask for a general doctor. The Voice Designer suggested disambiguation (digression in Watson), and supplied a script for it.
Task:
The suggested strategy seemed clunky to me. I thought we could provide a much simplified UX.
Action:
My Role:
● I quantified numbers in the Analytics report, and extracted the synonyms used.
● Researched Watson for the synonyms. We did not have a single one.
● Juxtaposed the existing structure and tools, plus the new synonyms, against the newly proposed disambiguation. Oftentimes in UX, less is more.
● Presented to the Business.
Sample size: 53 calls.
Result:
My simpler solution was adopted.
Here’s the journey:

Deeper dive into the Analytics report: I looked at the utterances that were causing the issue. Here are some numbers:
Out of 53 calls total:
- 33 said “general”
- 12 said “general doctor”
- 1 said “general practitioner”
- 7 said a mix of “general care”, “general health”, “general healthcare provider” and “general healthcare”
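
To make the proportions in this breakdown easier to compare, here is a minimal sketch that tallies the counts above and converts them to shares of the 53 calls. The grouping label "other general variations" and the function name `shares` are my own illustrative choices, not part of the Analytics report.

```python
from collections import Counter

# Counts copied from the breakdown above.
# "other general variations" is an illustrative label for the mixed group of 7.
utterance_counts = Counter({
    "general": 33,
    "general doctor": 12,
    "general practitioner": 1,
    "other general variations": 7,
})


def shares(counts):
    """Return each utterance's share of all calls, as a percentage rounded to 1 decimal."""
    total = sum(counts.values())
    return {utterance: round(100 * n / total, 1) for utterance, n in counts.items()}


total_calls = sum(utterance_counts.values())
utterance_shares = shares(utterance_counts)

for utterance, pct in utterance_shares.items():
    print(f"{utterance}: {pct}% of {total_calls} calls")
```

The bare word “general” alone accounts for roughly 62% of the sample, which is why handling it well mattered more than handling any of the longer phrases.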

I checked the mapping in Watson: I put together a list of these 6 utterances, and checked whether we already had them as synonyms in our Watson entities, mapped to the “Primary Care Provider” entity.
● General
● General care
● General doctor
● General health
● General healthcare
● General healthcare provider
We did not have these 6 synonyms. (Below is just a representation, not comprehensive research.)
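
The synonym check described above can also be expressed as a small script. This is only a sketch under assumptions: the `existing_synonyms` set is a hypothetical stand-in for whatever was actually mapped to the “Primary Care Provider” entity, and the helper names are my own. The real check was done by looking inside Watson.

```python
# The six utterances from the list above.
candidate_synonyms = [
    "General",
    "General care",
    "General doctor",
    "General health",
    "General healthcare",
    "General healthcare provider",
]

# HYPOTHETICAL existing synonyms, for illustration only.
# The real entity contents are not shown in this case study.
existing_synonyms = {"primary care provider", "primary care physician", "pcp"}


def normalize(text):
    """Lowercase and collapse whitespace so comparisons ignore casing and spacing."""
    return " ".join(text.lower().split())


def missing_synonyms(candidates, existing):
    """Return the candidates that are not yet present among the existing synonyms."""
    known = {normalize(s) for s in existing}
    return [c for c in candidates if normalize(c) not in known]


missing = missing_synonyms(candidate_synonyms, existing_synonyms)
print(f"{len(missing)} of {len(candidate_synonyms)} synonyms are missing")
```

With the hypothetical set above, all six candidates come back as missing, which mirrors what I found in Watson.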

And this was the script supplied by the Voice Designer, to disambiguate the word “General”:
YES scenario – caller is looking for a Primary Care Provider, and they are calling it “General Practitioner”.
NO scenario – caller is NOT looking for a Primary Care Provider. My observations are in pink.

I thought the disambiguation in this case was not needed. It would only introduce extraneous, arbitrary elements and interactions.
I implemented the 6 missing synonyms, and tested the UX. There were 2 scenarios:
● Caller wants a Primary Care Physician
● Caller wants a provider other than Primary Care Physician
These were the experiences:
User: General
Intelligent Assistant: OK. Want me to look in the zip-code on file?
User: Yes
Intelligent Assistant: Here are 50 PCPs…
(Yelena’s logic: in this case, we are thinking a step ahead of the user, which is preferable UX, with no extra question and confirmation.)
OR:
User: General
Intelligent Assistant: OK. Looking for a PCP. Want me to look in the zip-code on file?
User: Yes
Intelligent Assistant: Here are 50 PCPs…
User: I don’t want a PCP…
Intelligent Assistant: Alright. First let's determine exactly what type of doctor you're looking for. You can ask me to find you a specialist like "pediatrician", "allergist", and "urologist." Or to help with a specific medical concern like "sore throat", "heart disease", or "foot pain."
User: Woman's Doctor
Intelligent Assistant: I can help you with that. Looking for Obstetrics and Gynecology near the zip code on file. Okay, I’ve found 50 providers for Obstetrics and Gynecology for you near the zip code on file. I’ll read the results in groups of three. If you’d like to hear more information about one of these providers, just say first provider, second provider, or third provider, or I can read you the next set of healthcare providers. Let me know when you’re ready.
I presented my case to the Business, and my simplified solution was approved.
The existing recovery response is better, because:
- It only appears if it’s needed. Progressive disclosure, more favorable UX.
- It gives examples of what the caller may ask, which is informative, practical and helpful.
- Examples also reduce cognitive load – there’s no figuring out what and how to say next.
So no further implementation or changes were necessary. The Business agreed with my assessment, and that was what I implemented.
CAN'T YET:
Situation: Instances where the Intelligent Assistant says that it “can’t yet” do something were to be replaced with smoother verbiage.
Task: My task was to implement simple text edits. Or so it would seem. It turned out to be a lot more complicated.
Action: Two different Voice Designers attempted to write the new scripts. Neither worked. I ended up finalizing the script to accommodate all the conditions that needed to be considered. Plus, upon implementation, testing proved to be rather tricky. Below I outline my journey and the methods I applied.
Sample size: 15 calls.
Result: My script was implemented and worked successfully.
The RTC (Rational Team Concert) story contains the title, the objective and the Analytics report.

The Analytics report had gone through an audio splitter, and it did not offer me much insight.

The first Voice Designer replaced instances where our Intelligent Assistant says that it “can’t yet” do something with, basically, fewer words. My comments are in pink.

The second Voice Designer replaced instances where our Intelligent Assistant says that it “can’t yet” do something with more polished verbiage, and eliminated the double question.
Close but no cigar.
My comments are in pink.

Below is the comprehensive node logic. What the second Voice Designer put together was the opposite of how the node logic worked. We were not in a position to change node logic, because the change would affect too many other nodes. We only changed node logic when
THE GAIN WAS BIGGER THAN THE PAIN
Watson node structure
“Process Flag: Unsupported Search Parameter” is the flag parent node.
“Evaluate Unsupported Search Parameter” is the name of a child node. Changing its logic would have immeasurable consequences. When faced with changing node logic, the T&T team does a very careful pros and cons analysis.
If the user says any CSR utterance or NO, the following intents are triggered: #no_response, #eleganttransfer_request_agent (70010), and the user hears:
“Certainly. I will connect you with an Aetna representative.”
If the user says CONTINUE SEARCH or YES, the following intents are triggered: #refine_continue_search, #general_continue, #yes_response, and the search continues without the elements the Bot is unable to search on. For example, in the testing for this story, “women’s wellness checkup” went on to search “check up”.
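
The two branches above can be summarized in a short sketch. This is not how Watson dialog nodes are written. It is plain Python that mirrors the routing for readability. The function name, the action labels, and the "unhandled" fallback are my own assumptions, and only the intent names and the transfer message come from the node logic described above.

```python
# Intent names as described above, without the leading "#".
TRANSFER_INTENTS = {"no_response", "eleganttransfer_request_agent"}
CONTINUE_INTENTS = {"refine_continue_search", "general_continue", "yes_response"}

TRANSFER_MESSAGE = "Certainly. I will connect you with an Aetna representative."


def evaluate_unsupported_search_parameter(intent):
    """Mirror the child node's two branches for a recognized intent.

    Returns a dict with an illustrative "action" label and the text the caller hears.
    """
    name = intent.lstrip("#")
    if name in TRANSFER_INTENTS:
        # CSR utterance or NO: hand the caller over to a representative.
        return {"action": "transfer_to_agent", "say": TRANSFER_MESSAGE}
    if name in CONTINUE_INTENTS:
        # CONTINUE SEARCH or YES: keep searching, minus the unsupported elements.
        return {"action": "continue_search", "say": None}
    # Assumption: anything else is outside the two branches described above.
    return {"action": "unhandled", "say": None}


print(evaluate_unsupported_search_parameter("#no_response"))
print(evaluate_unsupported_search_parameter("#yes_response"))
```

Seen this way, the constraint is clear: a script that asks a question where YES means "transfer me" would contradict the existing branches, so the new wording had to keep YES on the continue-search side.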

I needed to be creative and improve the UX while working under constraints. Here was my script.

This script worked well:
● I replaced “can’t yet” words with more polished ones.
● I eliminated the complicated and confusing double question, and cognitive load that came with it.
● I worked within existing node logic.
Next step was to create a testing script, for myself, for the QA team, and for the UAT team. This proved rather difficult. Triggering these nodes did not always work as expected. Some behaved more consistently than others.
The Analytics report did not provide exact utterances, probably because it had been through an audio splitter, and the utterances that triggered the nodes may not have been captured properly.
Here was my journey, of trial and error, to create the right strategy, to capture the right responses.

I had to do research, and be resourceful, to create an effective testing script. These are the problems I was facing:
1. Process Flag: Specific Provider nodes, #1 and #2, should react to a search for a doctor by gender or name. But they did not. Besides, apparently there are cities and towns that share common American names:
Smith (Nevada, Indiana)
Johnson (Arkansas, New York, Nebraska, Kansas)
Justin (Texas)
Michael (Illinois)
Anthony (Kansas, Texas, New Mexico, Florida)
Allen (at least 10 states…)
So I started using somewhat ethnic names. That worked.
2. Mailed results node, #3. While testing in the Watson “Try it out” panel, or in the custom testing tool that is also text-based, the word “mail” was triggering the intent. However, when I assigned the ticket over to the QA team and they tested on the phone, the utterance “mail” was transcribed as “male”. As in, male vs. female.
This is why testing is important. And this is how voice agents are different from chatbots.
So I went into this intent in Watson, and copied ALL utterances from it, so that the tester had options. We only needed to trigger this node once. That worked.
3. Process Flag: Evaluate Unsupported Search Parameter nodes, #4 and #5. Searching by any parameter we still don’t support. I tried everything that should trigger this intent: an incomplete zip-code, the provider’s race, the language the provider speaks… Nothing worked. So I went and listened to the audio recordings from the Analytics report that this story was based on. In there, the @unsupported_search_parameter entity was triggered by the following utterances:
● ear, nose and throat
● women’s wellness checkup
This was weird. Because:
● we had a robust ENT entity.
● regarding “women’s wellness checkup”, we did not have that exact utterance. Plus, “women’s” may have needed webhooks. And the Intelligent Assistant may have thought that we were looking for a female doctor.
Be that as it may, I opened 2 new stories, to look into why these 2 utterances were triggering the @unsupported_search_parameter entity.
But for the purposes of the “Can’t Yet” story, these 2 utterances were doing the job for me – they were consistently triggering the nodes I needed, to test my text edits, and wrap up this story.
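
Two of the pitfalls above (provider names that are also place names, and “mail” being heard as “male”) could be screened for before a test script goes to phone testing. The sketch below is hypothetical and was not part of the project tooling. The place names come from the list above, the sound-alike table holds only the one pair from this story, and `screen_utterance` is a name I made up for illustration.

```python
import re

# Place names taken from the list above (states omitted).
PLACE_NAMES = {"smith", "johnson", "justin", "michael", "anthony", "allen"}

# Sound-alike words. Only the mail/male pair comes from this story; extend as needed.
HOMOPHONES = {"mail": "male", "male": "mail"}


def screen_utterance(utterance):
    """Return a list of warnings for one candidate test utterance."""
    warnings = []
    for word in re.findall(r"[a-z']+", utterance.lower()):
        if word in PLACE_NAMES:
            warnings.append(f"'{word}' is also a city or town name")
        if word in HOMOPHONES:
            warnings.append(f"'{word}' may be transcribed as '{HOMOPHONES[word]}'")
    return warnings


for candidate in ["Doctor Smith", "mail me the results", "yes"]:
    print(candidate, "->", screen_utterance(candidate))
```

A check like this only flags text. It does not replace testing on the phone, where the speech-to-text layer is actually in the loop.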
This is how my testing script looked, at the end:

The story was successfully implemented with this final version of the script.
Shown are a few examples, not comprehensive testing. The provided visuals are from testing in Watson, and then in a custom-built testing tool called Commander, which I directed at my own skill, then at DEV and then at QA.



Copyright © Yelena Dobric 2025


















