Training & Tuning - Yelena's UX

Training & Tuning an Intelligent Agent in a call center
At CVS, I worked on an Intelligent Assistant (IA) that helps Aetna insurance members find providers. The IA was launched in June. I joined the company in July. The IA currently has:

● 30.9% containment rate
● 122 intents
● 10,000 utterances

I worked in post-production, on a Training & Tuning team. My workflow consisted of:

● Research: to resolve issues identified by the Analytics team, or by me, I studied Watson’s logic and analytics reports, reviewed call transcripts and audio, identified STT (speech to text) issues, and requested extra analytics reports as needed.

● Implementation: once I knew what was causing the issue, I implemented fixes in Watson, tested them in Watson, then in a custom testing tool, then in Dev and QA.

● Releases: I coordinated overnight Production Releases on a biweekly basis. While the Product Owner tested in real time, I fetched transcripts 15 minutes after the release, compared them with the audio, wrote down observations, and reported right back.

PROCESS OVERVIEW
A Story is an issue, as defined in the Agile Development Framework. Stories are assigned to me to resolve. Some Stories need both research and implementation. Others need just implementation.

Stories come from 2 places:
From the Analytics team: They identified an issue, or they noticed a pattern that they think is causing an issue and want it examined.
From me: While researching or implementing a Story that came from Analytics, I discover another, new problem. I then open a separate Story to resolve that new problem.

There are 2 types of Stories:
1. Script implementation only: The solution is a Script from a Voice Designer. I implement the script by creating or updating dialog nodes.
2. Something’s broken: Something is not working: an intent, an entity, dialog nodes, or conditional code. I fix it.
Once an issue is identified, a Story is created in the RTC (Rational Team Concert) system, which is similar to Jira.

This example was a story about our Survey.

SURVEY:
The Intelligent Assistant offers the Survey to a caller after the “Search” (for a provider) and “Delivery” (of the results) portions of the call are completed. In the Survey, the Intelligent Assistant asks 3 questions such as “On a scale of 1-5, how happy are you with my service today?”

Situation:
In this story, the Analytics team thought that the Intelligent Assistant did not understand what callers wanted, and offered them the Survey instead.

Task:
My task was to determine whether the Analytics team’s assumption was accurate and, if it was, figure out how to fix the problem.

Action:
My Role:
In my research, I:
● Reviewed the Analytics report.
● Reviewed full transcripts.
● Listened to all audios.
● Looked at the Watson mapping of intents, entities and dialog nodes relating to this story, and identified items that were present in the calls but missing in Watson.
● Looked for patterns in everything I reviewed.

Sample size: 27 calls.

Result:
● The Analytics team’s assumption was not confirmed. The Intelligent Assistant did not offer the Survey because it failed to understand what the caller said and wanted. The Survey was offered after search results had been delivered. However, I discovered a set of other, minor issues whose fixes would improve the odds of the Survey being successfully completed. I created separate stories for those issues, and resolved them.
● Onboarding time for new hires was reduced by 30%. After implementing many fixes of this type, and training newly hired AI Modelers on the process, I wrote a document that:
- Captured the process, step-by-step.
- Captured tips, tricks, best practices, lessons learned, and common pitfalls.
- Captured all links that one would need for implementation.
Here is that document. It has 16 pages.

Here’s my journey. It starts with an RTC (Rational Team Concert) story. RTC is like Jira.

The Analytics report offered some data, but I went deeper, and reviewed complete transcripts and audios.
Sample call (bolded is the turn we are focusing on, towards the end of the call)

Intelligent Assistant: Hi, I’m your Aetna Virtual Assistant. I’m here to help you find a doctor, hospital, or specialist for Michael. You can ask me questions just like any other representative. What kind of healthcare provider can I help you find today?

User: Primary

Intelligent Assistant: Would you like me to search near the zip code on file?

User: Correct

Intelligent Assistant: I can help you with that. Looking for Primary Care Physicians near the zip code on file. Okay, I’ve found 50 Primary Care Physicians for you near the zip code on file. I’ll read the results in groups of three. If you’d like to hear more information about one of these providers, just say first provider, second provider, or third provider, or I can read you the next set of healthcare providers. Let me know when you’re ready.

User: Ready

Intelligent Assistant: The first provider is Suzanne Giunta, MD. In Richmond. About 2 and a half miles away. The next provider is Scott Thomas Armistead, MD. In Richmond. About 2 and a half miles away. And then Christine, Paul, MD. In Richmond. About 2 and a half miles away. Which one did you want more details about? From this group of three, you can say first provider, second provider, or third provider. Or would you like to hear the next set of providers?

User: The next set please

Intelligent Assistant: Certainly. The first provider is Jacqueline Britz, MD. In Richmond. About 2 and a half miles away. The next provider is Denee Moore, MD. In Richmond. About 2 and a half miles away. And then Kathleen Diana, Dipasquale-Seelig, MD. In Richmond. About 3 miles away. Which one did you want more details about? From this group of three, you can say first provider, second provider, or third provider. Or would you like to hear the next set of providers?

User: That’s correct the second one

Intelligent Assistant: I can spell it for you. Denee J., Moore, MD (Intelligent Assistant is spelling…) Which one did you want more details about? From this group of three, you can say first provider, second provider, or third provider. Or would you like to hear the next set of providers?

User: Second provider

Intelligent Assistant: Of course. Denee J., Moore, MD is a female physician. She is located 2.61 miles from the zip code on file, at 4730 North Southside Plaza, Richmond, STATE_VA. Her phone number is (804) 230-7777. She has listed 1 language spoken. It is English. Would you like to hear the next result, or the next set of providers?

User: No

Intelligent Assistant: Would you like to find another healthcare provider?

User: No

Intelligent Assistant: Okay. I hope I’ve been of assistance to you. Would you be willing to answer a few short questions about your call with me today?

User: Second provider

Intelligent Assistant: I didn’t catch that. I hope I’ve been of assistance to you. Would you be willing to answer a few short questions about your call with me today?

User: Though

Intelligent Assistant: Thank you for your feedback! Thank you for calling Aetna. You can hang up at any time or ask me another question.

In this call, the caller first said they were finished with their search, then changed their mind and wanted to continue the search. However, the Intelligent Assistant had already exited the “Search” portion and “Call Wrap Up”, and entered “Survey”. At that point, the Intelligent Assistant can no longer provide information from the “Search” portion of the call.

If the caller wants, they can start a new search. The Survey response parent node allows digressions out.
This was a pattern: when the Survey was offered, the caller wanted to do something else. Callers would say they had finished their search. But when the Intelligent Assistant offered the Survey:

● Callers would change their mind and want to continue their search, or start a new one. This happened in calls where the caller seemed distracted. That was my impression from listening to the audio. They may have been multi-tasking, or otherwise not listening carefully. This is, however, the beauty of voice.

● They wanted to ask how long the email with the list of doctors would take to arrive.

● They asked about their co-pay.

● They wanted Intelligent Assistant to wait for a minute.

This was my process:

I took main values from the Analytics report:

● Ground Truth
● Call Description
● Intent that Watson identified
● Entity that Watson identified
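As a rough illustration of how these four values could be kept together per call during a review, here is a minimal sketch. The class name, field names, the `call_id` and `reviewer_notes` fields, and the idea of flagging a call when the ground truth differs from Watson's intent are all assumptions made for this example. They are not the actual format of the Analytics report.

```python
from dataclasses import dataclass


@dataclass
class CallReview:
    """One reviewed call. All field names here are hypothetical."""
    call_id: str              # hypothetical identifier for the call
    ground_truth: str         # what the caller actually wanted
    call_description: str     # short summary from the Analytics report
    watson_intent: str        # intent that Watson identified
    watson_entity: str        # entity that Watson identified
    reviewer_notes: str = ""  # filled in after reading the transcript and hearing the audio


def needs_attention(review: CallReview) -> bool:
    """Flag a call when Watson's intent differs from the ground truth (an assumed rule)."""
    return review.ground_truth != review.watson_intent


# Made-up example record, not taken from the real sample of calls.
example = CallReview(
    call_id="call-001",
    ground_truth="#find_provider",
    call_description="Caller asks for an eye doctor",
    watson_intent="#chitchat",
    watson_entity="",
)
print(needs_attention(example))
```

Keeping a free-text notes field next to the report values leaves room for the context that only the full transcript and audio provide.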

Then I reviewed the transcript and listened to the audio for each call. Here’s the thing about the audio: the Analytics team does have human reviewers, but the audio they listen to has already gone through an audio splitter. The Analytics folks also skip turns, to hear only the turn they are interested in. This is fine for certain situations and certain goals. But in some cases, it leaves out the context.

As a UX practitioner, this context is crucial to me. I listen to the whole audio.

Below are sample pages from the document I produced, to present my findings to my team, so the Product Manager can decide next steps.

This is how I came up with the final column in this report I created, “Yelena’s research, analysis and recommendation”:
Here are some numbers:

2 calls = “eye” transcribed as “I”, in an “eye doctor” request. The Intelligent Assistant did not understand. This was a known bug.

2 = STT (speech to text) issues. Utterances were “yes”, “I guess” and “I can”.

2 = Intelligent Assistant did not understand utterances that exist in our corpus: “member services” and “operator”, where callers wanted a CSR (Customer Service Representative), or “correct”, which exists in the #yes_response intent.

2 = chitchat intents identified – we have 28 chitchat intents, out of 122 total intents. We should get rid of them.

Bottom line: for 26 out of 27 callers, their initial intent was satisfied.
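For readers who prefer percentages, the counts above can be converted with a short calculation. This is only a sketch: the dictionary keys are paraphrased labels, the `share` helper is a hypothetical name, and the categories are treated as independent counts over the 27-call sample (they may overlap, so they are not summed).

```python
SAMPLE_SIZE = 27        # calls reviewed
SATISFIED_CALLERS = 26  # callers whose initial intent was satisfied

# Counts copied from the findings above; labels are paraphrased.
findings = {
    "eye transcribed as I": 2,
    "other STT issues": 2,
    "corpus utterances not understood": 2,
    "chitchat intents identified": 2,
}


def share(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


for label, count in findings.items():
    print(f"{label}: {count} of {SAMPLE_SIZE} calls ({share(count, SAMPLE_SIZE)}%)")

print(f"Initial intent satisfied: {share(SATISFIED_CALLERS, SAMPLE_SIZE)}%")  # 96.3%
```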

However, here are the stories that I created as a result of the Survey story research:
I successfully implemented 3 out of 5 Stories.

#3: When I started working on the #delivery_holdon intent, it turned out the whole intent was not working. So that became a Story of its own, for me to investigate.

#5: Timing of the email is still with a Voice Designer.

The success of the Survey was measured again after 3 months, and the take-up had improved by 27%, based on these edits. I did not make any sweeping changes, but merely removed common roadblocks.

Yelena Dobric
