Djakhangir Zakhidov

UX Researcher and Product Manager

UX Case Study: Improving Patient-Interviewing Skills of Medical Students with a Conversational Virtual Patient in Augmented Reality

The University of Texas at Dallas (UTD)

The University of Texas Southwestern Medical School (UTSW)

Arizona State University (ASU)

This research is sponsored by the National Science Foundation (NSF)

This research study was published in the International Journal of Human-Computer Interaction in 2024.

The Challenge

Medical students score low on patient-interviewing assessments
Medical students have low confidence in medical interviewing
Not enough practice opportunities exist in the traditional medical interviewing training curriculum

Understanding the users & context: first-hand research and data collection to avoid assumptions

To better understand the context and the specific challenges our users experience, I interviewed subject matter experts (SMEs) in medical education, specifically the dean of undergraduate medical education and several faculty members at UTSW. I also conducted a focus group with four medical students to better understand our users’ needs.

Who are our users?

First- and second-year medical students in the pre-clerkship phase
They have little to no experience in patient interviewing
They are highly competitive and deeply motivated to succeed
They are willing to spend extra time and resources to achieve the best results

Understanding the context

Medical school curriculum is intense and fast-paced
The current patient-interview practice method relies on hiring standardized patients (SPs), human actors trained to portray sick patients. This presents a variety of limitations, such as difficulty scheduling practice sessions, the need to practice in groups, and very few opportunities for one-on-one practice.

Understanding user needs

Students need a low-stakes environment for practice
Students need individual AND social learning opportunities
Students need to be able to practice as much as they want, wherever and whenever

The Solution

As a team, we conceptualized a solution to the problem: a conversational and emotive virtual patient (EVP). The solution built on our lab's previous research in developing conversational virtual humans, and we had already tested several earlier prototypes with users.

For example, prior to this study, our team conducted a pilot study to determine which interaction modality best suits the proposed solution — a web-based experience, a virtual reality (VR)-based experience, or an augmented reality (AR)-based learning experience. Our pilot study’s findings suggested that we should develop the learning experience in AR.

When we presented our initial design of the EVP experience to the SMEs at UTSW Medical School, they outlined several key requirements:

The educational experience needs to mimic, as closely as possible, the interaction with a standardized patient (SP)
Students need to be able to speak naturally with the virtual patient
Students need to practice interviewing in an actual patient room
Our system needs to be able to assess students’ interviewing skills
System content needs to be based on the official medical education curriculum, so that in the future it may be integrated into the curriculum at UTSW and other medical schools.

Understanding the Medical Interview

To better understand what a medical interview experience should be like, I observed multiple live Objective Structured Clinical Examination (OSCE) practice sessions at UTSW. In an OSCE, a medical student interviews an SP, who then assesses the student's abilities as an interviewer.

I developed dialogue maps and conversation flowcharts to represent the non-linear structure and flow of a medical interview.
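A dialogue map of this kind can be thought of as a graph of interview topics in which the student may move between topics in almost any order. The minimal Python sketch below illustrates that idea under stated assumptions: the topic names and the two ordering constraints (the interview opens with an introduction and ends with a closing) are hypothetical and are not taken from the actual dialogue maps built for this project.

```python
from typing import Dict, List, Set

# Hypothetical topics; the real UTSW dialogue maps are not reproduced in this case study.
TOPICS: List[str] = [
    "introduction",
    "chief_complaint",
    "history_of_present_illness",
    "past_medical_history",
    "medications",
    "family_history",
    "social_history",
    "closing",
]


def build_dialogue_map(topics: List[str]) -> Dict[str, Set[str]]:
    """Adjacency map: the first topic opens, the last closes, middle topics connect freely."""
    first, middle, last = topics[0], topics[1:-1], topics[-1]
    graph: Dict[str, Set[str]] = {first: set(middle)}
    for topic in middle:
        # From any middle topic the student may jump to any other middle topic or to the closing.
        graph[topic] = (set(middle) - {topic}) | {last}
    graph[last] = set()  # nothing follows the closing
    return graph


def is_valid_path(graph: Dict[str, Set[str]], path: List[str], start: str = "introduction") -> bool:
    """True if the path begins at `start` and every step follows an edge in the map."""
    if not path or path[0] != start:
        return False
    return all(nxt in graph.get(cur, set()) for cur, nxt in zip(path, path[1:]))


def coverage(path: List[str], topics: List[str]) -> float:
    """Fraction of the topics the student visited at least once."""
    return len(set(path) & set(topics)) / len(topics)


dialogue_map = build_dialogue_map(TOPICS)
# A non-linear interview: social history comes before the history of present illness.
path = ["introduction", "chief_complaint", "social_history",
        "history_of_present_illness", "closing"]
print(is_valid_path(dialogue_map, path))  # True
print(coverage(path, TOPICS))             # 0.625
```

Modeling the map as a graph rather than a fixed script captures the non-linearity: any ordering of the middle topics, including returning to an earlier topic, is accepted, while the coverage score shows which topics were never reached.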

A Framework for Exploring Social Learning in Augmented Reality

Social learning, simply put, means learning from others, and AR offers surprisingly potent affordances for social learning. For a grant proposal to the National Science Foundation, I helped develop the technological research agenda for a conversational virtual patient, and I also developed a framework for exploring the social learning theories of Bandura, Piaget, and Vygotsky in an AR-enabled educational setting. To properly explore the social learning paradigms we found in the literature, we needed more virtual characters, so we proposed developing pedagogical agents as learning companions (PALs), such as virtual peers and virtual experts.

Our final framework, MARFA (A Metaverse AR Framework for Education and Training with Virtual and Real Humans), enabled us to try different configurations of real humans and virtual PALs to aid students in learning patient interviewing. As seen in the figure below, MARFA has four levels of learning (Observation, Interaction, Assessment, and Feedback), and each level offers not only opportunities for social interaction with virtual PALs or real humans but also the option of doing so in one of three interaction modalities: Individual AR, Co-Located AR, and Distributed AR.

Co-Located AR

Unlike the isolated, single-user experience of using a phone or a website, AR enables co-located shared experiences. My team faced the unique challenge of not only designing such a learning experience but also developing it. It took us about four iterations to arrive at a stable co-located AR experience, enabled by a spatial world anchor shared between two or more AR devices over a hub-and-spoke network architecture. We had it! Two students can see, hear, and interact with Walter, our virtual patient, at the same time or take turns!
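The hub-and-spoke idea can be sketched as a toy, in-memory Python simulation. It is not the project's code and not the API of the HoloLens 2 or of any spatial anchor service: each headset (spoke) knows where it localized the shared anchor in its own coordinate frame, converts positions to anchor-relative coordinates before sending them, and the hub relays them to every other spoke. All class and method names are hypothetical, and poses are simplified to translations only (real devices must also handle rotation).

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def add(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


class Hub:
    """Central relay: spokes connect only to the hub, never directly to each other."""

    def __init__(self) -> None:
        self.spokes: List["Spoke"] = []

    def register(self, spoke: "Spoke") -> None:
        self.spokes.append(spoke)

    def publish(self, sender: "Spoke", event: str, anchor_pos: Vec3) -> None:
        # Forward the anchor-relative position to every headset except the sender.
        for spoke in self.spokes:
            if spoke is not sender:
                spoke.receive(sender.name, event, anchor_pos)


@dataclass
class Spoke:
    """One headset; anchor_in_local is where it localized the shared anchor in its own frame."""
    name: str
    anchor_in_local: Vec3
    received: List[Tuple[str, str, Vec3]] = field(default_factory=list)

    def send(self, hub: Hub, event: str, local_pos: Vec3) -> None:
        # Express the position relative to the shared anchor before sharing it.
        hub.publish(self, event, sub(local_pos, self.anchor_in_local))

    def receive(self, sender: str, event: str, anchor_pos: Vec3) -> None:
        # Convert the anchor-relative position back into this headset's own frame.
        self.received.append((sender, event, add(anchor_pos, self.anchor_in_local)))


hub = Hub()
headset_a = Spoke("headset_A", anchor_in_local=(0.5, 0.0, 1.0))
headset_b = Spoke("headset_B", anchor_in_local=(-2.0, 0.0, 0.5))
hub.register(headset_a)
hub.register(headset_b)

# Student A places Walter at (1.0, 0.0, 2.0) relative to the shared anchor.
headset_a.send(hub, "place_patient", (1.5, 0.0, 3.0))
print(headset_b.received)  # [('headset_A', 'place_patient', (-1.0, 0.0, 2.5))]
```

Because both headsets express positions relative to the same anchor, Walter appears at the same physical spot for both students even though their local coordinate frames differ, and the hub keeps the topology simple: each headset needs only one connection.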

Research Questions

At its core, our research focuses on identifying the best opportunities for human-machine teaming. When is it better to learn socially with a virtual human, and when might a real human be better?

Our key Technological and Learning Science-related research questions were:

What configurations for social learning with PALs and humans in a collaborative AR environment produce significant change in self-efficacy and learning?
What configurations for social learning with PALs and humans in a collaborative AR environment do medical school students prefer?
What implications does the research suggest for the potential of PALs and humans in collaborative AR to advance human-machine teaming for learning?

Our key UX Research questions were:

How do the users feel about the educational content and the overall design of our system?
Can the users navigate the system on their own?
How usable is our system?
What features do the users like and dislike?
How can the system be improved?

UX Design

Our UX Design was influenced by the above research questions. As such, we designed and developed a system in which the students could:

Interact with a novel AR user interface using voice and gestures.
Learn through observing a medical interview demonstration by a PAL or a human collaborator.
Learn through interviewing the VP by speaking naturally and asking questions just like they do with a standardized patient (SP).
Learn through receiving feedback from a PAL or a human colleague.
Learn on their own or together with a co-located human colleague.
Be able to be observed by and receive feedback from a remotely located human expert.

The figure below presents the various configurations for social learning with different PALs:

UI Design

AR presented us with many opportunities to design novel types of user interfaces. For example, I designed a spatial selection menu in which users select among different types of virtual collaborators with hand gestures. Users hover a hand over each hologram and then press the "Confirm" button. It took us some time to make pressing the button feel just right, as we animated different state changes for the button depending on whether it was fully pressed. Because our UI was so novel, we did not expect users to understand intuitively how to make selections, so we made instructional videos demonstrating how to use hand gestures on the Microsoft HoloLens 2.

Prototype Development

I wrote a separate case study on the iterative development and testing of our EVP prototype. The video below gives an idea of how far we got with prototype development.

UX Research

With guidance from our learning science colleague from ASU, I designed and conducted three human-subjects experiments to evaluate the user experience and the effect of our technology on learners.

Participants

Over three academic semesters, we recruited a total of 75 UTSW medical students, using convenience sampling of volunteers, to participate in three experiments. Our inclusion criteria were that participants had to be at least 18 years of age and enrolled in the medical school’s pre-clerkship track, which includes the first year and the first semester of the second year. The age demographics of medical students are relatively uniform: most students matriculate within a year or two after graduating from college, although a minority enter medical school after military service or other careers. About 27% of students are underrepresented minorities.

Data Collection

Quantitative data on three types of dependent measures were collected through surveys: self-efficacy, perceptions of the virtual PALs, and learning.

Learners rated their self-efficacy (their confidence in applying the knowledge gained and in the credibility of that knowledge) using the validated General Self-Efficacy Scale (GSE).
Participants’ perceptions of virtual PALs were measured using the Agent Persona Instrument-Revised (API-R), another validated instrument.
The knowledge measure consisted of conceptual and procedural learning. One knowledge question was asked of students at pretest and posttest: “Please explain the process of patient interviewing. Focus on the steps from beginning to end that should be implemented.” The knowledge question response was coded using separate measures of conceptual and procedural knowledge, as provided by the SMEs.

Qualitative data included responses to open-ended questions, which provided nuanced insights into students’ perceptions of the VP and PALs, how students best learn socially, whom students prefer to observe and receive feedback from, and overall commentary on social learning in AR. I also conducted an exit interview/debrief session to gather additional insights from the students. The debrief session was designed to let students share their thoughts about the learning experience in a less structured format.

Finally, I took detailed notes and performed my own evaluation of the technology and users’ learning experience through ethnographic observations during all of the experiment sessions.

Experiment Procedures

Three experiments were conducted in settings resembling an examining room to evaluate the various system configurations and the resulting user experience. All three experiments followed the same general progression (observation, interaction with the VP, assessment, and feedback), albeit with different configurations of virtual PALs, human colleagues, and learning modalities. In all three experiments, students first read and signed the consent form. They then completed the pre-experiment survey, which included demographic questions, the knowledge question, and the GSE instrument. Next, students watched a 90-second instructional video on how to use the HoloLens 2 and interact with the EVP, and then put on the HoloLens 2 headset with the EVP application already launched. After completing the EVP session, students filled out a post-experiment survey containing the knowledge question, the GSE instrument, the API-R instrument, and open-ended questions.

Experiment 1 (n=35) focused on whom students preferred to observe and receive educational feedback from: a virtual peer (male or female) or a virtual professor. Experiment 1 took approximately 45 minutes to complete. Using a within-subjects randomized block design, students were assigned to three groups of 12 students each: Group A, virtual female peer colleague; Group B, virtual male peer colleague; and Group C, virtual female professor colleague.

In the Observation Phase, students watched a 10-minute demonstration in which the assigned PAL colleague showed how to communicate effectively with the EVP and obtain the desired information from it. The demonstration also covered non-verbal communication, emphasizing the importance of sustaining eye contact, using suitable gestures and body language, and listening attentively during the interaction.

In the Interaction Phase, students conducted an interview with Walter to apply what they had learned during the demonstration. Students could phrase and order their questions however they preferred and were told to take as long as needed. Data collected during this phase was used in the Assessment Phase.

In the Assessment Phase, the system collected conversational data through its natural language processing (NLP) component and analyzed it against specific items and scoring criteria from the OSCE rubric.

In the Feedback Phase, the virtual PAL assigned to each student delivered the assessment results as feedback. The PAL commented on each of the eight rubric items, e.g., “Good job on using transition statements” or “Don’t forget to introduce yourself.”
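Rubric-driven feedback of this kind can be sketched as a lookup from per-item ratings to feedback comments. This is a minimal illustration, assuming the yes/no/somewhat rating scale the case study mentions for Experiment 1; the rubric item names and every comment except the two quoted above are hypothetical, only two of the eight items are shown, and the sketch does not reproduce how the actual NLP component assigned the ratings.

```python
from typing import Dict, List

VALID_RATINGS = ("yes", "somewhat", "no")

# Hypothetical subset of the eight rubric items. Only the "no" comment for the
# introduction and the "yes" comment for transitions are quoted in the case study.
FEEDBACK_TEMPLATES: Dict[str, Dict[str, str]] = {
    "introduction": {
        "yes": "Good job introducing yourself.",
        "somewhat": "You introduced yourself, but state your name and role clearly.",
        "no": "Don't forget to introduce yourself.",
    },
    "transition_statements": {
        "yes": "Good job on using transition statements.",
        "somewhat": "Try to use transition statements more consistently.",
        "no": "Remember to use transition statements between sections.",
    },
}


def generate_feedback(ratings: Dict[str, str]) -> List[str]:
    """Turn per-item ratings (yes / somewhat / no) into one comment per rubric item."""
    comments: List[str] = []
    for item, templates in FEEDBACK_TEMPLATES.items():
        # Assumption: an item that received no rating is treated as not demonstrated.
        rating = ratings.get(item, "no")
        if rating not in VALID_RATINGS:
            raise ValueError(f"unknown rating {rating!r} for rubric item {item!r}")
        comments.append(templates[rating])
    return comments


for comment in generate_feedback({"introduction": "no", "transition_statements": "yes"}):
    print(comment)
```

Keeping the ratings separate from the wording means the same rating step could feed a virtual PAL's speech or, as in Experiment 3, a feedback form filled in by a human expert.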

Experiment 2 (n=30) focused on how students learn best socially: (1) by interviewing a VP and receiving feedback from a peer, or (2) by observing a peer interview a VP and providing educational feedback to that peer. Experiment 2 took approximately 45 minutes to complete. It was conducted in a co-located AR environment, with both students in the same examining room with the VP. Using a within-subjects randomized block design, students were assigned to 15 pairs, each consisting of a participant (Group A) and an observer (Group B). This yielded two conditions: interactive learning while receiving feedback, and observational learning while providing feedback. The experiment involved human PALs who were peers with comparable novice abilities, i.e., both were first-year or first-semester second-year students. The social learning motivation behind Experiment 2 was the observation of a human PAL peer, followed by an exchange of feedback with that peer.

In the Observation Phase, a 10-minute demonstration was provided by the human PAL peer, who acted as a social role model for the other student to observe and imitate. The peer demonstrated, to the extent of their knowledge, how to conduct a patient interview, e.g., how to ask questions, phrase them appropriately, and communicate effectively.

In the Interaction Phase, both students took turns interacting with the EVP and observing their human peer partner.

The Assessment Phase relied exclusively on the human PAL’s observations of the interviewing student.

In the Feedback Phase, the observing student provided a verbal assessment to the student who performed the interview; then the students reversed roles. The feedback did not follow a rubric or guideline and was left entirely to the discretion of the human peer.

Experiment 3 (n=10) focused on how students learn socially when paired with a virtual professor and a remote expert observer. It provided a configuration for social learning in AR through (1) observation of the virtual professor, (2) medical interview practice with Walter, and (3) feedback from a remote human expert based on the rubric. Using a within-subjects design, all participants were placed in one group.

Observation Phase: In Experiment 2, we observed that the student peers struggled with demonstrating, assessing, and giving feedback, presumably because both participants were novices. Therefore, in Experiment 3 we iterated using design-based research and added the virtual professor back into the Observation level, as in Experiment 1.

Interaction Phase: Following the demonstration by the virtual professor, students conducted an interview with the EVP to apply what they had learned during the demonstration.

Assessment/Feedback Phase: Students then received educational feedback from the remote human expert, an advanced medical school student who had observed their interviews. As with the machine learning (ML) feedback in Experiment 1, each rubric criterion was rated yes, no, or somewhat, but the ratings were now made by the remote human expert, who could also add personalized comments to the feedback form as they saw fit.

Results

The detailed data analysis and comprehensive results of the three experiments are presented in the paper we recently published.

The key results of quantitative data analysis are summarized below:

PAL type (peer or expert) had a significant main effect on self-efficacy and procedural knowledge. Students paired with a virtual expert PAL for observation and feedback had higher gains in self-efficacy than students paired with a virtual peer PAL. An independent-means t-test revealed a significant difference between virtual peers (M = 3.87, SD = .52) and the virtual professor (M = 4.28, SD = .50), t(33) = 1.78, p = .04; Cohen’s d = .62. A one-tailed repeated-measures t-test revealed a significant increase from procedural knowledge pretest (M = 3.67, SD = 1.47) to posttest (M = 4.14, SD = 1.19) scores, t(35) = 2.26, p = .03; Cohen’s d = 0.38.
Female students had significantly lower self-efficacy than male students both before and after the intervention, a finding mirrored by many other studies on women’s confidence in STEM-related fields. A series of two-tailed independent t-tests was conducted to test for gender differences on the dependent variables. These analyses revealed significant gender differences in self-efficacy at pretest, t(33) = 2.16, p = .04; Cohen’s d = 0.73, and at posttest, t(33) = 2.59, p = .01; Cohen’s d = 0.88. Women (M = 3.63, SD = 0.53) had lower self-efficacy in the task at the start of the study than men (M = 3.98, SD = .41), and women (M = 3.78, SD = 0.41) still had lower self-efficacy scores after the study than men (M = 4.21, SD = 0.56).
Students experienced significant overall gains in self-efficacy in configurations where they observed a demonstration by an expert and received feedback from an expert. A one-tailed repeated-measures t-test revealed a significant difference between self-efficacy pretest (M = 3.78, SD = 0.52) and posttest (M = 3.99, SD = 0.54) scores, t(34) = 2.05, p = .02; Cohen’s d = .35.
Students did not have significant overall gains in self-efficacy in the configuration where two equally skilled peers were paired and provided demonstrations and feedback to each other. A one-tailed repeated-measures t-test was performed on participants’ self-efficacy gain from pretest to posttest, with pretest and posttest scores calculated as means. This analysis revealed a nonsignificant difference between self-efficacy pretest and posttest, t(27) = .13, p = .45; Cohen’s d = 0.024.
Second-year medical students rated the credibility of the virtual patient and PALs higher than first-year students did. A series of two-tailed independent t-tests was conducted to test for differences between academic years on the dependent variables. These analyses revealed a statistically significant difference between academic years in credibility ratings, t(35) = 2.56, p = .02; Cohen’s d = 0.88. Second-year students (M = 4.38, SD = 0.59) gave significantly higher credibility ratings to their assigned virtual colleague than first-year students (M = 3.73, SD = 0.81).
While the virtual expert PAL was rated as less engaging than the virtual peer PAL, students who were paired with a virtual expert PAL had significantly higher gains in self-efficacy compared to students who were paired with the virtual peer PAL.
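Some of the reported effect sizes can be cross-checked against their t statistics. The case study does not say which formulas were used, so the Python sketch below assumes two common conversions: d = 2t/√df for an independent-samples t-test with roughly equal group sizes, and d_z = t/√n (with n = df + 1) for a repeated-measures t-test. Under those assumptions, three of the reported values are reproduced after rounding.

```python
import math


def d_independent(t: float, df: int) -> float:
    """Cohen's d from an independent-samples t statistic, assuming roughly equal group sizes."""
    return 2 * t / math.sqrt(df)


def d_paired(t: float, df: int) -> float:
    """Cohen's d_z from a repeated-measures t statistic, with n = df + 1 participants."""
    return t / math.sqrt(df + 1)


# t statistics and degrees of freedom taken from the results above.
print(round(d_independent(1.78, 33), 2))  # 0.62: virtual peer vs. virtual professor
print(round(d_paired(2.26, 35), 2))       # 0.38: procedural knowledge, pretest vs. posttest
print(round(d_paired(2.05, 34), 2))       # 0.35: self-efficacy, expert configurations
```

The remaining values may rest on other formulas or on the exact group sizes, which this case study does not report.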

User Experience

Qualitative Analysis of open-ended responses from the students revealed the following insights about user experiences:

Many students considered virtual PALs a valuable learning resource, valuing their capacity to showcase effective communication strategies.
Observation was the phase students most frequently mentioned favorably.
Even with the limitations of the imperfect early-prototype system, 22% of students expressed positive feedback regarding interviewing the VP, finding it to be realistic and accurately simulating an SP encounter.
Students enjoyed the unique nature of the AR learning experience, noting the VP felt very real in terms of physical presence and appeared natural and life-like.

“I really liked how it actually felt like the patient was right in front of me. It felt very real in terms of physical presence, and I felt motivated to communicate with the patient like I would a normal patient in real life. Also, I liked how the responses the patient gave were typical responses that an actual patient would give. It was nice to see the collaborator go through a step-wise HPI, and I was able to follow where in the process she was at every moment.”

Students stated that facial expressions and movements of the VP allowed them to fully engage in the interaction.
A large majority of students, roughly 80%, mentioned challenges they encountered while interviewing the VP.
Many students identified issues related to natural language processing, such as unexpected interruptions, difficulties in processing transition statements, inadequate responses from the VP, and speech recognition limitations.

“The software was quite unreliable and the virtual patient often did not understand what I was asking it and would repeat itself multiple times. This led to the interviewing being impossible to conduct in a structured fashion as the patient was poorly responsive to my inputs.”

Some students, roughly 10%, stated that they would like clearer instructions on how to conduct a patient interview.
More than half of the students (55%) highly valued the feedback component, recognizing its significant benefits for learning and self-reflection. Students enjoyed that the feedback was immediate.
While the feedback module was generally well received, roughly half of the students complained that the feedback was too general. Several students expressed the desire for more personal feedback.

“The feedback system was actually really good, but I think it can be more expanded to give more feedback on specific tasks that students should complete in an interview, such as family history, social history, etc.”

About a quarter of all students emphasized the value of a low stakes environment to reduce anxiety and provide a safe space for practice.
Roughly 20% of the students spoke of positive implications of our system on the future of medical practice and interview preparation, highlighting increased practice opportunities, wider accessibility, convenience, potential economic benefits, and flexibility for practice.

The analysis of the semi-structured debrief sessions further shed light on the user experience:

All participants made positive remarks regarding the overall learning experience.
Positive themes included accessibility, reduced anxiety, and an enhanced learning experience.
Most students stated that the interaction with the VP created a less stressful environment.

“At first, I was a little bit nervous about making mistakes or not understanding it, but when I actually went through the experience I found it very enjoyable and super helpful to my medical academic career.”

All participants reported finding the act of observing the collaborator to be helpful.
When asked about their preferences regarding who they would like to observe, 70% indicated having no specific preference, 20% found the virtual collaborator to be acceptable, and 40% simply expressed a preference for a credible or experienced collaborator.
Students also experienced a number of technical issues, such as glitches, scripted-sounding phrases, and interruptions from the VP.

“…it was pretty glitchy and didn’t really pick up on what I was saying. I think the input was not getting through appropriately and it felt like I was going in some circles with the virtual patient.”

When comparing the rapport-building skills, students noted that SPs tend to have greater emotional intelligence in developing connections or expressing symptoms, while EVPs might be limited in their adaptability.
All participants stated that they would use this system for supplemental practice to enhance their interviewing skills.

“I think that I’d probably use it before going on standardized patient interaction visits. I’d probably also like to see this used in colleges, which is our small mentor groups. That would be helpful as well. Yeah, I’d probably use something like this on a weekly basis. Yeah, just to practice over and over to improve each time.”

So, What Does This Mean?

The quantitative and qualitative results provided evidence that our product was valuable for learners:

1) Our EVP product increased the confidence and skills of our learners when specific AR Social Learning configurations were selected.

We saw that when students had an expert to observe and receive feedback from, their self-efficacy (confidence) and procedural knowledge (knowledge of how an interview unfolds) increased significantly. By contrast, when there was no expert, virtual or human, to observe, students had no expert on whom to model their behaviors, and we saw no significant gain in self-efficacy.

2) Certain AR Social Learning configurations work better than others, and there is yet much work to be done in this area.

While we saw the benefits of pairing learners with experts for observation and feedback, there may be situations in which pairing with a peer who is a friend is beneficial, and much of the literature supports this as well. We simply didn't have enough time to explore more nuances of social learning in AR.

3) Most learners saw value in our product, even with the existing limitations.

It was uplifting to receive praise and encouragement from 75 medical students who saw the potential of this product despite its current limitations. Students valued our product for providing repetitive interviewing practice, which in the future could include diverse patients with a variety of medical conditions. Students highly valued the ability to observe a life-like virtual expert demonstrate the patient interview and provide feedback on how they were doing.

4) This product has the potential to address a long-standing inequality in self-confidence between males and females.

Our finding that female students had lower self-efficacy, or confidence, mirrors a pattern commonly observed in STEM-related professions, where women often feel less confident than men in male-dominated fields. Many women experience impostor syndrome, which may negatively influence their self-esteem and work output. This product has the potential to address these long-standing gender inequities in self-efficacy by providing more unique and tailored practice opportunities for female students, helping raise their confidence to match that of their male peers.

5) Most learners are excited about this product and describe ways in which they would use something like this.

Students described our product as a low-stress, reduced-anxiety environment where it was OK to make mistakes because the virtual patient would not judge them as much as an SP would. Students complimented the realism and immersiveness of the learning experience, often talking about Walter as if he were real. Many students valued the ability to practice quickly with our product when needed, for example, right before an actual patient interview.

Actionable Insights from Ethnographic Observations

Throughout the study, I observed the resilience and adaptability of the students as they navigated interactions with the VP, the PALs, and each other in a first-of-its-kind simulated AR environment. I soon realized that students may be able to use the VP learning module in unanticipated ways for personalized learning.

For example, one student in our study spoke English as a non-native language and had an accent. This student struggled at first, as the VP could not understand him and failed to provide relevant responses. This prompted him to slow down and articulate more carefully, which improved system performance: the more carefully he articulated, the better the VP responded. At the end of the session, the student said the system was valuable for private practice, particularly for students who need to practice patient interviewing in a non-native language.

I observed that a pair of students who were friends had a more successful co-located learning experience than students who did not know each other. One of these students was so enthused that she asked to repeat the experiment, and her colleague’s self-efficacy score rose substantially from the pre- to the post-survey. This observation may point to the value of established relationships in social learning experiences. Perhaps we should encourage friends to study together in the collaborative AR environment, as this may make the learning experience more enjoyable and effective.

I also observed that students who were better at interviewing tended to get better responses from the VP and thus enjoyed the experience more than students with weaker interviewing skills. This may partly explain why second-year students rated the credibility of the VP interaction significantly higher than first-year students. It also presents opportunities for developing customized, personalized systems that support the incremental building of expertise, with more challenging content for more advanced students. For example, introductory mini-modules with simplified interactions could be developed for truly novice students to reduce cognitive load.

Conclusion

This work provides initial evidence and inspiration for the design and implementation of futuristic learning systems utilizing conversational virtual humans in augmented reality to enable new ways for users to learn socially in formal and informal settings.

To summarize,

Users liked the educational content and the overall design of our system.
Users were able to navigate the system on their own with minimal instruction.
Users found the system a viable low stress / reduced anxiety learning tool which helps to improve patient interviewing skills.
Users valued the observation and feedback features the most, followed by VP interaction.
Users helped us understand opportunities for further customization and personalization to target the needs of specific user subgroups, such as:
- Better articulation for non-native English speakers
- Increasing confidence in female students
- Opportunities for social modeling with virtual and human PALs

The above research leads me to conclude with optimism that as AR devices and networks improve and become more ubiquitous; as AR headsets become more robust, cost-effective, and ergonomic; and as recent advances in generative AI improve the NLP, assessment, feedback, and interactive capabilities of virtual agents, students will use new educational products similar to our system, which leverage new human-machine teaming paradigms to enable social learning in a relatively stress-free, highly customizable, and flexible environment.