
Iterative testing and evaluation in collaboration with design
TIMELINE
4 weeks
ROLE
UX Researcher
TEAM
2 Researchers
3 Designers
1 Product Manager
SKILLS
Study Design
Facilitation
Stakeholder Alignment
Collaborative Synthesis
⚠️
Some information has been omitted due to NDA. Please get in touch for a deeper look into this research study.
01 — OVERVIEW
Moving fast with our product team to uncover usability issues
IBM watsonx.ai is an enterprise-ready AI developer studio for new foundation models, generative AI, and traditional machine learning. It allows enterprises to train, validate, tune, and deploy both traditional and foundation models.
TASK
Under an ambitious timeline to deliver by the end of the quarter, the product team was going to ship a lightweight version of a new mode of Prompt Lab with or without research.
The goal was to accommodate their strict timeline and help the designers iterate quickly on usability issues using user feedback.
OUTCOME
We collaborated closely with the design team to complete a RITE (Rapid Iterative Testing and Evaluation) study (n=6) within a month, meeting an existing implementation deadline. The study produced 3 design iterations, 12 recommendations, and 4 opportunities.

02 — RESEARCH INTAKE
Scoping the research
Gathering secondary resources
We spent time browsing the existing research conducted by our team and across IBM Software to find:
existing relevant insights on the topic area
existing design recommendations surfaced in previous research
We hosted an informal 3-in-a-box with PM, design, and development to share what we found.
💡 Why check out existing research?
We noticed that parallel research efforts on the same topic had been happening across the organization, but the product team was not aware of them. Sharing what we knew helped align stakeholders on common ground and informed our discovery questions.

Understanding assumptions and the timeline
Our scoping discussions with PM and design were intended to surface their current assumptions about user expectations and goals, which in turn informed our work-in-progress research goals and questions.
Understanding the broader timeline helped us narrow down which methods we could realistically deliver on.
💡 Why get everyone in the same room?
It ensured that we validated assumptions comprehensively and addressed user needs and goals instead of playing catch-up with competitor features.
Defining the research goals and questions
📖 The Goals
Collect feedback from users on the lightweight concept to make incremental improvements and validate the product team's assumptions.
Collect feedback from users on alternative concepts to explore new solutions for future implementation.
Uncover any needs, goals, and expectations of users when [redacted].
📖 The Questions
How do the product team's assumptions align with users' expectations and goals for [topic area]?
Is the concept chosen for the lightweight implementation working as a baseline for further iteration?
What usability issues exist to be addressed immediately for the lightweight experience?
What improvements can be made for a full-scope future implementation?
What user needs are still unmet in the current iterations of the experience?
03 — THE APPROACH
Defining the research method
Using our context and constraints to choose a method
With timeline sensitivity in mind, we decided to pursue the RITE study approach.
Option 1: RITE with static high-fi images
✅ Pros:
Fast Iterations: Issues can be identified and fixed immediately
Low Cost & Effort: Static images require no prototyping effort from Design and Dev
❌ Cons:
Limited Interaction Testing: Users can only give feedback based on visuals, not actual functionality.
Lack of Realism: Users won't behave as they would if interacting with a working product.
Option 2: Formal usability testing in a live environment
✅ Pros:
More Realistic Insights: A functional prototype reveals interaction-level friction points
Higher Confidence: Reduces risk by validating user flows including interaction before full development
❌ Cons:
Time and Resource Intensive: Study timeline will need to be pushed to accommodate the time and effort from development
Delayed Iterations: Design changes can’t be realistically implemented between sessions
Getting commitment from our designers
With a draft timeline in mind, we presented our plan to our designers, along with questions about the current state of the prototypes that might affect planning. The outcome was their buy-in and alignment over the next few weeks.
💡 Why bother our designers so much?
Their involvement was key to our proposed RITE study, so it was important to be clear about what we expected from them over the next few weeks.

Recruiting our participants
We targeted 6-8 participants given the resources and timeline. They were contacted through our internal Slack product feedback channel based on requirements discussed in our stakeholder scoping meetings.
💡 Why internal participants?
Leveraging internal users expedited the recruitment process as part of the effort to move quickly. Forming these connections helped build up the existing product research participant database.
04 — EXECUTION
Executing the study
Pilot testing our discussion guide
During study setup, we conducted a pilot test with a fellow team member. Following the pilot, we refined our discussion guide and logistical details in preparation for our first official participant.
💡 Why pilot test?
We wanted to run a pilot test with the designers in attendance to identify unclear tasks or questions and ensure the protocol was ready to elicit relevant feedback.

Synthesizing data and surfacing quick insights
As sessions were conducted, we used EnjoyHQ to retrieve session transcripts and Mural to synthesize them.
We flagged issues by severity and pulled out the changes that could be made as "low-hanging fruit," allowing the design team to determine what could be incorporated into the next iteration.
💡 Why synthesize in this way?
Working off the raw transcripts helped us quickly attach user quotes to issues for context. Organizing by severity helped the designers quickly understand which issues they needed to address immediately.

Arriving at our key findings
We arrived at 12 issues and 4 areas of opportunity.
Every issue was described with its frequency out of 6 participants and paired with a relevant tactical recommendation.
Our final playback to the wider team included design updates made by the design team throughout the study.

A post-playback discussion with the cross-functional team
Following the playback, we brought our whole cross-functional team together to discuss each recommendation:
Are we committing to this for the MVP? If so, what are the next steps?
What is the level of effort to implement this from a technical perspective?
Are we deprioritizing this? Why?
💡 Why debrief?
Our sync made sure UXR recommendations led to prioritized, actionable outcomes with appropriate alignment across the product team. Including future opportunities as a topic allowed us to start conversations about related future research.
05 — IMPACT
The impact of our research

🚀
1 area of opportunity identified has since become its own dedicated workstream
🚀
6 members of our product team were exposed to RITE testing as a method for the first time
🚀
Our joint playback saw an audience of 80+
🚀
We proved the effectiveness and flexibility of research to a product team that had assumed there was no time for user feedback
06 — RETROSPECTIVE
What did I learn?
Show your work
Our PM left abruptly for paternity leave during the scoping phase. Planning artifacts became very important for catching up the interim PM, who lacked the same context. Alignment materials (timeline, open questions for design) help onboard stakeholders quickly and earn their buy-in when you can “show your work” on how you arrived where you are.
Be prepared
Stakeholders are busy people. Getting everyone in the same room can be difficult but is necessary for key decision making. What we can control as UX Researchers is coming to every meeting prepared with materials, questions, and action items to make the most of everyone's time.
Communicate visually
The scoping phase of this project went as well as it did because we came prepared with materials that outlined our key questions and current POV. I noticed that visualizing this content, timelines in particular, was especially well received and far more digestible than a wall of text in a Slack message.