Understanding UX Metrics, Part 3


Dear Reader,

This edition of Beyond Aesthetics is part 3 of a series on UX metrics. Read Part 1 or Part 2.

Q: Do I need tons of users for my UX Metrics to be reliable?

No. You can get great UX metrics with only 5 to 10 users.

But you have to understand sample sizes and your margin of error.

Last week, I asked if you wanted a 3rd email on UX metrics. 100% of you said yes.

Yay! Much data! Such experiment!

Whatever you do, don't read the bottom-left corner of that graph.

I hid something down there: only 23 out of 4,000 newsletter readers took the survey. Scheiße.

That big beautiful 100% is only 23 responses. Labeling that graph 23/23 would have been more truthful.

No matter how pretty the design, the math beneath the graphic isn't sound. And bad math doesn't stink: it won't be obvious when your numbers are wrong.

To get a little more confident, I checked with an online sample size calculator. It turns out I really needed 67 survey responses (at a 90% confidence level with a 10% margin of error).

Of course, that's only if I want to claim these survey results reflect all the subscribers. But that's usually exactly what you want to claim.
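
If you want to sanity-check a calculator like that yourself, here's a minimal sketch of the standard formula behind it (a normal approximation with a finite population correction; the function name is mine, and 1.645 is the z-score for a 90% confidence level):

```python
import math

def required_sample_size(population, z=1.645, margin_of_error=0.10, p=0.5):
    """Sample size needed to estimate a proportion in a finite population.

    z=1.645 corresponds to a 90% confidence level, and p=0.5 is the
    most conservative assumption about the true proportion.
    """
    # Sample size for an effectively infinite population
    n0 = (z ** 2) * p * (1 - p) / margin_of_error ** 2
    # Finite population correction (matters when the population is small)
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)

print(required_sample_size(4000))  # -> 67, a far cry from my 23 responses
```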

Well, I'm writing part 3 anyway... data be damned! 😎

My survey results weren't representative of the full population, but making the call still feels less risky with some data behind it.

This is a great example of how UX metrics work in practice. You try to gather as much data as possible, but in the end, you have to make decisions based on the (possibly crappy) metrics available.

If you have tons of data and analysis skills, maybe you could measure the behavior or attitudes of every single user. But that sort of user access is extremely rare.

Most of us will sample a slice of our user base. It's cheaper, and you can use math to check whether that slice represents the whole.

But you should still aim for the largest sample you can manage: the bigger your sample, the smaller your margin of error and the more confident you can be in your results.

How do you know that you have a large enough sample? 5 isn't always enough.

Wait—didn’t that guy from Nielsen Norman Group say that I can test with 5 users and be fine? It depends on what you're trying to do.

Let's look at the 3 P’s of UX Metrics.

Performance Metrics

During World War II, military engineers wanted to minimize pilot error and improve the design of airplane cockpits. Usability and performance data became a matter of life and death. Those wartime studies of speed and efficiency led to today's usability and performance research.

Performance metrics measure what users do while completing a task. They can be both behavioral and attitudinal.

The classic usability test uses lots of performance metrics like efficiency or learnability.

Simple example: For a simple performance study, you might be troubleshooting a new design feature for big issues. Maybe you've got a few wireframes, and you want to see which one has the fewest issues. Or maybe you're trying to catch the big stuff before you send your design off to the developers. In that case, you might measure errors, error severity, and task success rate.

Advanced example: For an advanced performance study, you might be optimizing details of a critical flow in your product, say a checkout flow. Improving small aspects of a page with high traffic can provide millions of dollars of savings. You might look at task time, drop-off rates, and efficiency.

Whatever you’re doing, performance metrics require a task. The task provides the scope for the measurement.

You can get away with fewer participants in performance studies. That's because you're testing how people use the design, not whether they prefer it.

If you’re new to UX, aim for 5 user tests. If you’re experienced, aim for 8-12. Here’s the graph that everyone references for these figures:
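
That curve, as far as I can tell, is based on the Nielsen/Landauer problem-discovery model: with n users, you find a share 1 - (1 - p)^n of the usability problems, where p is the chance that a single user hits a given problem (Nielsen's oft-quoted average is p ≈ 0.31). A quick sketch of how fast the curve flattens:

```python
# Nielsen/Landauer problem-discovery model: share of usability
# problems found by n test users, assuming each user independently
# hits a given problem with probability p (Nielsen's average: 0.31).
p = 0.31
for n in (1, 3, 5, 8, 12, 15):
    found = 1 - (1 - p) ** n
    print(f"{n:2d} users -> {found:.0%} of problems found")
```

Around 5 users you've already found roughly 85% of the problems, which is why small samples work fine for troubleshooting.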

Preference Metrics

It’s tough to say that a large group of people prefer one thing over another. But the payoff can be great.

Imagine being able to tell which design will work best before you launch it. Imagine being able to compare the UX of your product to an industry standard.

Preference metrics can provide the answer in both behavioral and attitudinal flavors.

8 to 12 users may help you troubleshoot a design, but you’re going to need a few hundred to determine user preferences.

Take a look at two different preference tests with sample sizes of 115 and 421:

See those orange lines and text on the graph?

That's the margin of error, or confidence interval. Notice how it shrinks as the sample size grows. With a sample of 115, those blue bars could vary so much that you can't really get a solid answer.

(Don't confuse the confidence interval with the confidence level, which is set at 90% for both tests, a good setting for UX work.)
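
To see why those error bars shrink, here's a minimal sketch of the margin of error for a proportion at a 90% confidence level (normal approximation; the 55% preference share is a made-up illustration, not a number from the tests above):

```python
import math

def margin_of_error(p_hat, n, z=1.645):
    """Margin of error for an observed proportion at 90% confidence."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

for n in (115, 421):
    moe = margin_of_error(0.55, n)  # hypothetical 55% preference share
    print(f"n={n}: 55% ± {moe:.1%}")
```

Roughly ±7.6% at n=115 versus ±4.0% at n=421: nearly quadrupling the sample only halves the interval.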

UX scores like SUS and SUPR-Q are based on self-reported user preference. They capture user attitudes, and they're often administered before and after a performance study (it's always a good idea to balance your data collection methods).
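
As an aside, SUS scoring is simple enough to compute by hand. A minimal sketch (the ten responses are made-up example data):

```python
def sus_score(responses):
    """Score one SUS questionnaire: ten Likert responses from 1 to 5.

    Odd-numbered items are positively worded (score - 1); even-numbered
    items are negatively worded (5 - score). The sum scales to 0-100.
    """
    assert len(responses) == 10
    total = sum((r - 1) if i % 2 == 1 else (5 - r)
                for i, r in enumerate(responses, start=1))
    return total * 2.5

print(sus_score([4, 2, 5, 1, 4, 2, 5, 1, 4, 2]))  # -> 85.0
```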

A/B tests also generate preference metrics in a setting that's as close to the real world as possible. You can also run product experiments or concept tests with prototypes in a lab environment to study which designs users prefer.

Whether it’s a survey or a product experiment, preferences can help you avoid designing something that nobody wants.

I teach an entire course on product experiments, where we cover all of these new ways designers are testing with performance and preference metrics.

Perception Metrics

Perception metrics measure emotions through the senses, and yes, that's as weird as it sounds.

These metrics use eye-tracking, skin monitoring, facial recognition, and more with tools like UX Mining.

Eye-tracking used to cost lots of money, but now you can do it on a smartphone with something like Eye-Square.

These new technologies can tell us how an experience is perceived, and they provide a nice balance to preference and performance metrics. It's much easier to tell whether a user saw a button when you can literally check where their eyes went!

Perception metrics help you track hard-to-measure things like frustration, surprise, trust, and stress. The metric itself is often complex and unique to the software doing the measuring.

If you want to gather perception metrics, pay for a special tool or build yourself a fancy usability lab.

In the future, maybe we'll predict behavior with a simple brain scan. Now that would be trippy...

Well, that's it for today! 🏁

You just learned Part 3 of UX metrics. 👏👏👏👏👏🏆👏👏👏👏👏

Until next week, I promise I'll stop talking about UX metrics. 🤓

Jeff Humble
Designer & Co-Founder
The Fountain Institute

P.S. We just launched a new short course on designing product experiments. Get 5 days of lessons FREE.

Oh, and this Saturday is the day! Join over 500 designers who are registered to learn UX metrics LIVE:

The Fountain Institute

The Fountain Institute is an independent online school that teaches advanced UX & product skills.
