Understanding UX Metrics, Part 3


Dear Reader,

This edition of Beyond Aesthetics is part 3 of a series on UX metrics. Read Part 1 or Part 2.

Q: Do I need tons of users for my UX Metrics to be reliable?

No. You can get great UX metrics with only 5 to 10 users.

But you have to understand sample sizes and your margin of error.

Last week, I asked if you wanted a third email on UX metrics. 100% of you said you wanted a part 3.

Yay! Much data! Such experiment!

Whatever you do, don't read the bottom-left corner of that graph.

I hid something down there: only 23 out of 4,000 newsletter readers took the survey. Scheiße.

That big beautiful 100% is only 23 responses. Labeling that graph 23/23 would have been more truthful.

No matter how pretty the design, the math beneath the graphic isn't sound. And bad math doesn't stink: it won't be obvious when your math is wrong.

To get a little more confident, I checked an online sample size calculator. It turns out I really needed 67 survey responses (at a 90% confidence level with a 10% margin of error).
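If you're curious, here's roughly the math those online calculators run: Cochran's formula for estimating a proportion, plus a correction for a finite population (my 4,000 subscribers). The 1.645 below is the z-score for a 90% confidence level. A minimal sketch:

```python
import math

def required_sample_size(population, z=1.645, margin_of_error=0.10, p=0.5):
    """Sample size for estimating a proportion, with finite-population correction.

    z: z-score for the confidence level (1.645 ~ 90%)
    margin_of_error: acceptable error, e.g. 0.10 for +/-10%
    p: expected proportion; 0.5 is the most conservative choice
    """
    # Cochran's formula for an effectively infinite population
    n0 = (z ** 2) * p * (1 - p) / margin_of_error ** 2
    # Correct for the known, finite population of subscribers
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)

print(required_sample_size(4000))  # -> 67, a far cry from my 23 responses
```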

Of course, that's only if I want to say these survey results reflect all the subscribers. But that's usually exactly what you want to say.

Well, I'm writing part 3, anyway...data be damned! 😎

My survey results weren't representative of the full population, but deciding still feels less risky with some data in hand.

This is a great example of how UX metrics might work. You try to gather as much data as possible, but in the end, you have to make decisions based on the (possibly crappy) available metrics.

If you have tons of data and analysis skills, maybe you could measure the behavior or attitudes of every single user. But that sort of user access is extremely rare.

Most of us will sample a slice of our user base. It's cheaper, and you can use math to check whether the slice represents the whole.

But you should still try to get the largest sample size you can manage, because a bigger sample makes your results more accurate. Sample size and confidence are correlated: the bigger your sample, the more confident you can be in your results.

How do you know that you have a large enough sample? 5 isn't always enough.

Wait—didn’t that guy from Nielsen Norman Group say that I can test with 5 users and be fine? It depends on what you're trying to do.

Let's look at the 3 P’s of UX Metrics.

Performance Metrics

During World War II, military engineers wanted to minimize pilot error and improve the design of airplane cockpits. Usability and performance data became a matter of life and death, and those early studies of speed and efficiency led to today's usability and performance testing.

Performance metrics measure what users do while completing a task. They can be both behavioral and attitudinal.

The classic usability test uses lots of performance metrics like efficiency or learnability.

Simple example: For a simple performance study, you might be troubleshooting big issues in a new design feature. Maybe you've got a few wireframes, and you want to see which one has the fewest issues. Or maybe you're trying to catch the big stuff before you send your design off to the developers. In that case, you might measure errors, error severity, and task success rate.

Advanced example: For an advanced performance study, you might be optimizing the details of a critical flow in your product, say a checkout flow. Improving small aspects of a high-traffic page can save millions of dollars. You might look at task time, drop-off rates, and efficiency, as in the sketch below.
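To make that concrete, here's a minimal sketch of how you might compute a few of those metrics from usability-session records. Everything here, the field names and the numbers, is made up for illustration:

```python
# Hypothetical session records for one task; every name here is made up.
sessions = [
    {"completed": True,  "task_seconds": 42,  "errors": 0},
    {"completed": True,  "task_seconds": 67,  "errors": 2},
    {"completed": False, "task_seconds": 120, "errors": 5},  # gave up
    {"completed": True,  "task_seconds": 38,  "errors": 1},
    {"completed": True,  "task_seconds": 57,  "errors": 0},
]

n = len(sessions)
success_rate = sum(s["completed"] for s in sessions) / n
drop_off_rate = 1 - success_rate
# Time-on-task is usually reported for successful attempts only
times = [s["task_seconds"] for s in sessions if s["completed"]]
avg_task_time = sum(times) / len(times)
avg_errors = sum(s["errors"] for s in sessions) / n

print(f"Task success rate: {success_rate:.0%}")    # 80%
print(f"Drop-off rate:     {drop_off_rate:.0%}")   # 20%
print(f"Avg task time:     {avg_task_time:.0f}s")  # 51s
print(f"Avg errors/user:   {avg_errors:.1f}")      # 1.6
```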

Whatever you’re doing, performance metrics require a task. The task provides the scope for the measurement.

You can get away with fewer participants in performance studies. That's because you're testing how people use the design, not whether they prefer it.

If you’re new to UX, aim for 5 user tests. If you’re experienced, aim for 8-12. Here’s the graph that everyone references for these figures:
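That graph is Nielsen and Landauer's problem-discovery curve. The model behind it: if an average usability problem affects about 31% of users, each additional tester surfaces fewer new problems. Here's a quick sketch of that math (the 31% is their published average across projects; your product's rate will differ):

```python
# Nielsen & Landauer's problem-discovery model:
#   proportion of problems found = 1 - (1 - L)^n
# where L is the chance one user hits a given problem (~0.31 on average).
L = 0.31

for n in [1, 3, 5, 8, 12, 15]:
    found = 1 - (1 - L) ** n
    print(f"{n:>2} users -> {found:.0%} of problems found")

# ~5 users already surface ~84% of problems, which is why small
# performance studies work; past ~12 users the curve flattens out.
```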

Preference Metrics

It’s tough to say that a large group of people prefer one thing over another. But the payoff can be great.

Imagine being able to tell which design will work best before you launch it. Imagine being able to compare the UX of your product to an industry standard.

Preference metrics can provide the answer in both behavioral and attitudinal flavors.

8 to 12 users may help you troubleshoot a design, but you’re going to need a few hundred to determine user preferences.

Take a look at two different preference tests with sample sizes of 115 and 421:

See those orange lines and text on the graph?

That's the margin of error, or confidence interval. Notice how it shrinks with a bigger sample size. With a sample size of 115, those blue bars could vary so much that you can't really get a solid answer.

(Don't confuse the confidence interval with the confidence level, which is set at 90% for both tests, a good setting for UX work.)
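If you want to sanity-check those orange lines yourself, the margin of error for a simple proportion follows a standard formula. Here's a minimal sketch at a 90% confidence level (z ≈ 1.645), using the two sample sizes from the tests above:

```python
import math

def margin_of_error(n, p=0.5, z=1.645):
    """Margin of error for a proportion at a 90% confidence level."""
    return z * math.sqrt(p * (1 - p) / n)

for n in [115, 421]:
    print(f"n = {n}: +/-{margin_of_error(n):.1%}")

# n = 115: +/-7.7%  (two options within ~8 points of each other are a toss-up)
# n = 421: +/-4.0%  (the interval tightens as the sample grows)
```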

Standardized UX scores like SUS and SUPR-Q are based on self-reported user preference. They capture user attitudes, and they're often administered before and after a performance study (it's always a good idea to balance your data collection methods).
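If you've never scored a SUS questionnaire, the arithmetic is refreshingly simple: 10 items rated 1 to 5, odd items score (rating - 1), even items score (5 - rating), and the total is multiplied by 2.5 to land on a 0 to 100 scale. A minimal sketch with made-up ratings:

```python
def sus_score(ratings):
    """Score one participant's 10-item SUS questionnaire (ratings 1-5) on a 0-100 scale."""
    assert len(ratings) == 10
    total = 0
    for i, rating in enumerate(ratings):
        # Odd-numbered items (1st, 3rd, ...) are positively worded: rating - 1
        # Even-numbered items are negatively worded: 5 - rating
        total += (rating - 1) if i % 2 == 0 else (5 - rating)
    return total * 2.5

# Made-up responses from one participant
print(sus_score([4, 2, 5, 1, 4, 2, 5, 1, 4, 2]))  # -> 85.0 (well above the ~68 average)
```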

A/B tests also generate preference metrics in a setting that's as close to the real world as possible. You can also run product experiments or concept tests with prototypes in a lab environment to study design preferences.

Whether it’s a survey or a product experiment, preferences can help you avoid designing something that nobody wants.

I teach an entire course on product experiments, where we cover all the new ways designers are testing performance and preference.

Perception Metrics

Perception metrics measure emotions through the senses, and yes, that's as weird as it sounds.

These metrics use eye-tracking, skin monitoring, facial recognition, and more with tools like UX Mining.

Eye-tracking used to cost lots of money, but now you can do it on a smartphone with something like Eye-Square.

These new technologies can tell us how an experience is perceived, and they provide a nice balance to preference and performance metrics. It's much easier to tell whether a user saw a button when you can literally check where their eyes went!

Perception metrics help you track hard-to-measure things like frustration, surprise, trust, and stress. The metric is often complex and unique to the software doing the measuring.

If you want to gather perception metrics, you'll need to pay for a specialized tool or build yourself a fancy usability lab.

In the future, maybe we'll predict behavior with a simple brain scan. Now that would be trippy...

Well, that's it for today! 🏁

You just learned Part 3 of UX metrics. 👏👏👏👏👏🏆👏👏👏👏👏

Until next week, I promise I'll stop talking about UX metrics. 🤓

Jeff Humble
Designer & Co-Founder
The Fountain Institute

P.S. We just launched a new short course on designing product experiments. Get 5 days of lessons FREE.

Oh, and this Saturday is the day! Join over 500 designers who are registered to learn UX metrics LIVE:


The Fountain Institute is an independent online school that teaches advanced UX & product skills.
