Understanding UX Metrics, Part 3


Dear Reader,

This edition of Beyond Aesthetics is part 3 of a series on UX metrics. Read Part 1 or Part 2.

Q: Do I need tons of users for my UX Metrics to be reliable?

No. You can get great UX metrics with only 5 to 10 users.

But you have to understand sample sizes and your margin of error.

Last week, I asked if you wanted a third email on UX metrics. 100% of you said you wanted a part 3.

Yay! Much data! Such experiment!

Whatever you do, don't read the bottom-left corner of that graph.

I hid something down there: only 23 out of 4,000 newsletter readers took the survey. Scheiße.

That big beautiful 100% is only 23 responses. Labeling that graph 23/23 would have been more truthful.

No matter how pretty the design, the math beneath the graphic isn't sound. And bad math doesn't stink: it won't be obvious when your math is wrong.

To get a little more confident, I checked an online sample size calculator. It turns out I really needed 67 survey responses (at a 90% confidence level with a 10% margin of error).
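If you're curious, here's roughly the math those online calculators run: Cochran's formula for estimating a proportion, plus a correction for a finite population (my 4,000 subscribers). The 1.645 below is the z-score for a 90% confidence level. A minimal sketch:

```python
import math

def required_sample_size(population, z=1.645, margin_of_error=0.10, p=0.5):
    """Sample size for estimating a proportion, with finite-population correction.

    z: z-score for the confidence level (1.645 ~ 90%)
    margin_of_error: acceptable error, e.g. 0.10 for +/-10%
    p: expected proportion; 0.5 is the most conservative choice
    """
    # Cochran's formula for an effectively infinite population
    n0 = (z ** 2) * p * (1 - p) / margin_of_error ** 2
    # Correct for the known, finite population of subscribers
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)

print(required_sample_size(4000))  # -> 67, a far cry from my 23 responses
```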

Of course, that's only if I want to say these survey results reflect all the subscribers. But that's usually exactly what you want to say.

Well, I'm writing part 3, anyway...data be damned! 😎

My survey results weren't representative of the full population, but deciding still feels less risky with some data in hand.

This is a great example of how UX metrics might work. You try to gather as much data as possible, but in the end, you have to make decisions based on the (possibly crappy) available metrics.

If you have tons of data and analysis skills, maybe you could measure the behavior or attitudes of every single user. But that sort of user access is extremely rare.

Most of us will sample a slice of our user base. It's cheaper, and you can use math to check whether the slice represents the whole.

But you should still try to get the largest sample size you can manage, because a bigger sample makes your results more accurate. Sample size and confidence are correlated: the bigger your sample, the more confident you can be in your results.

How do you know that you have a large enough sample? 5 isn't always enough.

Wait—didn’t that guy from Nielsen Norman Group say that I can test with 5 users and be fine? It depends on what you're trying to do.

Let's look at the 3 P’s of UX Metrics.

Performance Metrics

During World War II, military engineers wanted to minimize pilot error and improve the design of airplane cockpits. Usability and performance data became a matter of life and death, and those early studies of speed and efficiency led to today's usability and performance testing.

Performance metrics measure what users do while completing a task. They can be both behavioral and attitudinal.

The classic usability test uses lots of performance metrics like efficiency or learnability.

Simple example: For a simple performance study, you might be troubleshooting big issues in a new design feature. Maybe you've got a few wireframes, and you want to see which one has the fewest issues. Or maybe you're trying to catch the big stuff before you send your design off to the developers. In that case, you might measure errors, error severity, and task success rate.

Advanced example: For an advanced performance study, you might be optimizing the details of a critical flow in your product, say a checkout flow. Improving small aspects of a high-traffic page can save millions of dollars. You might look at task time, drop-off rates, and efficiency, as in the sketch below.
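To make that concrete, here's a minimal sketch of how you might compute a few of those metrics from usability-session records. Everything here, the field names and the numbers, is made up for illustration:

```python
# Hypothetical session records for one task; every name here is made up.
sessions = [
    {"completed": True,  "task_seconds": 42,  "errors": 0},
    {"completed": True,  "task_seconds": 67,  "errors": 2},
    {"completed": False, "task_seconds": 120, "errors": 5},  # gave up
    {"completed": True,  "task_seconds": 38,  "errors": 1},
    {"completed": True,  "task_seconds": 57,  "errors": 0},
]

n = len(sessions)
success_rate = sum(s["completed"] for s in sessions) / n
drop_off_rate = 1 - success_rate
# Time-on-task is usually reported for successful attempts only
times = [s["task_seconds"] for s in sessions if s["completed"]]
avg_task_time = sum(times) / len(times)
avg_errors = sum(s["errors"] for s in sessions) / n

print(f"Task success rate: {success_rate:.0%}")    # 80%
print(f"Drop-off rate:     {drop_off_rate:.0%}")   # 20%
print(f"Avg task time:     {avg_task_time:.0f}s")  # 51s
print(f"Avg errors/user:   {avg_errors:.1f}")      # 1.6
```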

Whatever you’re doing, performance metrics require a task. The task provides the scope for the measurement.

You can get away with fewer participants in performance studies. That's because you're testing how people use the design, not whether they prefer it.

If you’re new to UX, aim for 5 user tests. If you’re experienced, aim for 8-12. Here’s the graph that everyone references for these figures:
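That graph is Nielsen and Landauer's problem-discovery curve. The model behind it: if an average usability problem affects about 31% of users, each additional tester surfaces fewer new problems. Here's a quick sketch of that math (the 31% is their published average across projects; your product's rate will differ):

```python
# Nielsen & Landauer's problem-discovery model:
#   proportion of problems found = 1 - (1 - L)^n
# where L is the chance one user hits a given problem (~0.31 on average).
L = 0.31

for n in [1, 3, 5, 8, 12, 15]:
    found = 1 - (1 - L) ** n
    print(f"{n:>2} users -> {found:.0%} of problems found")

# ~5 users already surface ~84% of problems, which is why small
# performance studies work; past ~12 users the curve flattens out.
```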

Preference Metrics

It’s tough to say that a large group of people prefer one thing over another. But the payoff can be great.

Imagine being able to tell which design will work best before you launch it. Imagine being able to compare the UX of your product to an industry standard.

Preference metrics can provide the answer in both behavioral and attitudinal flavors.

8 to 12 users may help you troubleshoot a design, but you’re going to need a few hundred to determine user preferences.

Take a look at two different preference tests with sample sizes of 115 and 421:

See those orange lines and text on the graph?

That's the margin of error, or confidence interval. Notice how it shrinks with a bigger sample size. With a sample size of 115, those blue bars could vary so much that you can't really get a solid answer.

(Don't confuse the confidence interval with the confidence level, which is set at 90% for both tests, a good setting for UX work.)
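If you want to sanity-check those orange lines yourself, the margin of error for a simple proportion follows a standard formula. Here's a minimal sketch at a 90% confidence level (z ≈ 1.645), using the two sample sizes from the tests above:

```python
import math

def margin_of_error(n, p=0.5, z=1.645):
    """Margin of error for a proportion at a 90% confidence level."""
    return z * math.sqrt(p * (1 - p) / n)

for n in [115, 421]:
    print(f"n = {n}: +/-{margin_of_error(n):.1%}")

# n = 115: +/-7.7%  (two options within ~8 points of each other are a toss-up)
# n = 421: +/-4.0%  (the interval tightens as the sample grows)
```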

Standardized UX scores like SUS and SUPR-Q are based on self-reported user preference. They capture user attitudes, and they're often administered before and after a performance study (it's always a good idea to balance your data collection methods).
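If you've never scored a SUS questionnaire, the arithmetic is refreshingly simple: 10 items rated 1 to 5, odd items score (rating - 1), even items score (5 - rating), and the total is multiplied by 2.5 to land on a 0 to 100 scale. A minimal sketch with made-up ratings:

```python
def sus_score(ratings):
    """Score one participant's 10-item SUS questionnaire (ratings 1-5) on a 0-100 scale."""
    assert len(ratings) == 10
    total = 0
    for i, rating in enumerate(ratings):
        # Odd-numbered items (1st, 3rd, ...) are positively worded: rating - 1
        # Even-numbered items are negatively worded: 5 - rating
        total += (rating - 1) if i % 2 == 0 else (5 - rating)
    return total * 2.5

# Made-up responses from one participant
print(sus_score([4, 2, 5, 1, 4, 2, 5, 1, 4, 2]))  # -> 85.0 (well above the ~68 average)
```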

A/B tests also generate preference metrics in a setting that's as close to the real world as possible. You can also run product experiments or concept tests with prototypes in a lab environment to study design preferences.

Whether it’s a survey or a product experiment, preferences can help you avoid designing something that nobody wants.

I teach an entire course on product experiments, where we cover all the new ways designers are testing performance and preference.

Perception Metrics

Perception metrics measure emotions through the senses, and yes, that's as weird as it sounds.

These metrics use eye-tracking, skin monitoring, facial recognition, and more with tools like UX Mining.

Eye-tracking used to cost lots of money, but now you can do it on a smartphone with something like Eye-Square.

These new technologies can tell us how an experience is perceived, and they provide a nice balance to preference and performance metrics. It's much easier to tell whether a user saw a button when you can literally check where their eyes went!

Perception metrics help you track hard-to-measure things like frustration, surprise, trust, and stress. The metric is often complex and unique to the software doing the measuring.

If you want to gather perception metrics, you'll need to pay for a specialized tool or build yourself a fancy usability lab.

In the future, maybe we'll predict behavior with a simple brain scan. Now that would be trippy...

Well, that's it for today! 🏁

You just learned Part 3 of UX metrics. 👏👏👏👏👏🏆👏👏👏👏👏

Until next week, I promise I'll stop talking about UX metrics. 🤓

Jeff Humble
Designer & Co-Founder
The Fountain Institute

P.S. We just launched a new short course on designing product experiments. Get 5 days of lessons FREE.

Oh, and this Saturday is the day! Join over 500 designers who are registered to learn UX metrics LIVE:


The Fountain Institute is an independent online school that teaches advanced UX & product skills.
