Measuring Task Time During Usability Testing

I design applications that are used all day, every day in a corporate setting. Because of this, I measure efficiency and time when I do usability studies to make sure that we are considering productivity as part of our design process.

Although actual times gathered from real interactions via an analytics package are more reliable and quantifiable than those gathered in usability testing, they require you to have a lot of users or a live product. When you're in the design stage, you often don't have the ability to gather that kind of data, especially when you're using mockups or prototypes instead of a live application. Being able to gauge the relative times of actions within a process during usability testing can be helpful, and being able to compare the times of two new design options is also valuable. Gathering information about task times early in the design phase can save money and effort down the road.

HOW TO CONDUCT A TIME STUDY

During a typical usability study, simply collect the times it took to accomplish a task. The best way to do this is to measure time per screen or activity in addition to the duration of the task, so that you'll be able to isolate which step of a process is taking the most time or adding unnecessary seconds. This can be more illuminating from a usability perspective than simply knowing how long something takes.

Make a video screen recording of the session. Pick a trigger event to start and pause timing, such as clicking a link or a button. Gather the times via the timestamp when you replay the video. Don't try to time with a stopwatch during the actual usability test. You can make a screen recording with SnagIt, Camtasia, or Morae, or through any number of other tools.

When comparing two designs for time, test both designs in the same study and use the same participants. This means you'll have a within-subjects study, which produces results with less variation - a good thing if you have a small sample size. To reduce bias, rotate the order of designs so each option is presented first half of the time.

COMMON QUESTIONS ABOUT TIME STUDIES

Should you count unsuccessful tasks?

Yes and no. If the user fails to complete the task, or the moderator intervenes, exclude it from the time study. If the user heads the wrong direction, but eventually completes the task, include it.

What if my participant is thinking aloud and goes on a tangent, but otherwise, they completed the task?

I leave "thinking aloud" in and let it average in the results. If the participant stops what they are doing to talk for an extended period of time (usually to ask a question or give an example), I exclude the seconds of discussion. But, be conservative with the amount of time excluded and make sure you've made a note of how long the excluded time was.

Should you tell participants they are being timed?

I don't. Sometimes I'll say that we're gathering information for benchmarking, but I generally only give them the usual disclaimer about participating in a usability test and being recorded.

How relevant are these results?

People will ask if times gathered in an unnatural environment like usability testing or a simulation are meaningful. These times are valuable because some information is better than no information. However, it's important to caveat your results with the methodology and the environment in which the information was collected.

REPORTING RESULTS: AVERAGE TASK TIMES WITH CONFIDENCE INTERVALS

Report the confidence interval if you want to guesstimate how long an activity will take: "On average, this took users 33 seconds. With 95% confidence, this will take users between 20 and 46 seconds."

Report the mean if you want to make an observation that one segment of the task took longer than the other during the study. A confidence interval may not be important if your usability results are presented informally to the team, or you're not trying to make a prediction. Consider the following scenario: you notice, based on your timings, that a confirmation page is adding an average of 9 seconds to the task, which end-to-end takes an average of 42 seconds. Does it matter that the confirmation screen may actually take 4-15 seconds? Not really. The value in the observation is whether you think the confirmation page is worth nearly 1/4 of the time spent on the task, and whether there's a better design solution that would increase speed.

When you're determining average task time, always take the geometric mean of times instead of the arithmetic mean/average (Excel: =GEOMEAN). This is because times are actually ratios (0:34of 0:60). If the sample size is smaller than 25, report the geometric mean. If the sample size is larger than 25, the median may be a better gauge (Excel: =MEDIAN).

If you're reporting the confidence interval, take the natural log of the values and calculate the confidence interval based on that. This is because time data is almost always positively skewed (not a normal distribution). Pasting your time values into this calculator from Measuring U is much easier than calculating in Excel.

REPORTING RESULTS: CALCULATING THE DIFFERENCE BETWEEN TWO DESIGNS

For a within-subjects study, you'll compare the mean from Design A to the mean of Design B. You'll use matched pairs, so if a participant completed the task for Design A, but did not complete the task for Design B, you will exclude her both of her times from the results.

There are some issues with this, though. First, I've found it very difficult to actually get a decent p-value, so my comparison is rarely statistically significant. I suspect this is because my sample size is quite small (<15). I also have trouble with the confidence interval. Often my timings are very short, so I will have a situation where my confidence interval takes me into negative time values, which, though seemingly magical, calls my results into question.

Here's the process:

Find the difference between Design A and B for each pair. (A-B=difference)
Take the average of the differences (Excel: =AVERAGE).
Calculate the standard deviation of the differences (Excel: =STDEV).
Calculate the test statistic.
t = average of the difference / (standard deviation / square root of the sample size)
Look up the p-value to test for statistical significance.
Excel: =TDIST(test statistic, sample size-1, 2). If the result is greater than 0.01, you have statistical significance.
Calculate the confidence interval:
1. Confidence interval = Absolute value of the mean of the difference +/- (critical value (standard deviation / square root of sample size).
2. Excel critical value at 95% confidence: =TINV(0.05, sample size - 1)

REFERENCES

Both of these books are great resources. The Tullis/Albert book provides a good overview and is a little better at explaining how to use Excel. The Sauro/Lewis book gives many examples and step-by-step solutions, which I found more user-friendly.

Interested in more posts about usability testing? Click here.

usability testing, user experience designJulie YoungJune 7, 2016