The following is written by Chris Chapman, Quantitative User Experience Researcher, Chromebooks and Eric Bahna, Product Manager, Android Auto, as published in the book Applied MaxDiff: A Practitioner’s Guide to Best-Worst Scaling (published by Sawtooth Software).
At Google, MaxDiff has been used by dozens of teams and in hundreds of projects. Common uses are to prioritize users’ interest in features; to assess the frequency of use cases; to measure the appeal of content; and to rate potential messages. We have taught an internal class to over 130 researchers, product managers, designers, and engineers who now use MaxDiff to prioritize users’ needs. Here we describe a less common application: prioritizing engineering work.
Prioritizing Feature Requests
A crucial activity in a technology firm is to prioritize feature requests (FRs). An FR may arise from a customer, from an executive, or often from the engineering team itself. The product management (PM) team prioritizes FRs relative to their importance and required effort, aiming to deliver a set of features with maximal value within a budget for cost and effort.
This does not imply that one should deliver the most important FR #1 first, and then do FR #2 only if there is remaining capacity. Instead, we need to maximize the total value relative to the effort (the “knapsack problem”). It may happen that FR #1 is very costly, whereas we could deliver FRs #3-#6 with lower effort and a higher total value to users than #1 alone.
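As a sketch of this idea, the snippet below solves a tiny knapsack instance by brute force. All feature names, scores, and effort estimates are hypothetical, purely to show how the most important single FR can lose to a bundle of cheaper ones:

```python
from itertools import combinations

# Hypothetical importance scores (e.g., from MaxDiff) and effort
# estimates for five feature requests; all numbers are illustrative.
features = {
    "FR1": {"value": 40, "effort": 9},
    "FR2": {"value": 25, "effort": 5},
    "FR3": {"value": 12, "effort": 2},
    "FR4": {"value": 10, "effort": 2},
    "FR5": {"value": 9,  "effort": 2},
}
BUDGET = 9  # total engineering effort available this cycle

# Brute-force knapsack: pick the subset of FRs with maximal total
# value whose total effort fits within the budget.
best_set, best_value = (), 0
for r in range(1, len(features) + 1):
    for subset in combinations(features, r):
        effort = sum(features[f]["effort"] for f in subset)
        value = sum(features[f]["value"] for f in subset)
        if effort <= BUDGET and value > best_value:
            best_set, best_value = subset, value

print(best_set, best_value)  # FR2+FR3+FR4 beat FR1 alone
```

Here the single most valuable item, FR1, consumes the whole budget for a value of 40, while FR2, FR3, and FR4 together fit the same budget with a value of 47. Real backlogs are too large for brute force, but the same logic holds with a dynamic-programming or greedy value-per-effort approach.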
We hope you are thinking, “MaxDiff is perfect! Instead of saying all FRs are important, it forces a tradeoff. The results show the value of each FR. You can divide that value by the effort, and stack rank the result to deliver maximum value.” That is right, but there are two twists. First, we may not be able to ask customers directly, either because they are hard to reach or because the features are confidential. We solve that (imperfectly) by asking PMs, Sales, and Support team members to assess the FRs on behalf of their customers. That reveals the second twist: team members may show systematic disagreement because their roles provide differing insight into users’ needs.
We address differences by highlighting the disagreement and discussing it. Traditionally this is done in a large prioritization meeting. Unfortunately, the results of such discussion may be dominated by the “HiPPO,” the highest paid person’s opinion (Kohavi & Kaushik, 2006). This is where MaxDiff is immensely valuable: we can use data instead of opinion to compare assessments by team members’ roles, such as PM vs. Sales.
Exhibit 12.1 – Team’s MaxDiff Rank
Exhibit 12.1 shows a simulated example where 20 Feature Requests have been prioritized by a team meeting and are also assessed separately in MaxDiff surveys answered by PMs and Sales engineers. The diagonal shows the current ranking of importance in the engineering backlog (the results of the meeting), while the average preference of PMs from MaxDiff is plotted as square symbols, and average Sales preference as triangles. When we read across each row on the plot, we immediately see the areas of agreement and disagreement.
We see, for example, that there is modest disagreement for item #1, which somehow ended up in first place on the current priority list even though neither PMs nor Sales believe it is most important. Further down we can see areas of larger disagreement: item #3 is very important to Sales (2nd place) but low for PM (13th place). Item #12 is near the bottom for Sales (17th place) but in second place for PM, differing sharply from the agreed backlog rank.
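The same read-across-the-rows check can be done programmatically. The sketch below flags items whose PM and Sales ranks differ by more than a threshold; the ranks mirror the examples above, and the threshold and item subset are hypothetical:

```python
# Hypothetical MaxDiff-derived ranks (1 = most important) for a few
# feature requests, as assessed by PM and Sales; illustrative only.
pm_rank    = {"FR1": 4, "FR2": 1, "FR3": 13, "FR12": 2}
sales_rank = {"FR1": 3, "FR2": 5, "FR3": 2,  "FR12": 17}

# Flag items whose rank gap exceeds a threshold: these are the rows
# worth a focused discussion in the prioritization meeting.
THRESHOLD = 5
disagreements = {
    fr: abs(pm_rank[fr] - sales_rank[fr])
    for fr in pm_rank
    if abs(pm_rank[fr] - sales_rank[fr]) > THRESHOLD
}
print(disagreements)  # e.g., items #3 and #12 surface for discussion
```

In practice one might sort by the gap size and walk the list from the top, so meeting time goes to the largest disagreements first.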
These differences are used to reassess backlog priorities. For instance, item #12 might be important to PM but not to Sales because it will attract new customers. Through such discussion, item #12 might be moved up due to its strategic importance. Also, item #2 might be moved up to #1. MaxDiff data allow us to have such conversations with less opinion, focusing on the areas where additional information and judgment are needed.
We have seen two problems in this approach. First, team members are busy and may need incentives to answer a survey. When we demonstrate that answers are used to change the product roadmap, participation goes up and respondents express enthusiasm about the process. Second, not every team member may have insight into every feature. In this case, we use constructed MaxDiff to select items that are relevant for each respondent (Bahna and Chapman, 2018).
At Google, MaxDiff has been valuable to many teams. We find the level of interest in the method to be rising steadily, both for assessment with customers and assessment among team members. We hope you find it as useful as we have.
For More Information on MaxDiff
MaxDiff is an advanced survey research technique for prioritizing and weighting the preference/importance of a list of items. Respondents typically see 3 to 5 items in each set and choose the “best” and “worst” items. The resulting scores can be made to sum to 100%.
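One simple way to rescale logit-scale MaxDiff utilities so they sum to 100 is to exponentiate and normalize them, yielding ratio-scaled shares of preference. The sketch below assumes hypothetical utilities and uses this plain softmax-style rescaling; it is not necessarily the exact transformation any particular software package applies:

```python
import math

# Hypothetical logit-scale MaxDiff utilities for four items.
utilities = {"Item A": 1.2, "Item B": 0.4, "Item C": 0.0, "Item D": -1.6}

# Exponentiate and normalize so the scores sum to 100. A score of
# 40 vs. 20 can then be read as "twice as preferred."
total = sum(math.exp(u) for u in utilities.values())
scores = {item: 100 * math.exp(u) / total for item, u in utilities.items()}

print({item: round(s, 1) for item, s in scores.items()})
```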
- Introductory video on MaxDiff: https://youtu.be/Uj5QE9mp3NE
- Introductory white paper on MaxDiff: https://www.sawtoothsoftware.com/download/techpap/How-Good-Is-Best-Worst-Scaling-2018.pdf
Bahna, E., and Chapman, C. (2018). Constructed, Augmented MaxDiff. In Sawtooth Software (ed.), Proceedings of the 2018 Sawtooth Software Conference. Orlando, FL, March 2018.
Kohavi, R., and Kaushik, A. (2006). “RE: Hippo?” At: https://exp-platform.com/Documents/HiPPOOrigin.txt. For context, see https://exp-platform.com/hippo/