Usability testing Spotify's group session feature.

iPhones displaying the Spotify app screens

The Brief

Plan and execute a usability evaluation of a product or service.
  • Spotify — University project
  • Timeline: 10 weeks
  • Design budget: Student project
  • Team: Just me
  • My role: User research

The Problem

I chose to evaluate the group session feature because I had a disappointing experience using it. My disappointment stemmed from three key areas:

  • Awareness and discovery — I found the feature by accident and was not aware of its launch despite using Spotify every day
  • Ease of use — I was not clear on how to use the feature and complete my goal
  • Goal completion — my goal was to share control of the music queue with my housemate, and we did not succeed in doing this

Preliminary investigation

Before homing in on a focus for the evaluation and designing the approach, I first set out to understand the problem area more deeply.

As-is journey

spotify group session user journey
As-is user journey through feature

User survey

I created a survey aimed at gathering insight into the problem areas I had outlined.

user survey question flow
User survey question flow

User survey insights

Focusing on the refined audience of twenty-two participants who use Spotify Premium, I took three key insights from the survey.


  • 64% aren’t aware of group sessions despite having access to the feature
  • 88% of people who are aware of group sessions have used it
  • This might show that discovering the feature leads to use, and that there is demand for it; awareness is the main problem

Goals and success

  • 71% of people who have used the group session feature wanted to share control of the queue
  • Users rated the overall experience at 3.14 out of 5

Evaluation focus

With the survey insights gathered and the problems validated, I now had two key issues to build my tasks around and two key questions to answer:

  • Awareness - why are users not aware of the group session feature?
  • Satisfaction - what is causing dissatisfaction with the experience?

Task definition

  • Host a group session
  • Join a group session
  • Share control of the queue functionality

Chosen methods

Five remote, moderated, task-based observational evaluations, each including:

  • A pre-test questionnaire measuring experience and position on the diffusion of innovation model
  • Three tasks completed while thinking aloud
  • A post-test questionnaire including SUS (System Usability Scale) and qualitative feedback

Metrics to measure

Usability specification table

Evaluation overview

Pilot test learnings


The pilot surfaced three issues:

  • Length: the pilot test took almost 45 minutes, and I could feel my participant becoming fatigued.
  • Context of task: the scenarios I provided gave context to the tasks but were difficult to digest and remember. I had to remind the participant, and I could tell they struggled to understand.
  • Beta issues: some of the functionality I tried to test was not working reliably. Specifically, queuing songs did not work correctly, which caused delays in the evaluation.

In response, I made three changes:

  • Removed a step from the tasks: I removed the final step of adding a song to the queue in tasks 1 and 2.
  • Facilitator reads out the post-test questionnaire: I decided to read out the statements and questions from the post-test questionnaire in all future tests, to reduce my participant's cognitive load.
  • Added images to task scenarios: to enhance the understanding of each scenario I added supporting images.

Results, analysis, and recommendations

With my tasks refined and improvements made to the facilitation of the test, I completed the evaluation with five participants. I used Microsoft Forms to record all data and Excel to compile insights.

Normalising the SUS questionnaire

The post-test questionnaire was the key stage for collecting results and insights. It included the SUS statements and responses as well as a qualitative feedback section in the form of three open questions.

The results of the SUS questionnaire required normalising. I followed guidance from Jeff Sauro that outlined the process of normalising SUS scores.

This process gave me the results in the standardised format and made further analysis and comparison easier. Some key insights I found:

  • Overall participants on average rated the experience 53.5 out of 100 on the system usability scale
  • The results ranged from 42.5 to 60
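The standard SUS normalisation used here can be sketched in a few lines. Note that `sus_score` is a hypothetical helper name; the scoring rules are the standard SUS ones (odd-numbered statements score as the response minus 1, even-numbered as 5 minus the response, with the sum multiplied by 2.5):

```python
# Sketch of standard SUS score normalisation (not Spotify-specific).
# Input: the ten 1-5 Likert responses to the SUS statements, in order.
def sus_score(responses):
    """Convert ten 1-5 SUS responses into a 0-100 score."""
    if len(responses) != 10:
        raise ValueError("SUS requires exactly ten responses")
    total = 0
    for i, r in enumerate(responses):
        if i % 2 == 0:          # odd-numbered statements (1, 3, 5, ...)
            total += r - 1      # positively worded: score - 1
        else:                   # even-numbered statements (2, 4, 6, ...)
            total += 5 - r      # negatively worded: 5 - score
    return total * 2.5          # scale the 0-40 sum to 0-100

# Example: a neutral response (3) to every statement scores 50.
print(sus_score([3] * 10))  # 50.0
```

Repeating this for each participant and averaging the results gives the overall 0-100 figure reported above.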

I cross-referenced these scores using details in the same guide used to normalise them. This guide details how scores can be graded on a curve using a more familiar alphabetical system, from A to F (high to low).

system usability scale grading curve
SUS Grading Curve — source

I deduced that participants rated the experience with Spotify's group sessions in the tasks as a D-, a highly unsatisfactory score considering Spotify's prowess. But, as SUS is not diagnostic, further analysis was required to understand why the feature scored so low. This is where I looked at the qualitative feedback.

Analysis of qualitative feedback

I asked each participant the three questions and recorded their answers as notes. I then compiled the responses into the same spreadsheet as the SUS questionnaire.

excel sheet with data
Compiled data

To convert the qualitative feedback into more manageable and useful insight, I extracted the key points from each response, then tallied the mentions of each point across all five evaluations. I focused on the negatives by looking at the questions ‘what was your biggest pain point in the tasks?’ and ‘what would you change about the feature?’. This gave me insight into participant issues.
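The tallying step can be sketched as follows. The extracted points and per-participant lists below are illustrative placeholders, not the study's actual data:

```python
# Sketch of tallying key points across participant responses.
# The points and responses below are illustrative, not the real study data.
from collections import Counter

# Key points extracted from each participant's open-question answers.
participant_points = [
    ["qr waveform scan", "joining screen"],
    ["joining screen", "feature location"],
    ["qr waveform scan", "joining screen"],
    ["qr waveform scan", "joining screen", "quick share"],
    ["qr waveform scan", "feature location"],
]

# Count how many participants mentioned each point.
mentions = Counter(point for points in participant_points for point in points)

# Rank issues by number of mentions, most frequent first.
for point, count in mentions.most_common():
    print(f"{point}: {count}/5 participants")
```

Ranking by `most_common()` produces the same prioritised table of issues used in the next section.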

Participant issues

table of participant issues ranked by mention
Table of participant issues ranked by mention

Looking at the table, there are two leading pain points, both mentioned by 4 out of 5 participants. Given the scope of the study, I decided to focus on these two first:

  • QR waveform scan
  • Joining screen

User interface analysis

With the most important problem areas identified, I set about analysing the interface where the issues occurred.

Invite/join screen

Invite/join screen analysis
Invite/join screen analysis

The above screen outlines what users are presented with when starting a session. This screen has the function of displaying who is currently listening as well as the ability to invite friends.

The invite functionality comes in the form of a button. The ‘invite friends’ button is what all of my participants used to complete task 2. Interestingly, the pink waveform image at the bottom of the screen is a scannable code that allows other users to join. However, none of my participants used the scannable code, and when following up in the qualitative feedback, 4 out of 5 did not know it was scannable.

This is interesting because the pink code is afforded proportionally more space than the ‘invite friends’ button, yet it is clearly being missed by users. I have deduced three reasons why it may be overlooked:

Jakob’s law

“Users will transfer expectations they have built around one familiar product to another that appears similar”.

Spotify breaks Jakob’s law by implementing their own take on a quick response (QR) code. According to a survey by Statista, an estimated 11 million US households scanned a QR code in 2020. Against an estimated 330 million people in America, that is a small figure, even with a generous estimate of people per household. So QR codes are growing in use and becoming more ubiquitous; however, this data implies they are still not used by the majority.
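As a rough back-of-envelope check on that claim (the household size here is an assumption for illustration, not a figure from the Statista survey):

```python
# Rough check: what share of the US population scanned a QR code in 2020?
scanning_households = 11_000_000   # Statista estimate, 2020
us_population = 330_000_000        # approximate
people_per_household = 2.5         # generous assumption for illustration

scanning_people = scanning_households * people_per_household
share = scanning_people / us_population
print(f"{share:.0%} of the population")  # roughly 8%
```

Even with everyone in a scanning household counted, fewer than one in ten Americans had scanned a QR code, supporting the point that the pattern is not yet familiar to most users.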

What’s more, Spotify’s version of the scannable code actually bears a closer resemblance to a ‘waveform’ (a visual representation of music being played). This further plays into users’ expectations of what the code is. It also explains why, when tasked with ‘inviting a friend to the session’, users do not notice the waveform and instead click straight on the ‘invite friends’ button, despite the scanning option being a more direct path.

Traditional QR code (left) vs. Spotify’s scannable code (right)

Law of Prägnanz

“People will perceive and interpret ambiguous or complex images as the simplest form possible, because it is the interpretation that requires the least cognitive effort of us.”

The scannable waveform Spotify has implemented follows this law by bearing a close resemblance to a waveform. There is nothing in the shape to indicate that the code is scannable, so users recognise it as its most closely related shape and not as a scannable object. This has a detrimental effect on the user experience because the functionality of the code is hidden.

(Traditional waveform vs. Spotify’s scannable code)


Lastly, the aforementioned factors make it hard to recognise the code as scannable. However, aesthetically, and looking beyond usability factors, Spotify’s code fits well into their visual design patterns. The justification for not using a traditional QR code is somewhat understandable. What is less justifiable, I feel, is the lack of labelling, which, if implemented, would likely mitigate the risk of the code being overlooked.


With the analysis in mind, I made a number of recommendations to improve the visibility and ease of use of the scannable code. To convey these recommendations, I took a screenshot of the interface and redesigned it to include my suggested improvements.

Recommendations designed
Invite/join screen — recommendations implemented


  • Add a label to the scannable code
  • Add a help (?) tooltip that opens a help wizard
  • Add a three-step help wizard to outline the process

Joining screen

Joining screen

The joining screen was a pain point for 4 out of 5 of my participants, with one commenting that “changing the wording of the joining screen would provide the most immediate improvement”. To understand where the issue lies with the joining screen, I sought to understand the two options the user is presented with.

Both options allow guests to share control of the queue, and the functionality is identical down either route. However, clicking ‘Join on the same device as Host’ continues playing audio from only the host’s device, while clicking ‘On my own device’ starts playing the host’s music on the guest’s device as well. Essentially, the user is being asked, ‘do you want the music to be played on your device or not?’.

Use cases

To elaborate further on these differences I have outlined some scenarios for each option below:

  • Join on same device as [host name] — Useful if you are both hearing music from the same audio device
  • On my own device — Useful if you are listening to the same music but from different audio devices


Potential scenarios for group session users


The main issue here, I feel, is that Spotify does not provide much detail on the scenarios or use cases for each configuration, or for the group sessions feature in general. The vagueness and the vast range of potential use cases make the two options harder to understand. Because the main difference is where the audio is played, it makes sense for Spotify to remove the use case from the options and instead focus on the device where the music is played. However, the wording of these options does not provide clarity on what will happen, and it breaks some heuristics of UX writing.

Begin with the objective - when a sentence describes an objective and the action needed to achieve it, start the sentence with the objective

Nick Babich — 16 Rules of Effective UX Writing

Additionally, the modal that pops up lacks depth of information and detail. The space could be used more effectively to include diagrams or further description of what will happen. Most of my participants paused for a long time when landing on the screen, and the minimal detail likely slowed the experience as they tried to process the outcomes.


To convey the ideas of these recommendations I have redesigned the interface to include rough drafts of my recommendations implemented.

Joining screen recommendations implemented

Match existing visual language (left) in new icon (right)


  • Edit wording to relate to the objective and the key difference in the two options
  • Add additional levels of description to avoid the assumption the user will understand
  • Add diagrams to the options to further illustrate the outcomes of the user's choice

Further recommendations

Given a wider scope for improvements and analysis, I would look towards the next two items on the prioritised list of issues and pain points that participants felt:

  • Location of the feature
  • Facebook Messenger quick share issue

Potential roadmap

Going forward, if I were to continue from where I left off, my main focus would be to repeat the evaluation using the prototypes I designed. I would explore A/B testing and comparative methods to monitor any impact of my recommendations. This would help to validate the quality of the recommendations and inform the decision on whether they should be implemented.

Reflection on the evaluation

Upon reflection of my usability evaluation, there have been some key lessons I have taken from the process.

Think aloud

In planning the evaluation, I chose to use the think aloud methodology while my participants carried out tasks. However, I found it difficult to write notes while observing, and the need for notes was reduced by the open-ended qualitative questions I had included in my post-test questionnaire.

If I were to carry out the evaluation again, I would remove think aloud and instead create a more defined usability specification table with metrics for success rate and completion of the tasks. This would have provided more quantitative data which would have been helpful to cross-reference with findings on the participants' user groups and SUS scores.

Connecting user group data

A challenge I found in the process of the assignment was connecting the user grouping data I gathered to the rest of the evaluation insights. I feel this was down to the mostly qualitative analysis, which made it laborious to find insights in any of the data, let alone link it with previous data. The vision for the user grouping questions (users' experience with Spotify and position on the diffusion of innovation model) was to test whether past experience and attitude to innovation played a part in the usability of the feature. Despite only having five participants, I would have liked to explore this hypothesis further.

Thank you!

Thanks to the CI604 Usability Evaluation module leader, Sanaz Fallahkhair, as well as the student cohort for participating in the evaluation.
