How to design voice user interfaces

There are more and more voice-controlled devices, such as Apple HomePod, Google Home and Amazon Echo. This is because voice interfaces improve the user experience. In fact, comScore (a marketing research company) believes that voice will account for 50% of all searches by 2020.

The five largest technology companies, Microsoft, Google, Amazon, Apple and Facebook, have developed (or are currently developing) voice-enabled AI assistants.

Whether we are talking about VUIs (voice user interfaces) for web applications or for smart home speakers, voice interactions are increasingly common in today's technology, especially since many people experience fatigue from spending many hours in front of a screen.

So, let's see how to design voice user interfaces and what the anatomy of a voice command looks like.

How to design voice interfaces

If you read online reviews of home speakers, you will notice that some people form a close bond with their speaker and treat it more like a pet than a product.

You certainly cannot meet every customer expectation with technology that is still under active development, but you can follow some guidelines as a starting point.

Provide users with information on what they can do

A graphical user interface shows users everything they can do. A voice interface has no way of showing the user what options are possible, and new users base their expectations on their experience with human conversations.

As a result, they may start by asking for something that makes no sense to the system or that is not possible. The solution is to tell the user which interactions are available.

For example, the voice interface can say something like "I can help you buy" or "I can give you information about products".

In any case, users should also be given an easy way out of a feature, for instance by including "exit" as one of their options.
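
As a minimal sketch of this guideline, the hypothetical helper below builds a spoken prompt that lists what the user can do and always mentions a way out. The function name, the wording and the "exit" keyword are illustrative assumptions and do not belong to any particular voice platform.

```python
def build_help_prompt(capabilities, exit_word="exit"):
    """Return a spoken prompt listing capabilities plus a way out."""
    way_out = f'You can say "{exit_word}" at any time to leave.'
    if not capabilities:
        # Nothing to offer: still tell the user how to leave.
        return way_out
    if len(capabilities) == 1:
        listed = capabilities[0]
    else:
        # Join all but the last item with commas, then add "or".
        listed = ", ".join(capabilities[:-1]) + " or " + capabilities[-1]
    return f"I can {listed}. {way_out}"


prompt = build_help_prompt(
    ["help you buy", "give you information about products"]
)
print(prompt)
```

Keeping the exit hint in the same sentence as the options means a new user hears both what is possible and how to stop in a single turn.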

Deliver answers with complete information

In graphical interfaces, users can see which section they are in, while in voice user interfaces, users have to be told which functionality they are using.

Users can quickly get confused about where they are in the conversation, or can activate a feature by mistake. Since sound is the only guidance they have, they need more detail in each answer.

The interface should answer a question about a product with a complete sentence, such as "The car of brand X and model X is priced at $20,000, and is guaranteed for 2 years."

This allows users to know what functionality they are using, and what the speaker is talking about.
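
To illustrate, here is a small sketch of a response builder that restates the subject of the question instead of answering with a bare number. The function and parameter names are hypothetical and the product data is made up for the example.

```python
def describe_price(brand, model, price_usd, warranty_years):
    """Build a full-sentence answer that repeats what is being discussed."""
    year_word = "year" if warranty_years == 1 else "years"
    return (
        f"The {brand} {model} is priced at ${price_usd:,}, "
        f"and is guaranteed for {warranty_years} {year_word}."
    )


answer = describe_price("Brand X", "Model X", 20000, 2)
print(answer)
```

A terse reply such as "20,000" would be fine on a screen, where the product page provides the context, but spoken aloud it leaves the user guessing which product and which feature the answer refers to.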

Use as many examples as you can

When people speak, they often do not express their full intentions, frequently because they use slang, filler words, shortened words and so on.

Humans understand each other anyway, but a voice interface needs users to express themselves clearly in order to understand their intentions.

In addition, the more information about their intentions a user includes in a sentence, the better.

A user can ask: "Give me information about the available cars, the price of the X model please" and get the information they want immediately, instead of saying first: "I want information about the available cars" and then asking for the model.

Users may not realize that they can interact this way, so you should give them as many example interactions as possible.

Limit the number of options

When users browse visual content or lists, they can return to the information they overlooked or forgot.

That is not the case with verbal content. With verbal content, sentences should be kept short.

It is recommended that no more than three different options be offered in a single interaction.
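
One way to apply this guideline is to read options out in small batches. The sketch below caps each spoken list at three items and offers to continue. The function name, the "more" keyword and the sample options are assumptions made for illustration.

```python
MAX_SPOKEN_OPTIONS = 3  # keep each spoken list short


def speak_options(options, start=0, limit=MAX_SPOKEN_OPTIONS):
    """Return one spoken sentence covering at most `limit` options."""
    batch = options[start:start + limit]
    if not batch:
        return "There are no more options."
    remaining = len(options) - (start + len(batch))
    sentence = "You can choose " + ", ".join(batch) + "."
    if remaining > 0:
        # Let the user ask for the next batch instead of hearing everything.
        sentence += ' Say "more" to hear other options.'
    return sentence


car_types = ["sedans", "SUVs", "trucks", "vans", "coupes"]
print(speak_options(car_types))
print(speak_options(car_types, start=3))
```

The caller keeps track of `start` between turns, so the user never has to hold more than a few items in memory at once.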

Inform the user that you are listening

Use some form of simple feedback so that the user knows that the system is listening.

Visual feedback works well here: users can immediately see that what they are saying is being recorded, much as nonverbal cues show us that another person is listening when we talk to them.

Anatomy of voice commands

Before a dialogue flow can be created, designers must first understand the anatomy of a voice command. A user's voice command consists of three key elements: the intent, the utterance and the slot. Let's analyze the following request: "Play relaxing music to sleep".

Intent

The intent represents the broader objective of a user's voice command. In the example request, the intent is clear: the user wants to listen to music.

Utterance

The utterance is how the user phrases the command. In the example, we know that the user wants to play relaxing music thanks to the word "play", but this is not the only way to say it. The user could also say "I want to listen to music".

Conversation designers must take into account all variations of the utterance.

Slot

Sometimes an intent alone is not enough, and more information from the user is required to fulfill the request.

This is called a "slot" and, as in visual interfaces, slots may be optional or required to complete a request.

In this case, the word "relaxing" is a slot value that refines the "play music" intent, so the system knows that you are asking for relaxing music.
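
The three parts of a voice command can be sketched in code. The names "intent", "utterance" and "slot" follow terminology commonly used by voice platforms. Everything else below (the phrase lists, the `mood` slot and the `parse_command` function) is a hypothetical, simplified illustration based on keyword matching, not how a production speech system works.

```python
import re

# Hypothetical phrasings (utterances) that map to a single intent.
INTENT_UTTERANCES = {
    "play_music": ["play", "i want to listen to", "put on"],
}

# Hypothetical values for an optional "mood" slot.
MOOD_SLOT_VALUES = ["relaxing", "upbeat", "classical"]


def contains_phrase(text, phrase):
    """True if `phrase` appears in `text` on word boundaries."""
    return re.search(r"\b" + re.escape(phrase) + r"\b", text) is not None


def parse_command(text):
    """Map a spoken command to an intent and an optional mood slot."""
    normalized = text.lower()
    intent = None
    for name, phrases in INTENT_UTTERANCES.items():
        if any(contains_phrase(normalized, p) for p in phrases):
            intent = name
            break
    mood = next(
        (m for m in MOOD_SLOT_VALUES if contains_phrase(normalized, m)),
        None,
    )
    return {"intent": intent, "slots": {"mood": mood}}


print(parse_command("Play relaxing music to sleep"))
print(parse_command("I want to listen to music"))
```

Both example requests resolve to the same intent even though they are phrased differently, and only the first one fills the optional slot.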

To design excellent voice user interfaces, you must find an elegant way to provide users with relevant information without overwhelming them.

Voice interaction can pose more challenges than a visual system in some respects. However, there is no doubt that it is a mode of interaction that will be used more and more.
