Design high quality Siri media interactions
Demystify the art of designing Siri experiences for your music and audio apps: We'll show you how to think about crafting great interactions and how you can provide custom vocabulary so that Siri can respond with more accuracy and personality. We'll also explain how you can debug common errors and test your intents using the same methods Apple's own Siri team employs.
Resources
Related Videos
WWDC 2020
- Create quick interactions with Shortcuts on watchOS
- Decipher and deal with common Siri errors
- Design for intelligence: Apps, evolved
- Design for intelligence: Discover new opportunities
- Design for intelligence: Make friends with "The System"
- Design for intelligence: Meet people where they are
- Empower your intents
- Evaluate and optimize voice interaction for your app
- Expand your SiriKit Media Intents to more platforms
- Feature your actions in the Shortcuts app
- Integrate your app with Wind Down
- Thursday@WWDC
- Wednesday@WWDC
- What's new in SiriKit and Shortcuts
Hello, and welcome to WWDC. Hi, I'm Danny Mandel, and I'm here to talk about how we make sure your SiriKit Media apps have the highest-quality Siri experience possible.
Why do we care about quality? I think we all want to build things people will enjoy using, and nobody enjoys using bad voice implementations. Additionally, the trust barrier for voice assistants is even higher than with a traditional user interface. So the reliability needs to be even better to maintain usage. If you're going to take the time to build SiriKit Media support, take the time to make it good.
This might sound kind of silly, but the single most important thing you can do is play something when someone asks to play.
And think about it. If the first time someone is excited to use your app with voice and nothing plays, they probably won't ask again. So having a robust playback stack is the first place you're going to want to invest your Siri engineering resources.
And the next thing we want to do is make sure that we start playback quickly.
Over the past year, we've learned that one of the biggest causes of failure in SiriKit Media apps is timeouts.
Particularly in environments like CarPlay, starting playback quickly is really important. In cases like CarPlay, when you're hands-free on the road, we can be more aggressive with timeouts, and your app will get killed off if it doesn't start playback quickly. To help, we added a couple new options for performance enhancements this year, so make sure to check out "Expand your SiriKit Media Intents to More Platforms" for more details.
Another way to make a high-quality experience is to let Siri understand your listeners' preferences. The way you do this is by adopting the Siri user vocabulary API.
Similarly, you can help Siri to understand your app's catalog by adopting Siri's global vocabulary API. We'll do a deep dive on both of these topics a little later in this talk.
When you're choosing the perfect thing to play, you're going to want to allow people to ask in different ways. The more utterances you can support in your app, the more likely people are to use it in their everyday lives. After all, the promise of Siri is that it's an intelligent assistant. And it's only truly intelligent if we support a wide variety of natural-language utterances. So let's take a look at some of the most common natural-language patterns we see across all SiriKit Media apps.
What do we mean by perfect-ish? Ultimately, perfect-ish is the best guess when you don't know what someone wants.
And we see more than half of all SiriKit Media requests follow this kind of pattern. Requests as simple as "play your app" or "play music on your app." And that makes this generic case, when someone doesn't tell you specifically what they actually want, the single most important use case to get right. By making sure you handle this scenario, you can serve a huge number of listeners right off the bat. It's up to you to decide what perfect-ish means for your app. But make sure it does something that will give people that classic surprise and delight.
The way you'll know someone asked for these very generic cases is you'll either get a nil media search or a media search containing a media type of music.
Our next important use case is the "play something" use case, where people specify the title of what it is they want to play. They won't say what the media type is, so it could be a song, album, artist, or podcast. You won't know. It's important that when you implement this case, you accommodate different media types and execute a very broad search.
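As a rough sketch of the two cases just described, the check for a generic request and a broad title search might look like the following. To keep it runnable without SiriKit, `MediaType` and `MediaSearch` are simplified stand-ins for `INMediaItemType` and `INMediaSearch` from the Intents framework; `CatalogItem`, `isGenericRequest`, and `broadSearch` are hypothetical names invented for illustration, and the substring match is just one possible search strategy.

```swift
import Foundation

// Simplified stand-ins for INMediaItemType and INMediaSearch (assumption:
// only the fields discussed in this talk are modeled here).
enum MediaType { case unknown, music, song, album, artist, playlist, podcastShow }

struct MediaSearch {
    var mediaType: MediaType = .unknown
    var mediaName: String? = nil
    var artistName: String? = nil
}

// Hypothetical catalog entry, for illustration only.
struct CatalogItem: Equatable {
    let title: String
    let type: MediaType
}

/// True for "play <app>" (no media search at all) and for
/// "play music on <app>" (a media search carrying only the music media type).
func isGenericRequest(_ search: MediaSearch?) -> Bool {
    guard let search = search else { return true }
    return search.mediaType == .music
        && search.mediaName == nil
        && search.artistName == nil
}

/// Broad title search: a title-only request doesn't say whether it means a
/// song, album, artist, or podcast, so no media type is filtered out.
func broadSearch(title: String, in catalog: [CatalogItem]) -> [CatalogItem] {
    let needle = title.lowercased()
    return catalog.filter { $0.title.lowercased().contains(needle) }
}
```

In a real intent handler, the generic branch is where your app's own "perfect-ish" choice would go, and the broad search would hit your actual catalog rather than an in-memory array.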
The way you'll recognize this kind of request is that there will only be a media name populated in the media-search object. We see about 30% of queries like this.
Moving down the list, we start to see more precise queries, with a combination of artist and some other search field. So make sure you support these compound searches with artists. In this case, you'll get a populated media name with the title of what someone asked for, and then also an artist name with the artist they're looking for. As the query types get more specific, usage starts to go down, and we see this pattern in around 5% of requests.
A final category where we see a lot of usage is playlists. So make sure you support playlist searching. Again, this is one of the more specific queries, and we see it in around 5% of requests. When someone includes a playlist query in their utterance, you'll get the media-name property populated with the playlist query, and the media-type property will be set to "playlist."
The list does continue, as there are a number of other search fields in INMediaSearch. But you can see that with just these four use cases, we've captured more than 90% of the current SiriKit Media traffic. Please do implement as many other fields as make sense for your app. But it does make sense to prioritize your engineering and QA around the popular use cases, as these are what people are most likely to use.
And one final note on this. We've also seen that the better your Siri support is, the more likely it is that people will make more complicated Siri requests. So as you make it better, expect the usage patterns to change. That's a good thing. That means they like using your app with Siri.
We just looked at the high-traffic utterances and saw how we can capture the vast majority of usage with just a few intents. So let's look at those intents in the debugger, and see what the media search looks like in each case.
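The four high-traffic patterns can be told apart purely by which fields of the media search are populated. The sketch below shows one way to do that. As an assumption for portability, `MediaType` and `MediaSearch` are simplified stand-ins for `INMediaItemType` and `INMediaSearch`, and `RequestPattern` and `classify` are hypothetical names, not part of SiriKit.

```swift
// Simplified stand-ins for INMediaItemType and INMediaSearch (assumption:
// only the fields discussed in this talk are modeled here).
enum MediaType { case unknown, music, song, album, artist, playlist }

struct MediaSearch {
    var mediaType: MediaType = .unknown
    var mediaName: String? = nil
    var artistName: String? = nil
}

// Hypothetical grouping of the four patterns from the talk, plus a catch-all.
enum RequestPattern: Equatable {
    case generic                                        // "play <app>", "play music on <app>"
    case title(String)                                  // media name only
    case titleByArtist(title: String, artist: String)   // media name plus artist name
    case playlist(String)                               // media name with playlist media type
    case other                                          // any other combination of fields
}

func classify(_ search: MediaSearch?) -> RequestPattern {
    // No media search at all: the empty "play <app>" case.
    guard let search = search else { return .generic }

    switch (search.mediaType, search.mediaName, search.artistName) {
    case (.music, nil, nil):
        return .generic
    case (.playlist, let name?, nil):
        return .playlist(name)
    case (_, let name?, let artist?):
        return .titleByArtist(title: name, artist: artist)
    case (.unknown, let name?, nil):
        return .title(name)
    default:
        return .other
    }
}
```

Routing on a small enum like this keeps the popular cases explicit, so engineering and QA effort can be focused on them, while `.other` leaves room for the remaining search fields.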
All right, we're gonna put a breakpoint here, so we can see what the media search looks like.
The first utterance we'll look at is the empty play case. "Play ControlAudio." All right. Looking at our breakpoint, we can see that in this case, there is no media search. This makes sense because we didn't specify any search criteria. Now let's look at "play music in ControlAudio."
In this case, we can see that we have the music media type set,

