1. Introduction
This section is non-normative.
Media is used extensively today, and the Web is one of the primary means of consuming media content. Many platforms can display media metadata, such as title, artist, album and album art, on various UI elements such as notifications, media control centers, device lockscreens and wearable devices. This specification aims to enable web pages to specify the media metadata to be displayed in platform UI, and to respond to media controls which may come from platform UI or media keys, thereby improving the user experience.
2. Conformance
All diagrams, examples, and notes in this specification are non-normative, as are all sections explicitly marked non-normative. Everything else in this specification is normative.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119. For readability, these words do not appear in all uppercase letters in this specification. [RFC2119]
Requirements phrased in the imperative as part of algorithms (such as "strip any leading space characters" or "return false and terminate these steps") are to be interpreted with the meaning of the key word ("must", "should", "may", etc) used in introducing the algorithm.
Conformance requirements phrased as algorithms or specific steps may be implemented in any manner, so long as the end result is equivalent. (In particular, the algorithms defined in this specification are intended to be easy to follow, and not intended to be performant.)
User agents may impose implementation-specific limits on otherwise unconstrained inputs, e.g. to prevent denial of service attacks, to guard against running out of memory, or to work around platform-specific limitations.
When a method or an attribute is said to call another method or attribute, the user agent must invoke its internal API for that attribute or method so that e.g. the author can’t change the behavior by overriding attributes or methods with custom properties or functions in JavaScript.
Unless otherwise stated, string comparisons are done in a case-sensitive manner.
3. Dependencies
The IDL fragments in this specification must be interpreted as required for conforming IDL fragments, as described in the Web IDL specification. [WEBIDL]
4. The MediaSession interface
[Exposed=(Window)]
partial interface Navigator {
  readonly attribute MediaSession mediaSession;
};

enum MediaSessionPlaybackState {
  "none",
  "paused",
  "playing"
};

enum MediaSessionAction {
  "play",
  "pause",
  "seekbackward",
  "seekforward",
  "previoustrack",
  "nexttrack",
};

callback MediaSessionActionHandler = void();

[Exposed=Window]
interface MediaSession : EventHandler {
  attribute MediaMetadata? metadata;
  attribute MediaSessionPlaybackState playbackState;
  void setActionHandler(MediaSessionAction action, MediaSessionActionHandler? handler);
};
A MediaSession object represents a media session for a given document and allows a document to communicate to the user agent some information about the playback and how to handle it.
A MediaSession has an associated metadata object represented by a MediaMetadata. It is initially null.
The mediaSession attribute MUST return the MediaSession instance associated with the Navigator object.
The metadata attribute reflects the MediaSession's metadata. On getting, it MUST return the MediaSession's metadata. On setting, it MUST run the following steps with value being the new value being set:
- If the MediaSession's metadata is not null, set its media session to null.
- Set the MediaSession's metadata to value.
- If the MediaSession's metadata is not null, set its media session to the current MediaSession.
- In parallel, run the update metadata algorithm.
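The following non-normative sketch models the setter steps above in plain JavaScript. The class SessionModel, the session back-reference property and the updateMetadata hook are hypothetical names standing in for user agent internals, and the hook is invoked synchronously here rather than in parallel.

```javascript
// Hypothetical model of the metadata setter steps; not a real user agent API.
class SessionModel {
  #metadata = null;
  updateCount = 0; // counts how often the update metadata hook ran

  get metadata() {
    return this.#metadata;
  }

  set metadata(value) {
    // Detach the previous metadata object from this session.
    if (this.#metadata !== null) {
      this.#metadata.session = null;
    }
    // Store the new value.
    this.#metadata = value;
    // Attach the new metadata object to this session.
    if (this.#metadata !== null) {
      this.#metadata.session = this;
    }
    // Stand-in for "in parallel, run the update metadata algorithm".
    this.updateMetadata();
  }

  updateMetadata() {
    this.updateCount += 1;
  }
}

const session = new SessionModel();
const first = { title: "Chapter 1", session: null };
const second = { title: "Chapter 2", session: null };
session.metadata = first;
session.metadata = second;
```

After both assignments, only the second object still points back at the session, which is why later changes to the first object would not trigger a metadata update.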
The playbackState attribute represents the playback state of the media session. The default value is none. On setting, the user agent MUST update the active media session and run the media session actions update algorithm if needed. On getting, the user agent MUST return the last valid value that was set. If no value has been set, it MUST return none.
The MediaSessionPlaybackState enum is used to indicate the playback state. The values are described as follows:
- none means the page does not specify whether it’s playing or paused.
- playing means the page is currently playing media and it can be paused.
- paused means the page has paused media and it can be resumed.
The setActionHandler() method, when invoked, MUST run the update action handler algorithm with action and handler on the MediaSession.
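As a non-normative illustration, a page might register play and pause handlers and keep playbackState in sync as sketched below. The helper connectSession and its player argument (any object with play() and pause() methods) are assumptions made for this sketch, not part of the interface.

```javascript
// Wires a page-level player to a MediaSession-like object.
// `player` is a hypothetical object exposing play() and pause().
function connectSession(session, player) {
  session.setActionHandler("play", () => {
    player.play();
    session.playbackState = "playing";
  });
  session.setActionHandler("pause", () => {
    player.pause();
    session.playbackState = "paused";
  });
}

// In a page, only touch the API when the user agent exposes it.
if (typeof navigator !== "undefined" && "mediaSession" in navigator) {
  const audio = document.querySelector("audio");
  if (audio) {
    connectSession(navigator.mediaSession, audio);
  }
}
```

Passing the session as an argument keeps the wiring independent of the global object, so the same function can be exercised with a stand-in session.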
5. The MediaMetadata interface
[Constructor(optional MediaMetadataInit init)]
interface MediaMetadata {
  attribute DOMString title;
  attribute DOMString artist;
  attribute DOMString album;
  attribute FrozenArray<MediaImage> artwork;
};

dictionary MediaMetadataInit {
  DOMString title = "";
  DOMString artist = "";
  DOMString album = "";
  sequence<MediaImageInit> artwork = [];
};
A MediaMetadata object is a representation of the metadata associated with a MediaSession that can be used by user agents to provide customized user interface.
A MediaMetadata can have an associated media session.
A MediaMetadata has an associated title, artist and album which are DOMStrings.
A MediaMetadata has associated artwork images, which are a FrozenArray of MediaImages.
A MediaMetadata is said to be an empty metadata if it is equal to null or all the following conditions are true:
- Its title is the empty string.
- Its artist is the empty string.
- Its album is the empty string.
- Its artwork images length is 0.
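The empty metadata check above can be written as a small predicate. This is a non-normative sketch; the function name isEmptyMetadata is an assumption, and the argument is expected to be null or an object shaped like a MediaMetadata.

```javascript
// Returns true when `metadata` is an empty metadata as defined above.
function isEmptyMetadata(metadata) {
  if (metadata === null) {
    return true;
  }
  return (
    metadata.title === "" &&
    metadata.artist === "" &&
    metadata.album === "" &&
    metadata.artwork.length === 0
  );
}
```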
The MediaMetadata(init) constructor, when invoked, MUST run the following steps:
- Let metadata be a new MediaMetadata object.
- Set metadata’s title to init’s title.
- Set metadata’s artist to init’s artist.
- Set metadata’s album to init’s album.
- Set metadata’s artwork using init’s artwork by calling the MediaImage(init) constructor.
- Return metadata.
The title attribute reflects the MediaMetadata's title. On getting, it MUST return the MediaMetadata's title. On setting, it MUST set the MediaMetadata's title to the given value.
The artist attribute reflects the MediaMetadata's artist. On getting, it MUST return the MediaMetadata's artist. On setting, it MUST set the MediaMetadata's artist to the given value.
The album attribute reflects the MediaMetadata's album. On getting, it MUST return the MediaMetadata's album. On setting, it MUST set the MediaMetadata's album to the given value.
The artwork attribute reflects the MediaMetadata's artwork images. On getting, it MUST return the MediaMetadata's artwork images. On setting, it MUST set the MediaMetadata's artwork images to the given value.
When a MediaMetadata's title, artist, album or artwork images are modified, the user agent MUST run the following steps:
- If the instance has no associated media session, abort these steps.
- Otherwise, queue a task to run the following substeps:
  - If the instance no longer has an associated media session, abort these steps.
  - Otherwise, in parallel, run the update metadata algorithm.
6. The MediaImage interface
[Constructor(optional MediaImageInit init)]
interface MediaImage {
  readonly attribute USVString src;
  readonly attribute DOMString sizes;
  readonly attribute DOMString type;
};

dictionary MediaImageInit {
  USVString src = "";
  DOMString sizes = "";
  DOMString type = "";
};
A MediaImage object has a source, a list of sizes, and a type.
The MediaImage(init) constructor, when invoked, MUST run the following steps:
- Let metadata be a new MediaImage object.
- Set metadata’s src to init’s src. If the URL is a relative URL, it MUST be resolved to an absolute URL using the document base URL.
- Set metadata’s sizes to init’s sizes.
- Set metadata’s type to init’s type.
- Return metadata.
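The URL resolution in the src step can be illustrated with the standard URL constructor. This is a non-normative sketch: the helper name resolveArtworkSrc and the example.com addresses are assumptions, and a user agent would use its own document base URL.

```javascript
// Resolves a possibly relative artwork URL against a base URL.
// Absolute inputs are returned in their normalized absolute form.
function resolveArtworkSrc(src, documentBaseURL) {
  return new URL(src, documentBaseURL).href;
}

const base = "https://example.com/shows/episode1.html";
const relativeResult = resolveArtworkSrc("podcast.jpg", base);
const absoluteResult = resolveArtworkSrc("https://cdn.example.org/cover.png", base);
```

Here relativeResult is resolved next to the document, while absoluteResult keeps its own origin.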
The MediaImage src, sizes and type are inspired by the image objects in Web App Manifest.
The src attribute MUST return the MediaImage object’s source. It is a URL from which the user agent can fetch the image’s data.
The sizes attribute MUST return the MediaImage object’s sizes. It follows the definition of the sizes attribute of the HTML link element, which is a string consisting of an unordered set of unique space-separated tokens, which are ASCII case-insensitive, that represents the dimensions of an image. Each keyword is either an ASCII case-insensitive match for the string "any", or a value that consists of two valid non-negative integers that do not have a leading U+0030 DIGIT ZERO (0) character and that are separated by a single U+0078 LATIN SMALL LETTER X or U+0058 LATIN CAPITAL LETTER X character. The keywords represent icon sizes in raw pixels (as opposed to CSS pixels). When multiple image objects are available, a user agent MAY use the value to decide which icon is most suitable for a display context (and ignore any that are inappropriate). The parsing steps for the sizes attribute MUST follow the parsing steps for the sizes attribute of the HTML link element.
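The token format described above can be illustrated with a short parser. This is a non-normative sketch and not the HTML parsing algorithm itself: the function name parseSizes and the shape of its return value are assumptions, and tokens that do not match the format are simply skipped.

```javascript
// Parses a sizes string into entries of the form { any: true }
// or { width, height }. Invalid and repeated tokens are skipped.
function parseSizes(sizes) {
  const result = [];
  const seen = new Set();
  // Split on ASCII whitespace and drop empty tokens.
  const tokens = sizes.split(/[ \t\n\f\r]+/).filter(Boolean);
  for (const token of tokens) {
    const key = token.toLowerCase(); // "X" and "x" are equivalent
    if (seen.has(key)) {
      continue; // tokens are expected to be unique
    }
    seen.add(key);
    if (key === "any") {
      result.push({ any: true });
      continue;
    }
    // Two integers without a leading zero, separated by a single "x".
    const match = /^([1-9][0-9]*)x([1-9][0-9]*)$/.exec(key);
    if (match) {
      result.push({ width: Number(match[1]), height: Number(match[2]) });
    }
  }
  return result;
}
```

For example, parseSizes("128x128 256X256") yields two entries, while a token such as "016x16" is dropped because of its leading zero.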
The type attribute MUST return the MediaImage object’s type. It is a hint as to the media type of the image. The purpose of this attribute is to allow a user agent to ignore images of media types it does not support.
7. Media Controls
A media session action is an action that the page can handle in order for the user to interact with the MediaSession. For example, a page can handle some actions that will then be triggered when the user presses buttons from a headset or other remote device.
A media session action source is a source that might produce a media session action. Such a source can be the platform or the UI surfaces created by the user agent.
A media session action is represented by a MediaSessionAction which can have one of the following values:
- play: the action intent is to resume the playback. The current playback state should not be playing.
- pause: the action intent is to pause a currently active playback. The current playback state should be playing.
- seekbackward: the action intent is to move the playback time backward by a short period (e.g. a few seconds).
- seekforward: the action intent is to move the playback time forward by a short period (e.g. a few seconds).
- previoustrack: the action intent is to either start the current playback from the beginning if the playback has a notion of beginning or move to the previous item in the playlist if the playback has a notion of playlist.
- nexttrack: the action intent is to move the playback to the next item in the playlist if the playback has a notion of playlist.
Every MediaSession has a map of supported media session actions with, as a key, a media session action and, as a value, a MediaSessionActionHandler.
When the update action handler algorithm on a given MediaSession with action and handler parameters is invoked, the user agent MUST run the following steps:
- If handler is null, remove action from the supported media session actions for MediaSession and abort these steps.
- Add action to the supported media session actions for MediaSession and associate the handler with it.
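The two steps above map naturally onto a JavaScript Map. The following non-normative sketch assumes the supported media session actions are held in a Map keyed by action name; the function name updateActionHandler is chosen for illustration only.

```javascript
// Models the update action handler algorithm on a Map of
// action name -> handler function.
function updateActionHandler(supportedActions, action, handler) {
  if (handler === null) {
    // A null handler unregisters the action.
    supportedActions.delete(action);
    return;
  }
  // Otherwise the action becomes (or stays) supported with this handler.
  supportedActions.set(action, handler);
}
```

Registering a second handler for the same action replaces the first one, since a map holds one value per key.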
When the supported media session actions are changed, the user agent SHOULD run the media session actions update algorithm. The user agent MAY queue a task to run the media session actions update algorithm, to avoid UI flickering when multiple actions are modified in the same event loop.
When the user agent is notified by a media session action source that a media session action named action has been triggered, the user agent MUST run the handle media session action steps as follows:
- If the active media session is null, abort these steps.
- Let actions be the active media session’s supported media session actions.
- If actions does not contain the key action, abort these steps.
- Let handler be the MediaSessionActionHandler associated with the key action in actions.
- Run handler.
When the user agent receives a joint command for play and pause, such as a headset button click, it MUST run the following steps:
- If the active media session is null, abort these steps.
- Let action be a media session action.
- If the actual playback state of the active media session is playing, set action to pause.
- Otherwise, set action to play.
- Run the handle media session action steps with action.
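The dispatch steps and the joint play/pause command can be sketched together as follows. This is non-normative: the session object shape ({ supportedActions, actualPlaybackState }), the function names and the boolean return values are assumptions added so the behaviour is easy to observe.

```javascript
// `activeSession` is null or a hypothetical object of the form
// { supportedActions: Map, actualPlaybackState: string }.
// Returns true if a handler ran, false if the steps were aborted.
function handleMediaSessionAction(activeSession, action) {
  if (activeSession === null) {
    return false;
  }
  const actions = activeSession.supportedActions;
  if (!actions.has(action)) {
    return false;
  }
  const handler = actions.get(action);
  handler();
  return true;
}

// Maps a single play/pause toggle (e.g. a headset button) to an action.
function handlePlayPauseCommand(activeSession) {
  if (activeSession === null) {
    return false;
  }
  const action =
    activeSession.actualPlaybackState === "playing" ? "pause" : "play";
  return handleMediaSessionAction(activeSession, action);
}
```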
It is RECOMMENDED for user agents to implement a default handler for the play and pause media session actions if none was provided for the active media session.
A user agent MAY automatically pause any audible player after a pause media session action has been handled by the page.
A page should only register a MediaSessionActionHandler for a media session action when it can handle the action, given that the user agent will list this as a supported media session action and update the media session action sources.
8. Processing model
8.1. Determining the actual playback state
The playbackState attribute lets the page specify the current playback state. However, the playback state is a hint from the page and MAY be overridden by the user agent. The state after user agent override is called the actual playback state.
The actual playback state SHOULD return the last value that was set on playbackState. If the user agent believes the actual playback state is playing and the playbackState returns a different value, it MAY return playing instead.
When the actual playback state is updated, the user agent SHOULD run the media session actions update algorithm.
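The override rule for the actual playback state can be sketched as a pure function. This is non-normative: the function name and the boolean userAgentBelievesPlaying (standing in for whatever signal a user agent uses to detect playback) are assumptions.

```javascript
// Computes the actual playback state from the page's declared
// playbackState and the user agent's own belief about playback.
function actualPlaybackState(declaredState, userAgentBelievesPlaying) {
  if (userAgentBelievesPlaying && declaredState !== "playing") {
    // The user agent MAY override the page's hint in this case.
    return "playing";
  }
  return declaredState;
}
```

Note that the override only goes one way in this sketch: a declared "playing" is never downgraded, matching the text above.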
8.2. Media session routing
There could be multiple MediaSession objects existing at the same time since the user agent could have multiple tabs, each tab could contain a top-level browsing context and multiple nested browsing contexts, and each browsing context could have a MediaSession object.
The user agent MUST select at most one of the MediaSession objects to present to the user, which is called the active media session. The active media session may be null. The selection is up to the user agent and SHOULD be based on preferred user experience.
It is RECOMMENDED that the user agent selects the active media session by managing audio focus. A tab or browsing context is said to have audio focus if it is currently playing audio or the user expects to control the media in it. The AudioFocus API targets this area and could be used once it’s finished.
The user agent SHOULD present the metadata of the active media session to the platform for display purposes. This MUST NOT be done for any other MediaSession instance.
Whenever the active media session is changed, the user agent MUST run the media session actions update algorithm.
8.3. Processing metadata
The media metadata for the active media session MAY be displayed in the platform UI depending on platform conventions. Whenever the active media session changes or the metadata of the active media session is set, the user agent MUST run the update metadata algorithm. The steps are as follows:
- If the active media session is null, unset the media metadata presented to the platform, and terminate these steps.
- If the metadata of the active media session is an empty metadata, unset the media metadata presented to the platform, and terminate these steps.
- Update the media metadata presented to the platform to match the metadata for the active media session.
- If the user agent wants to display an artwork image, it is RECOMMENDED to run the fetch image algorithm.
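The first three steps of the update metadata algorithm can be sketched as follows. This is non-normative: the platform object with clearMetadata() and setMetadata() methods is a hypothetical stand-in for the platform integration, and artwork fetching is left out.

```javascript
// `activeSession` is null or an object with a `metadata` property.
// `platform` is a hypothetical object exposing clearMetadata() and
// setMetadata(), standing in for the platform UI integration.
function runUpdateMetadata(activeSession, platform) {
  const metadata = activeSession === null ? null : activeSession.metadata;
  const isEmpty =
    metadata === null ||
    (metadata.title === "" &&
      metadata.artist === "" &&
      metadata.album === "" &&
      metadata.artwork.length === 0);
  if (isEmpty) {
    // No session or nothing to show: unset what the platform displays.
    platform.clearMetadata();
    return;
  }
  platform.setMetadata({
    title: metadata.title,
    artist: metadata.artist,
    album: metadata.album
  });
}
```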
The RECOMMENDED fetch image algorithm is as follows:
- If there are other instances of the fetch image algorithm running, cancel them.
- If metadata’s artwork of the active media session is empty, then terminate these steps.
- If the platform supports displaying media artwork, select a preferred artwork image from metadata’s artwork of the active media session.
- Fetch the preferred artwork image’s src. Then, in parallel:
  - Wait for the response.
  - If the response’s internal response’s type is default, attempt to decode the resource as an image.
  - If the image format is supported, use the image as the artwork for display in platform UI. Otherwise the fetch image algorithm fails and terminates.
If no artwork images are fetched in the fetch image algorithm, the user agent MAY have fallback behavior such as displaying a default image as artwork.
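How a preferred artwork image is selected is left to the user agent. The following non-normative sketch shows one possible policy, which is an assumption of this example and not required by this specification: ignore images whose type hint is unsupported, then pick the smallest image at least as wide as the target, falling back to the largest one. The function names and the targetWidth and supportedTypes parameters are hypothetical.

```javascript
// Returns the largest width listed in a sizes string, or 0 if none parses.
function largestWidth(sizes) {
  let best = 0;
  for (const token of sizes.toLowerCase().split(/\s+/)) {
    const match = /^([1-9][0-9]*)x([1-9][0-9]*)$/.exec(token);
    if (match) {
      best = Math.max(best, Number(match[1]));
    }
  }
  return best;
}

// Picks one image from `artwork` for a display slot `targetWidth` wide.
function selectPreferredArtwork(artwork, targetWidth, supportedTypes) {
  // An empty type is only a missing hint, so such images stay candidates.
  const candidates = artwork.filter(
    (image) => image.type === "" || supportedTypes.includes(image.type)
  );
  if (candidates.length === 0) {
    return null;
  }
  const sized = candidates.map((image) => ({
    image,
    width: largestWidth(image.sizes)
  }));
  const bigEnough = sized.filter((entry) => entry.width >= targetWidth);
  if (bigEnough.length > 0) {
    // Smallest image that still covers the target width.
    bigEnough.sort((a, b) => a.width - b.width);
    return bigEnough[0].image;
  }
  // Nothing is big enough: fall back to the largest candidate.
  sized.sort((a, b) => b.width - a.width);
  return sized[0].image;
}
```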
8.4. Processing media session actions
When the media session actions update algorithm is invoked, the user agent MUST run the following steps:
- Let available actions be an array of media session actions.
- If the active media session is null, set available actions to the empty array.
- Otherwise, set the available actions to the list of keys available in the active media session’s supported media session actions.
- For each media session action source source, run the following substeps:
  - Optionally, if the active media session is not null:
    - If the active media session’s actual playback state is playing, remove play from available actions.
    - Otherwise, remove pause from available actions.
  - If the source is a UI element created by the user agent, it MAY remove some elements from available actions if there are too many of them compared to the available space.
  - Notify the source with the updated list of available actions.
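The computation of available actions for a single source can be sketched as below. This is non-normative: the function name, the session shape ({ supportedActions: Map }) and the decision to always apply the optional play/pause filtering are assumptions of this example.

```javascript
// Returns the list of actions to advertise to one action source.
// `activeSession` is null or { supportedActions: Map }.
function computeAvailableActions(activeSession, actualState) {
  if (activeSession === null) {
    return [];
  }
  const available = Array.from(activeSession.supportedActions.keys());
  // Optional step: hide whichever of play/pause does not apply right now.
  const hidden = actualState === "playing" ? "play" : "pause";
  return available.filter((action) => action !== hidden);
}
```

A user agent UI with limited space could trim the returned list further before notifying the source.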
9. Examples
This section is non-normative.
window.navigator.mediaSession.metadata = new MediaMetadata({
  title: "Episode Title",
  artist: "Podcast Host",
  album: "Podcast Title",
  artwork: [{src: "podcast.jpg"}]
});
Alternatively, providing multiple artwork images in the metadata lets the user agent select different artwork images for different display purposes and for a better fit on different screens:
window.navigator.mediaSession.metadata = new MediaMetadata({
  title: "Episode Title",
  artist: "Podcast Host",
  album: "Podcast Title",
  artwork: [
    {src: "podcast.jpg", sizes: "128x128", type: "image/jpeg"},
    {src: "podcast_hd.jpg", sizes: "256x256"},
    {src: "podcast_xhd.jpg", sizes: "1024x1024", type: "image/jpeg"},
    {src: "podcast.png", sizes: "128x128", type: "image/png"},
    {src: "podcast_hd.png", sizes: "256x256", type: "image/png"},
    {src: "podcast.ico", sizes: "128x128 256x256", type: "image/x-icon"}
  ]
});
For example, if the user agent wants to use an image as an icon, it may choose "podcast.jpg" or "podcast.png" for a low-pixel-density screen, and "podcast_hd.jpg" or "podcast_hd.png" for a high-pixel-density screen. If the user agent wants to use an image for the lockscreen background, "podcast_xhd.jpg" will be preferred.
For playlists or chapters of an audio book, multiple media elements can share a single media session.
var audio1 = document.createElement("audio");
audio1.src = "chapter1.mp3";

var audio2 = document.createElement("audio");
audio2.src = "chapter2.mp3";

audio1.play();
audio1.addEventListener("ended", function() {
  audio2.play();
});
Because the session is shared, the metadata must be updated to reflect what is currently playing.
function updateMetadata(event) {
  window.navigator.mediaSession.metadata = new MediaMetadata({
    title: event.target == audio1 ? "Chapter 1" : "Chapter 2",
    artist: "An Author",
    album: "A Book",
    artwork: [{src: "cover.jpg"}]
  });
}

audio1.addEventListener("play", updateMetadata);
audio2.addEventListener("play", updateMetadata);
var tracks = ["chapter1.mp3", "chapter2.mp3", "chapter3.mp3"];
var trackId = 0;

var audio = document.createElement("audio");
audio.src = tracks[trackId];

function updatePlayingMedia() {
  audio.src = tracks[trackId];
  // Update metadata (omitted)
}

// Handlers are registered through setActionHandler(), as defined in the MediaSession interface.
window.navigator.mediaSession.setActionHandler("previoustrack", function() {
  trackId = (trackId + tracks.length - 1) % tracks.length;
  updatePlayingMedia();
});

window.navigator.mediaSession.setActionHandler("nexttrack", function() {
  trackId = (trackId + 1) % tracks.length;
  updatePlayingMedia();
});
Acknowledgments
The editor would like to thank Paul Adenot, Jake Archibald, Tab Atkins, Jonathan Bailey, Marcos Caceres, Domenic Denicola, Ralph Giles, Anne van Kesteren, Tobie Langel, Michael Mahemoff, Jer Noble, Elliott Sprehn, Chris Wilson, and Jörn Zaefferer for their participation in technical discussions that ultimately made this specification possible.
Special thanks go to Philip Jägenstedt and David Vest for their help in designing every aspect of media sessions and for their seemingly infinite patience in working through the initial design issues; Jer Noble for his help in building a model that also works well within the iOS audio focus model; and Mounir Lamouri and Anton Vayvod for their early involvement, feedback and support in making this specification happen.
This standard is written by Rich Tibbett (Opera, richt@opera.com).
Per CC0, to the extent possible under law, the editors have waived all copyright and related or neighboring rights to this work.