Simulcast and sending multiple video resolutions to the SFU

 Highlights

 Impact on my application

 Standardization status

 Details

 

 Highlights

In a WebRTC session it is possible to open more than one video stream to a single destination. One use case is a client with more than one camera or video source; another is conferencing. When using an SFU (Selective Forwarding Unit), some or all of the clients often send two video streams to the SFU, one at high resolution and one at low resolution. This lets the SFU forward a participant's low-resolution stream to the others when that participant is not the active speaker, and the high-resolution stream when they are the focus of the session.

Naturally the actual implementation is more complex, as it has bandwidth implications, so the SFU will want to be selective and make smart decisions about which resolution stream to send to the other clients.

Until now it was possible to open multiple m-lines of the same media type (e.g. video) in SDP, but there was no way to indicate that they belonged to the same source.

This has now been taken care of in the standards.

 

 Impact on my application

Important, especially when working in multi-participant scenarios.

 

 Standardization status

In progress. Should be in the specification shortly.

 

 Details

Simulcast is a feature of many current conferencing systems that, until now, has not been cleanly and uniformly designed for use in WebRTC.  Although there are often disagreements about exactly how to define simulcast, for the purposes of the IETF and WebRTC it boils down to sending the same video content at different resolutions and/or frame rates.  One common use for simulcast is for a conference client to send both a thumbnail video of the conference participant and a full-scale video to a selective forwarding unit (SFU).  Then the SFU can forward whichever of the two streams is needed at the moment.
The primary issue holding up support in WebRTC was that the way simulcast is signaled in SDP, particularly in the presence of BUNDLE and RTCP mux, had not been worked out in the IETF.  The recent change is a proposal from Google for a new identifier in SDP, the RTP stream identifier (rid), that can be used to distinguish among the individual streams described by a single m-line.
At the SDP level, simulcast will be signaled as follows:

m=video 10000 RTP/SAVPF 98
a=rtpmap:98 VP8/90000

a=rid:1 send max-width=1280;max-height=720;max-fps=30
a=rid:2 recv max-width=1280;max-height=720;max-fps=30
a=rid:3 send max-width=320;max-height=180;max-fps=15
a=simulcast:send rid=1;3 recv rid=2

In this example there are two send streams, at different resolutions and frame rates, and one receive stream.
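
To verify what the browser actually produced, an application can look for these attribute lines in its locally generated SDP.  A minimal sketch, assuming a browser that already implements rid and simulcast signaling, with ‘pc’ an RTCPeerConnection configured as in the API example below:

        // Minimal sketch: list the rid and simulcast attribute lines
        // from a locally generated offer.
        pc.createOffer().then(function(offer) {
            var lines = offer.sdp.split('\r\n').filter(function(line) {
                return line.indexOf('a=rid') === 0 ||
                       line.indexOf('a=simulcast') === 0;
            });
            console.log(lines);
        });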

With the new RtpTransceiver paradigm, it is now relatively easy to gain access and control at the JavaScript API level.  Each transceiver can be created with a list of sendEncodings, each carrying a ‘rid’ property whose value corresponds to the SDP rid.  From the specification:

        if (stream.getVideoTracks().length > 0) {
            // Send-only transceiver carrying three simulcast encodings.
            pc.addTransceiver(stream.getVideoTracks()[0], {
                direction: 'sendonly',
                sendEncodings: [
                    {
                        rid: 'f'
                    },
                    {
                        rid: 'h',
                        scaleResolutionDownBy: 2.0
                    },
                    {
                        rid: 'q',
                        scaleResolutionDownBy: 4.0
                    }
                ]
            });
        }

This example also shows a new API feature, ‘scaleResolutionDownBy’, which indicates the factor by which the video should be scaled down relative to the maximum negotiated resolution.  With this capability the JavaScript developer doesn’t need to know the actual resolutions being used but can indicate their relative values.  In this example the ‘f’ encoding would be the full resolution, while ‘h’ would be half resolution and ‘q’ would be quarter resolution.
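
Once the transceiver has been created, the encodings remain accessible at runtime through the sender’s getParameters() and setParameters() methods from the same specification.  A minimal sketch, assuming a sender created with the three encodings above (the ‘q’ rid is taken from that example):

        // Minimal sketch: deactivate the quarter-resolution layer at
        // runtime, e.g. when nobody is displaying this user's thumbnail.
        var sender = pc.getSenders().find(function(s) {
            return s.track && s.track.kind === 'video';
        });
        var params = sender.getParameters();
        params.encodings.forEach(function(encoding) {
            if (encoding.rid === 'q') {
                encoding.active = false;  // stop sending this layer
            }
        });
        sender.setParameters(params);

Re-enabling the layer later is the reverse: set ‘active’ back to true and call setParameters() again.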