WebRTC conferencing can be conducted in several ways, some of which use a central server that functions either as an MCU (Multipoint Control Unit) or as an SFU (Selective Forwarding Unit). In both cases the audio has to be mixed or selectively forwarded, and the audio level determines whether a specific client is included in the mixed audio.
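As an illustration of how audio level can drive that decision, here is a minimal TypeScript sketch of a mixer that keeps only the N loudest sources. It assumes the levels arrive in the RFC 6464 client-to-mixer header extension format (a voice-activity flag plus a 7-bit level in -dBov, where 0 is loudest and 127 is silence). `SourceLevel`, `decodeAudioLevelByte` and `selectSourcesToMix` are hypothetical names, not part of any library.

```typescript
// Hypothetical per-source state a mixer might keep; not a real library type.
interface SourceLevel {
  ssrc: number;
  // Audio level in -dBov as carried by RFC 6464: 0 is loudest, 127 is silence.
  level: number;
  // Voice activity flag (the V bit in RFC 6464).
  voiceActivity: boolean;
}

// Decode the single data byte of an RFC 6464 audio level header extension:
// the most significant bit is the V flag, the low 7 bits are the level.
function decodeAudioLevelByte(ssrc: number, byte: number): SourceLevel {
  return {
    ssrc,
    level: byte & 0x7f,
    voiceActivity: (byte & 0x80) !== 0,
  };
}

// Pick up to `maxSpeakers` sources for the mix: the loudest ones (smallest
// -dBov value), dropping any source at or below `silenceThreshold`.
function selectSourcesToMix(
  sources: SourceLevel[],
  maxSpeakers: number,
  silenceThreshold = 127,
): number[] {
  return sources
    .filter((s) => s.level < silenceThreshold)
    .sort((a, b) => a.level - b.level) // lower -dBov means louder
    .slice(0, maxSpeakers)
    .map((s) => s.ssrc);
}
```

With levels of 20, 45, 127 and 30 for four sources, `selectSourcesToMix(levels, 2)` keeps the sources at 20 and 30. This sketch ignores the V flag; a real mixer would likely also smooth levels over time and add hysteresis so the selection does not flicker.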
To improve the performance of the audio mixer, the client should send the audio level of the media stream it is transmitting.
The audio mixer, in turn, needs to tell the client which parties are included in the mixed audio it sends to the conference participants.
Client-side applications can use this information to enhance the user experience and add features such as participant tagging.
Conferencing applications can use audio level information for better audio mixing and an improved conference experience.
This is now part of the WebRTC specification.
When WebRTC clients interact with conferencing servers, it is important both for the mixer to have information from the clients and for the clients to have information from the mixer. Here are the relevant discussions and decisions from the Seattle interim meeting that affect behavior and APIs:
Sending audio level to the conferencing server
The consequence for applications is that if your mixer supports WebRTC SDP, the WebRTC client will automatically offer to send audio levels to the mixer, using the RTP header extension for client-to-mixer audio level indication (RFC 6464). This is not yet in JSEP but will be shortly.
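On the mixer side, the offer to send audio levels shows up as an `a=extmap` line carrying the RFC 6464 URI `urn:ietf:params:rtp-hdrext:ssrc-audio-level`. The following minimal TypeScript sketch finds the negotiated extension ID in the audio sections of an SDP blob; `findAudioLevelExtensionId` is a hypothetical helper, and it deliberately ignores everything except the ID, optional direction and URI.

```typescript
// URI registered by RFC 6464 for the client-to-mixer audio level extension.
const AUDIO_LEVEL_URI = "urn:ietf:params:rtp-hdrext:ssrc-audio-level";

// Return the header extension ID negotiated for audio levels in any audio
// m= section, or null if the SDP does not offer it.
// Line syntax: a=extmap:<id>[/<direction>] <uri> [<extension attributes>]
function findAudioLevelExtensionId(sdp: string): number | null {
  let inAudioSection = false;
  for (const rawLine of sdp.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line.startsWith("m=")) {
      // Track which media section we are in; only audio sections matter here.
      inAudioSection = line.startsWith("m=audio");
      continue;
    }
    if (!inAudioSection) continue;
    const match = /^a=extmap:(\d+)(?:\/\w+)?\s+(\S+)/.exec(line);
    if (match !== null && match[2] === AUDIO_LEVEL_URI) {
      return Number(match[1]);
    }
  }
  return null;
}
```

In a browser the same function could be applied to `pc.localDescription?.sdp` to confirm that the client's own offer includes the extension.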
Extract mixed audio participant list
The mixer already sends WebRTC clients the information about which parties are included in the mixed audio, but until now the browser offered no API for reading this information on the client side.
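For context, RTP itself has a place for this information: a mixer can list the sources it mixed in the CSRC fields of each packet's fixed header (RFC 3550). The minimal TypeScript sketch below reads the SSRC and the CSRC list from a raw packet, as a server or test harness might; `parseRtpSources` is a hypothetical helper, and it ignores header extensions, padding and the payload.

```typescript
// Extract the SSRC and the CSRC list from an RTP packet's fixed header.
// The CC field (low 4 bits of the first byte) gives the number of CSRCs,
// which follow the 12-byte fixed header as 32-bit big-endian values.
function parseRtpSources(packet: Uint8Array): { ssrc: number; csrcs: number[] } {
  if (packet.length < 12) throw new Error("packet shorter than RTP fixed header");
  const version = packet[0] >> 6;
  if (version !== 2) throw new Error(`unsupported RTP version ${version}`);
  const csrcCount = packet[0] & 0x0f;
  if (packet.length < 12 + 4 * csrcCount) throw new Error("truncated CSRC list");
  // DataView reads big-endian (network byte order) by default.
  const view = new DataView(packet.buffer, packet.byteOffset, packet.byteLength);
  const ssrc = view.getUint32(8);
  const csrcs: number[] = [];
  for (let i = 0; i < csrcCount; i++) {
    csrcs.push(view.getUint32(12 + 4 * i));
  }
  return { ssrc, csrcs };
}
```

The SSRC identifies the mixer's own stream, while each CSRC identifies one party whose audio went into that packet.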
The decision was to add an API that lets the WebRTC client find out which other parties contributed to the mixed audio it receives from the mixer. The parties are identified by the CSRC or SSRC of each source.
There is now a new method on RTCRtpReceiver objects, getContributingSources(), that returns a list of RTCRtpContributingSource objects, each of which contains:
- the CSRC or SSRC of the contributing source
- the timestamp of the most recent RTP packet containing the source
- if available, the audio level of that most recent packet
This information allows applications to identify the current speaker easily and show it in the user interface, or to use it for other purposes such as recording and tagging.
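A minimal TypeScript sketch of current-speaker detection from the entries described above. It defines a local `ContributingSourceInfo` type mirroring the listed fields (source identifier, timestamp, optional audio level) so it runs outside a browser; `findCurrentSpeaker` and the 500 ms window are hypothetical choices. It assumes the current specification's audio level scale, a linear value from 0 to 1 where higher is louder.

```typescript
// Local structural type mirroring the fields listed above; in a browser the
// entries from RTCRtpReceiver.getContributingSources() would be passed in.
interface ContributingSourceInfo {
  source: number;      // CSRC or SSRC identifier
  timestamp: number;   // time of the most recent packet from this source
  audioLevel?: number; // optional; assumed 0..1 linear, higher is louder
}

// Return the identifier of the loudest source heard within `windowMs` of
// `now`, or null if no recent source reported an audio level.
function findCurrentSpeaker(
  sources: ContributingSourceInfo[],
  now: number,
  windowMs = 500,
): number | null {
  let best: ContributingSourceInfo | null = null;
  for (const s of sources) {
    // Skip stale entries and entries without an audio level.
    if (now - s.timestamp > windowMs || s.audioLevel === undefined) continue;
    if (best === null || s.audioLevel > (best.audioLevel ?? 0)) best = s;
  }
  return best === null ? null : best.source;
}
```

An application would typically call this periodically (for example from a timer) and map the returned identifier to a participant in its UI; `now` must be on the same clock as the entries' `timestamp` values.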