We gave an earlier update briefly describing media capture from DOM elements, and the work has made substantial progress since then. The specification can be found here. Below are some details from the latest discussions.
getUserMedia() captures media from the camera and/or microphone. In that case there is never any question of whether a media source exists: the camera can always capture video, and the microphone can always capture audio. If the user physically blocks the camera or mic, a black/blank image or silence is captured instead.
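For reference, here is a minimal sketch of device capture using the promise-based navigator.mediaDevices.getUserMedia() API; the constraints and logging are purely illustrative:

    // Request camera and microphone; the browser prompts the user for permission.
    async function captureFromDevices(): Promise<MediaStream> {
      const stream = await navigator.mediaDevices.getUserMedia({
        video: true,
        audio: true,
      });
      // Once the promise resolves, the tracks are always present: a covered
      // camera delivers black frames and a muted microphone delivers silence.
      console.log(stream.getVideoTracks().length, 'video track(s)');
      console.log(stream.getAudioTracks().length, 'audio track(s)');
      return stream;
    }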
An HTML video element, by contrast, may have no source connected to it at all, or its playback may be paused. The question was what to do in these cases. The existing specification says that the output stream only has tracks when the corresponding input tracks exist and are rendering, but the current thinking is that the output should match the user experience as closely as possible. For example, while video playback is paused, blank output would be captured during that time, so that a replay of the captured stream matches what the user actually saw.
The plan is to have a single new method, captureStream(), on HTML media elements such as <audio> and <video>, as well as on <canvas>, that will produce a real-time MediaStream just as getUserMedia() does, meaning that it can be rendered, sent over a peer connection, and so on.
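As a rough sketch of how this could look: the method names and the optional canvas frame-rate argument here follow the proposal and may still change, and the any cast is only there because TypeScript's DOM typings may not yet include captureStream() on media elements.

    const video = document.querySelector('video')!;   // HTMLVideoElement
    const canvas = document.querySelector('canvas')!; // HTMLCanvasElement

    // Capture real-time MediaStreams from the elements, just as
    // getUserMedia() does from a camera or microphone.
    const videoStream: MediaStream = (video as any).captureStream();
    const canvasStream: MediaStream = canvas.captureStream(30); // frame-rate hint

    // Per the discussion above, pausing the source video would produce blank
    // output on videoStream, so a later replay matches what the viewer saw.

    // The captured streams can then be rendered, recorded, or sent over a
    // peer connection like any other MediaStream.
    const pc = new RTCPeerConnection();
    videoStream.getTracks().forEach((track) => pc.addTrack(track, videoStream));
    canvasStream.getTracks().forEach((track) => pc.addTrack(track, canvasStream));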
This will make it easier to capture the actual experience a user has with audio, video, and canvas elements, which is a win overall for WebRTC screen-sharing and whiteboard-type applications.
The updates listed above are officially still under discussion but will likely go into the specification fairly soon.