
Thursday, 19 October 2017

WebRTC lets users make audio/video calls from a browser on a computer or phone

WebRTC, launched by Google, lets users make audio/video calls from a browser on a computer or phone, a headache for telecom companies and now for VoIP operators as well. In this article, VisionMobile guest author Tsahi Levent-Levi discusses Google's intentions and the problems facing telecom companies and over-the-top (OTT) players.
The past few years have already been hard for network operators, who have been competing with OTT players such as Skype and WhatsApp: companies that deliver voice and messaging services over the networks the telecom operators provide. The impact of these OTT companies is considerable, whether from relatively small players like Viber (over 90 million users, more than 15 billion calls and over 2 billion messages per month) or giants like Apple (iMessage has as many as 140 million users, sending a billion messages a day).
Now something that gives telecom operators an even bigger headache has arrived: WebRTC.
WebRTC is a technology that lets users establish real-time communication through a web page. It will affect more than network telephony: OTT players now face a real threat too, because WebRTC lowers the barrier to subscribing to different OTT services.
Today, using an OTT service, you cannot communicate with someone in real time without installing its software. Nor can you communicate across platforms, for example calling a Skype user from Viber.
WebRTC will change all of that
How? WebRTC will make VoIP applications available to any browser, because it is being integrated into the HTML5 standards. You will no longer need a Skype account, a phone number or an email address: thanks to Google, real-time communication can happen right in the browser, without subscribing to any service.
Google acquired Global IP Solutions (GIPS), a company that supplied voice and video media engines to companies wanting to build VoIP applications, including Yahoo and Skype. By supplying the real-time media pieces, GIPS made applications immediately usable and saved Google a great deal of work. And Google has not stopped there: it is now using the technology to sharpen its competitive edge in communications and make the browser more capable.
What Google has done
- Aiming at web browser developers, Google wrapped the GIPS technology in a set of JavaScript APIs, creating WebRTC. This puts VoIP technology in the hands of millions of developers.
- Google open-sourced WebRTC under the permissive BSD license, so the technology can be reused, modified and built upon. That takes it out of the exclusive hands of real-time media engineers and marginalizes competitors such as SpiritDSP (a company providing audio/video solutions).
- Google submitted the technology to the W3C and IETF standards bodies for standardization, ensuring it becomes a common component of browsers and, along the way, removing anything Google-specific from it.
- Google left out the signalling layer, so vendors can use WebRTC in any real-time communication context, regardless of which protocol is used to set up the session.
The strategy behind Google's decision
This is Google's classic 'complements' play (using complementary products to drive the core product), and it will change the face of the communication services landscape for telecom operators and OTT players alike.
WebRTC, the technology for real-time communication in the web browser, is an important part of Google's overall strategy because it puts WebRTC in the hands of a huge developer population for free, lowering the barrier to building rich communication applications. Web developers can bring voice and video into entirely new classes of applications, enrich what communication means, and launch their own VoIP applications more easily than ever before.
For Google, the decision is simply about strengthening the web and the web browser and closing the gap with native applications, on desktop and mobile alike. The real value of the move for Google lies in serving more ads and mining user browsing behaviour more deeply, which is what Google cares about. The move will also weaken Microsoft, which acquired Skype, and threaten Apple's FaceTime.
The usual OTT business model
OTT providers mostly focus their strategy on acquiring as many users as possible, offering attractive free services that users come to depend on, and then try to monetize in four ways:
    1. Advertising (as with ooVoo and Skype)
    2. Connectivity to the public switched telephone network (PSTN): most of Skype's revenue comes from PSTN connectivity and phone-number offerings of the kind telecom operators sell
    3. Value-added services, such as multipoint video calling (as done by ooVoo)
    4. Cashing out through acquisition (which is what Viber hopes for)
OTT players monetize their large user bases, so they want to keep users locked inside their own service boundaries rather than letting them interact with users of other OTT services (a call from Viber to Skype, for example, is impossible).
Goodbye, walled gardens; hello, new ways to communicate
WebRTC breaks down the OTT players' walls: neither a client application nor a user ID (such as a Skype ID or an email address) is required any more. Since no particular signalling is prescribed, each provider can decide whether (and how) to use user IDs.
-          Imagine a local insurance agent looking for new customers in Paris: he builds a website and spends money on AdWords to drive traffic into his sales funnel, ending at a contact page or a phone number. With WebRTC he could talk, browser to browser, with a user visiting his site from home, wherever that user happens to be, with no OTT provider involved.
-          Or take a novel social network for backpackers that connects users who are planning trips. They would not need to exchange user IDs or phone numbers, or install a client: with one click they could talk to each other directly through the social network itself.
Startups are already offering services built on WebRTC, among them Bistri, Cloudeo, FrisB, TenHands and TokBox.
There is already a pattern of signing up for new web services with existing social media accounts. Many vendors adopting WebRTC will like that model, because it removes the need for a service-specific ID.
How should telecom operators respond?
Is WebRTC a threat or an opportunity for telecom operators? It can be either, depending on how they respond.
WebRTC does disrupt operators' communication services, but at the same time it brings considerable opportunity. To seize it, operators need to embrace the web developer community and deliver value and services to WebRTC-based applications, earning themselves a place in this vibrant ecosystem. Web developers are already looking for WebRTC solutions to put into their own applications. Telecom operators can become carriers of innovation too, provided they offer:
-          Session-based charging for WebRTC. As with any other telecom service, operators can charge for WebRTC sessions their users initiate: WebRTC sessions crossing the operator network can be tracked (through DPI and other means) and billed accordingly.
-          RCS and WebRTC integration. RCS (also known as Joyn) is the telecom operators' instant-messaging offering. Adding WebRTC to RCS gives it immediately usable, programmable multimedia capabilities without extra protocols such as VoLTE.
-          Guaranteed quality of service. Need to reach emergency services? Some other urgent situation? An important business call? Operators can guarantee the quality of service of such calls and give them priority (for a fee, naturally).
-          Infrastructure. WebRTC is only a protocol; building solutions on top of it takes many additional components, most of them server-side. Operators can offer that server-side infrastructure as a service.
-          PSTN connectivity. Operators have their own existing voice networks and links into the PSTN. They can provide WebRTC endpoints into the PSTN and GSM, bridging the gap between these voice services.
-          WebRTC dialling. WebRTC supplies only the media components; there is no dialling, yet to reach someone over WebRTC you still need a way to set up the call. This is where operators can step in, connecting users to each other.
However much operators may yearn for the golden age before OTT, there is no going back. But WebRTC can let them halt the decline, and even claw a little back, depending on how fast and how thoroughly they act. AT&T, T-Mobile, Deutsche Telekom and Orange are examples of major operators that have recognised this and are moving quickly to invest in the opportunities WebRTC offers. The question is: how long will the rest take to catch up?
About the author:
Tsahi Levent-Levi has over 15 years of experience in telecommunications, VoIP and 3G as an engineer, manager, marketer and CTO. He is currently the Director of Business Solutions at Amdocs, where he looks for innovative ways for telecom operators to deliver more value to their users.
----------

Exploring WebRTC: peerconnection_client and peerconnection_server


Here is some follow-up material for studying WebRTC further: a quick walkthrough of building and testing peerconnection_client and peerconnection_server. Apologies in advance for any mistakes.
After opening the webrtc.sln solution (see the red underline in the screenshot below), you will see the two sub-projects peerconnection_client and peerconnection_server. Once they build, the corresponding .exe files exist and you can start debugging right away.
Build process
Select one of the sub-projects:
From the Build menu, choose "Build peerconnection_client" or "Rebuild peerconnection_client". If no errors are reported, peerconnection_client.exe is generated; the peerconnection_server sub-project builds the same way.
Debugging works like this:
Select one of the sub-projects, as shown below: right-click > Debug > Start New Instance.
Testing peerconnection_server and peerconnection_client:
Hardware suggestion: at least 2 webcams and 2 networked PCs.
1. Launch peerconnection_server.exe and peerconnection_client.exe.
As shown below, the server is on the left and the client on the right.
Note: the client must be given the server's correct IP address and port before it can connect.
Normally a single running server is enough; there can be many clients. The server keeps track of all connected client(s).
Once connected, a client can see every other client connected to the same server. For example, with two clients (Temo and Administrator) connected to one server, the client Temo can see that Administrator is online; the "List of currently connected peers:" panel lists all clients connected to the same server:
Double-click Administrator to start a video call. To leave, press Esc once to end the session and again to disconnect from the server.
The screenshot below shows a video session:
Your own picture is at the bottom right, and both sides need a working camera for the video session.
Have fun!
--------------

WebRTC is a new front in the long war for an open and unencumbered web. — Brendan Eich, inventor of JavaScript
Imagine a world where your phone, TV and computer could all communicate on a common platform. Imagine it was easy to add video chat to your web application. That’s the vision of WebRTC.
WebRTC implements open standards for real-time, plugin-free video, audio and data communication. The need is real:
  • A lot of web services already use Real-time Communication (RTC), but need downloads, native apps or plugins. These include Skype, Facebook (which uses Skype) and Google Hangouts (which uses the Google Talk plugin).
  • For end users, plugin download, installation and update can be complex, error prone and annoying.
  • For developers, plugins can be difficult to deploy, debug, troubleshoot, test and maintain—and may require licensing and integration of complex, expensive technology. It can be hard to persuade people to install plugins in the first place!
The guiding principles of the WebRTC project are that its APIs should be open source, free, standardised, and more efficient than existing technologies.
Want to try it out? WebRTC is available now in Google Chrome.
A good place to start is the simple video chat application at apprtc.appspot.com. Open the page in Chrome, with PeerConnection enabled on the chrome://flags page, then open the URL again (with query string added) in a new window. There is a walkthrough of the code later in this article.

Quick start

Haven’t got time to read this article, or just want code?
  1. Get an overview of WebRTC from Justin Uberti’s Google I/O video.
  2. If you haven’t used getUserMedia, take a look at the HTML5 Rocks article on the subject, and view the source for Eric Bidelman‘s photobooth demo.
  3. Get to grips with the PeerConnection API by reading through the demo at webrtc-demos.appspot.com, which implements WebRTC on a single web page.
  4. Learn more about how WebRTC uses servers for signalling, NAT traversal and data communication, by reading through the code and the console logs from the video chat demo at apprtc.appspot.com.

A very short history of WebRTC

For many years, RTC components were expensive, complex and needed to be licensed—putting RTC out of the reach of individuals and smaller companies.
Gmail video chat became popular in 2008, and in 2011 Google introduced Hangouts, which use the Google Talk service (as does Gmail). Google bought GIPS, a company which had developed many of the components required for RTC, such as codecs and echo cancellation techniques. Google open sourced the technologies developed by GIPS and engaged with relevant standards bodies, the IETF and W3C, to ensure industry consensus. In May 2011, Ericsson built the first implementation of WebRTC.
Other JavaScript APIs used by WebRTC apps, such as getUserMedia and WebSocket, emerged at the same time. Future integration with APIs such as Web Audio will make WebRTC even more powerful—WebRTC has already shown huge promise when teamed up with technologies such as WebGL.

Where are we now?

WebRTC has been available in the stable build of Google Chrome since version 20. The getUserMedia API is ‘flagless’ in Chrome from version 21: you don’t have to enable MediaStream on the chrome://flags page.
Opera 12 shipped with getUserMedia; further WebRTC implementation is planned for Opera this year. Firefox has WebRTC efforts underway, and has demonstrated a prototype version of PeerConnection. Full getUserMedia support is planned for Firefox 17 on desktop and Android. WebRTC functionality is available in Internet Explorer via Chrome Frame, and Skype (acquired by Microsoft in 2011) is reputedly planning to use WebRTC. Native implementations with WebRTC include WebKitGTK+.
As well as browser vendors, WebRTC has strong support from Cisco, Ericsson and other companies such as Voxeo, who recently announced the Phono jQuery plugin for building WebRTC-enabled web apps with phone functionality and messaging.
A word of warning: be skeptical of reports that a platform ‘supports WebRTC’. Often this actually just means that getUserMedia is supported, but not any of the other RTC components.
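One practical way to check is to probe for the relevant constructors directly. The sketch below uses the vendor-prefixed names current at the time of writing (adjust as implementations evolve):

// probe for getUserMedia under its prefixed names
var hasGetUserMedia = !!(navigator.getUserMedia ||
    navigator.webkitGetUserMedia || navigator.mozGetUserMedia);
// probe for Chrome's JSEP PeerConnection implementation (see the naming note below)
var hasPeerConnection = !!window.webkitPeerConnection00;
console.log('getUserMedia: ' + hasGetUserMedia +
    ', PeerConnection: ' + hasPeerConnection);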

My first WebRTC

WebRTC client applications need to do several things:
  • Get streaming audio, video or data.
  • Communicate streaming audio, video or data.
  • Exchange control messages to initiate or close sessions and report errors.
  • Exchange information about media such as resolution and format.
More specifically, WebRTC as implemented uses the following APIs.
  • MediaStream: get access to data streams, such as from the user’s camera and microphone.
  • PeerConnection: audio or video calling, with facilities for encryption and bandwidth management.
  • DataChannel: peer-to-peer communication of generic data.

Crossing the streams

The MediaStream API represents a source of streaming media. Each MediaStream has one or more MediaStreamTracks, each of which corresponds to a synchronised media source. For example, a stream taken from camera and microphone input has synchronised video and audio tracks. (Don’t confuse MediaStream tracks with the <track> element, which is something entirely different.)
The getUserMedia() function can be used to get a LocalMediaStream. This has a label identifying the source device (something like ‘FaceTime HD Camera (Built-in)’) as well as audioTracks and videoTracks properties, each of which is a MediaStreamTrackList. In Chrome, the webkitURL.createObjectURL() method converts a LocalMediaStream to a Blob URL which can be set as the src of a video element. (In Opera, the src of the video can be set from the stream itself.)
Currently no browser allows audio data from getUserMedia to be passed to an audio or video element, or to other APIs such as Web Audio. The WebRTC PeerConnection API handles audio as well as video, but audio from getUserMedia is not yet supported in other contexts.
You can try out getUserMedia with the code below, if you have a webcam. Paste the code into the console in Chrome and press return. Permission to use the camera and microphone will be requested in an infobar at the top of the browser window; press the Allow button to proceed. The video stream from the webcam will then be displayed in the video element created by the code, at the bottom of the page.
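Here is a minimal sketch of that kind of code, using the prefixed APIs of the time (the callback bodies and element handling are illustrative, not the demo's exact snippet):

navigator.webkitGetUserMedia({audio: true, video: true}, function (stream) {
  // create a video element at the bottom of the page and show the camera stream
  var video = document.createElement('video');
  video.autoplay = true;
  video.src = webkitURL.createObjectURL(stream); // Blob URL for the LocalMediaStream
  document.body.appendChild(video);
}, function (error) {
  console.log('getUserMedia error: ' + error.code);
});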
The intention is eventually to enable a MediaStream for any streaming data source, not just a camera or microphone. This could be extremely useful for gathering and communicating arbitrary real-time data, for example from sensors or other inputs.

Signalling

WebRTC uses PeerConnection to communicate streams of data, but also needs a mechanism to send control messages between peers, a process known as signalling. Signalling methods and protocols are not specified by WebRTC: signalling is not part of the PeerConnection API. Instead, WebRTC app developers can choose whatever messaging protocol they prefer, such as SIP or XMPP, and any appropriate duplex (two-way) communication channel such as WebSocket, or XMLHttpRequest (XHR) in tandem with the Google Channel API.
The apprtc.appspot.com example uses XHR and the Channel API. Silvia Pfeiffer has demonstrated WebRTC signalling via WebSocket and in May 2012 Doubango Telecom open-sourced the sipml5 SIP client, built with WebRTC and WebSocket.
To start a session, WebRTC clients need the following:
  • Local configuration information.
  • Remote configuration information.
  • Remote transport candidates: how to connect to the remote client (IP addresses and ports).
Configuration information is described in the form of a SessionDescription, the structure of which conforms to the Session Description Protocol, SDP. Serialised, an SDP object looks like this:
v=0
o=- 3883943731 1 IN IP4 127.0.0.1
s=
t=0 0
a=group:BUNDLE audio video
m=audio 1 RTP/SAVPF 103 104 0 8 106 105 13 126

// ...

a=ssrc:2223794119 label:H4fjnMzxy3dPIgQ7HxuCTLb4wLLLeRHnFxh810
Signalling proceeds like this:
  1. Caller sends offer.
  2. Callee receives offer.
  3. Callee sends answer.
  4. Caller receives answer.
The SessionDescription sent by the caller is known as an offer, and the response from the callee is an answer. (Note that WebRTC currently only supports one-to-one communication.)
The offer SessionDescription is passed to the caller’s browser via the PeerConnection setLocalDescription() method, and via signalling to the remote peer, whose own PeerConnection object invokes setRemoteDescription() with the offer. This architecture is called JSEP, JavaScript Session Establishment Protocol. (There’s an excellent animation explaining the process of signalling and streaming in Ericsson’s demo video for its first WebRTC implementation.)

JSEP architecture (figure)


Once the signalling process has completed successfully, data can be streamed directly, peer to peer, between the caller and callee, or via an intermediary server (more about this below). Streaming is the job of PeerConnection.

PeerConnection

Below is a WebRTC architecture diagram. As you will notice, the green parts are complex!

WebRTC architecture (figure, from webrtc.org)


From a JavaScript perspective, the main thing to understand from this diagram is that PeerConnection shields web developers from myriad complexities that lurk beneath. The codecs and protocols used by WebRTC do a huge amount of work to make real-time communication possible, even over unreliable networks:
  • packet loss concealment
  • echo cancellation
  • bandwidth adaptivity
  • dynamic jitter buffering
  • automatic gain control
  • noise reduction and suppression
  • image ‘cleaning’.

PeerConnection sans servers

WebRTC from the PeerConnection point of view is described in the example below. The code is taken from the ‘single page’ WebRTC demo at webrtc-demos.appspot.com, which has local and remote PeerConnection (and local and remote video) on one web page. This doesn’t constitute anything very useful—caller and callee are on the same page—but it does make the workings of the PeerConnection API a little clearer, since the PeerConnection objects on the page can exchange data and messages directly without having to use intermediary servers.
First, a quick explanation of the name webkitPeerConnection00. When PeerConnection using the JSEP architecture was implemented in Chrome (see above), the original pre-JSEP implementation was renamed webkitDeprecatedPeerConnection. This made it possible to keep old demos working with a simple rename. The new JSEP PeerConnection implementation was named webkitPeerConnection00, and as the JSEP draft standard evolves, it might become webkitPeerConnection01, webkitPeerConnection02—and so on—to avoid more breakage. When the dust finally settles, the API name will become PeerConnection.
So, without further ado, here is the process of setting up a call using PeerConnection…

Caller

  1. Create a new PeerConnection and add a stream (for example, from a webcam):
    pc1 = new webkitPeerConnection00(null, iceCallback1);
    // ...
    pc1.addStream(localstream);
  2. Create a local SessionDescription, apply it and initiate a session:
    var offer = pc1.createOffer(null);
    pc1.setLocalDescription(pc1.SDP_OFFER, offer);
    // ...
    pc1.startIce(); // start connection process
  3. (Wait for a response from the callee.)
  4. Receive remote SessionDescription and use it:
    pc1.setRemoteDescription(pc1.SDP_ANSWER, answer);

Callee

  1. (Receive call from caller.)
  2. Create PeerConnection and set remote session description:
    pc2 = new webkitPeerConnection00(null, iceCallback2);
    pc2.onaddstream = gotRemoteStream;
    // ...
    pc2.setRemoteDescription(pc2.SDP_OFFER, offer);
  3. Create local SessionDescription, apply it, and kick off response:
    var answer = pc2.createAnswer(offer.toSdp(),
      {has_audio:true, has_video:true});
    // ...
    pc2.setLocalDescription(pc2.SDP_ANSWER, answer);
    pc2.startIce();
Here’s the whole process (sans logging):
// create the 'sending' PeerConnection
pc1 = new webkitPeerConnection00(null, iceCallback1);
// create the 'receiving' PeerConnection
pc2 = new webkitPeerConnection00(null, iceCallback2);
// set the callback for the receiving PeerConnection to display video
pc2.onaddstream = gotRemoteStream;
// add the local stream for the sending PeerConnection
pc1.addStream(localstream);
// create an offer, with the local stream
var offer = pc1.createOffer(null);
// set the offer for the sending and receiving PeerConnection
pc1.setLocalDescription(pc1.SDP_OFFER, offer);
pc2.setRemoteDescription(pc2.SDP_OFFER, offer);
// create an answer
var answer = pc2.createAnswer(offer.toSdp(), {has_audio:true, has_video:true});
// set it on the sending and receiving PeerConnection
pc2.setLocalDescription(pc2.SDP_ANSWER, answer);
pc1.setRemoteDescription(pc1.SDP_ANSWER, answer);
// start the connection process
pc1.startIce();
pc2.startIce();

PeerConnection plus servers

So… That’s WebRTC on one page in one browser. But what about a real application, with peers on different computers?
In the real world, WebRTC needs servers, however simple, so the following can happen:
  • Users discover each other.
  • Users send their details to each other.
  • Communication survives network glitches.
  • WebRTC client applications communicate data about media such as video format and resolution.
  • WebRTC client applications traverse NAT gateways.
In a nutshell, WebRTC needs two types of server-side functionality:
  • User discovery, communication and signalling.
  • NAT traversal and streaming data communication.
NAT traversal, peer-to-peer networking, and the requirements for building a server app for user discovery and signalling, are beyond the scope of this article. Suffice to say that the STUN protocol and its extension TURN are used by the ICE framework to enable PeerConnection to cope with NAT traversal and other network vagaries.
ICE is a framework for connecting peers, such as two video chat clients. Initially, ICE tries to connect peers directly, with the lowest possible latency, via UDP. In this process, STUN servers have a single task: to enable a peer behind a NAT to find out its public address and port. (Google has a couple of STUN servers, one of which is used in the apprtc.appspot.com example.)

Finding connection candidates (figure)


If UDP fails, ICE tries TCP: first HTTP, then HTTPS. If direct connection fails—in particular, because of enterprise NAT traversal and firewalls—ICE uses an intermediary (relay) TURN server. In other words, ICE will first use STUN with UDP to directly connect peers and, if that fails, will fall back to a TURN relay server. The expression ‘finding candidates’ refers to the process of finding network interfaces and ports.

WebRTC data pathways (figure)


To find out more about how to set up a server to deal with signalling and user discovery, take a look at the code repository for the apprtc.appspot.com demo, which is at code.google.com/p/webrtc-samples/source/browse/trunk/apprtc/. This uses the Google App Engine Channel API. For information about using a WebSocket server for signalling, check out Silvia Pfeiffer’s WebSocket WebRTC app.

A simple video chat client

A good place to try out WebRTC, complete with signalling and NAT traversal using a STUN server, is the video chat demo at apprtc.appspot.com.
This app is deliberately verbose in its logging: check the console to understand the order of events.
Below we give a detailed walk-through of the code.

What’s going on?

The demo starts by running the initialize() function:
function initialize() {
  console.log("Initializing; room=85444496.");
  card = document.getElementById("card");
  localVideo = document.getElementById("localVideo");
  miniVideo = document.getElementById("miniVideo");
  remoteVideo = document.getElementById("remoteVideo");
  resetStatus();
  openChannel();
  getUserMedia();
}
This code initializes variables for the HTML video elements that will display video streams from the local camera (localVideo) and from the camera on the remote client (remoteVideo). resetStatus() simply sets a status message.
The openChannel() function sets up messaging between WebRTC clients:
function openChannel() {
  console.log("Opening channel.");
  var channel = new goog.appengine.Channel('AHRlWrqwxKQHdOiOaux3JkDQaxmTvdlYgz1wL69DE20mE3Xq0WaxE3zznRLD6_jwIGiRFlAR-En4lAlLHWRKk862_JTGHrdCHaoTuJTCw8l6Cf7ChMWiVjU');
  var handler = {
    'onopen': onChannelOpened,
    'onmessage': onChannelMessage,
    'onerror': onChannelError,
    'onclose': onChannelClosed
  };
  socket = channel.open(handler);
}
For signalling, this demo uses the Google App Engine Channel API, which enables messaging between JavaScript clients without polling. (WebRTC signalling is covered in more detail above).

Architecture of the apprtc video chat application (figure)


Establishing a channel with the Channel API works like this (a client-side sketch follows the list):
  1. Client A generates a unique ID.
  2. Client A requests a Channel token from the App Engine app, passing its ID.
  3. App Engine app requests a channel and a token for the client’s ID from the Channel API.
  4. App sends the token to Client A.
  5. Client A opens a socket and listens on the channel set up on the server.
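A client-side sketch of steps 1, 2 and 5 (the /token path and parameter name are hypothetical; the demo itself hard-codes its token into openChannel(), as shown earlier):

// step 1: generate a unique client ID
var clientId = Math.floor(Math.random() * 1e9).toString();
// step 2: ask the App Engine app for a Channel token (synchronous XHR for brevity)
var xhr = new XMLHttpRequest();
xhr.open('GET', '/token?client=' + clientId, false);
xhr.send();
// step 5: open a socket and listen on the channel set up on the server
var channel = new goog.appengine.Channel(xhr.responseText);
var socket = channel.open({
  'onmessage': function (message) { console.log(message.data); }
});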

The Google Channel API: establishing a channel (figure)


Sending a message works like this:
  1. Client B makes a POST request to the App Engine app with an update.
  2. The App Engine app passes a request to the channel.
  3. The channel carries a message to Client A.
  4. Client A’s onmessage callback is called.

The Google Channel API: sending a message (figure)


Just to reiterate: signalling messages are communicated via whatever mechanism the developer chooses: the signalling mechanism is not specified by WebRTC. The Channel API is used in this demo, but other methods (such as WebSocket) could be used instead.
After the call to openChannel(), the getUserMedia() function called by initialize() checks whether the browser supports the getUserMedia API and, if so, requests the camera stream. (Find out more about getUserMedia on HTML5 Rocks.)
This causes video from the local camera to be displayed in the localVideo element, by creating an object (Blob) URL for the camera’s data stream and then setting that URL as the src for the element. (createObjectURL is used here as a way to get a URI for an ‘in memory’ binary resource, i.e. the LocalDataStream for the video.) The data stream is also set as the value of localStream, which is subsequently made available to the remote user.
At this point, initiator has been set to 1 (and it stays that way until the caller’s session has terminated), so maybeStart() is called, which in turn calls createPeerConnection():

      console.log("Failed to create PeerConnection, exception: " + e.message);
      alert("Cannot create PeerConnection object; Is the 'PeerConnection' flag enabled in about:flags?");
      return;
    }

    pc.onconnecting = onSessionConnecting;
    pc.onopen = onSessionOpened;
    pc.onaddstream = onRemoteStreamAdded;
    pc.onremovestream = onRemoteStreamRemoved;
  }
The underlying purpose is to set up a connection, using a STUN server, with onIceCandidate() as the callback (see above for an explanation of ICE, STUN and ‘candidate’). Handlers are then set for each of the PeerConnection events: when a session is connecting or open, and when a remote stream is added or removed. In fact, in this example these handlers only log status messages—except for onRemoteStreamAdded(), which sets the source for the remoteVideo element.
Once createPeerConnection() has been invoked in maybeStart(), a call is initiated.
The offer creation process here is similar to the no-signalling example above but, in addition, a message is sent to the remote peer, giving a serialised SessionDescription for the offer. pc.startIce() starts the connection process using the ICE framework (as described above).

Signalling with the Channel API

The onIceCandidate() function, set as the callback when the PeerConnection is successfully created in createPeerConnection(), sends information about each candidate as it is ‘gathered’.
Outbound messaging, from the client to the server, is done by sendMessage() with an XHR request.
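A sketch of that outbound path (the request path and serialisation here are assumptions, not the demo's exact code):

function sendMessage(message) {
  var xhr = new XMLHttpRequest();
  // POST the serialised signalling message to the App Engine app for this room
  xhr.open('POST', '/message?r=85444496', true);
  xhr.send(JSON.stringify(message));
}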
XHR works fine for sending signalling messages from the client to the server, but some mechanism is needed for server–client messaging: this demo uses the Google App Engine Channel API. Messages from the API (i.e. from the App Engine server) are handled by processSignalingMessage().
If the message is an answer from a peer (a response to an offer), PeerConnection sets the remote SessionDescription and communication can begin. If the message is an offer (i.e. this client is the callee), PeerConnection sets the remote SessionDescription, sends an answer back to the caller, and starts the connection by invoking the PeerConnection startIce() method.
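In outline, that dispatch looks something like the following (message field names and the SessionDescription handling are assumptions based on the description above):

function processSignalingMessage(message) {
  var msg = JSON.parse(message);
  if (msg.type === 'answer') {
    // a response to our offer: apply it, and communication can begin
    pc.setRemoteDescription(pc.SDP_ANSWER, new SessionDescription(msg.sdp));
  } else if (msg.type === 'offer') {
    // we are the callee: apply the offer, send an answer, start the connection
    pc.setRemoteDescription(pc.SDP_OFFER, new SessionDescription(msg.sdp));
    var answer = pc.createAnswer(msg.sdp, {has_audio: true, has_video: true});
    pc.setLocalDescription(pc.SDP_ANSWER, answer);
    sendMessage({type: 'answer', sdp: answer.toSdp()});
    pc.startIce();
  }
}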
And that’s it! The caller and callee have discovered each other and exchanged information about their capabilities, a call session is initiated, and real-time data communication can begin.

DataChannel

As well as audio and video, WebRTC supports real-time communication for other types of data.
The DataChannel API will enable peer-to-peer exchange of arbitrary data, with low latency and high throughput.
There are many potential use cases for the API, including:
  • Gaming
  • Remote desktop applications
  • Real-time text chat
  • File transfer
  • Decentralized networks
The API has several features to make the most of PeerConnection and enable powerful and flexible peer-to-peer communication (a sketch of the draft API follows this list):
  • Leveraging of PeerConnection session setup.
  • Multiple simultaneous channels, with prioritization.
  • Reliable and unreliable delivery semantics.
  • Built-in security (DTLS) and congestion control.
  • Ability to use with or without audio or video.
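The API was still being drafted at the time of writing and no browser had shipped it yet, but its intended shape looked roughly like this (method names per the draft; treat it as a sketch):

// create a data channel on an existing PeerConnection (draft API shape)
var channel = pc.createDataChannel('chat', {reliable: false});

channel.onopen = function () {
  channel.send('hello, peer!');
};
channel.onmessage = function (event) {
  console.log('received: ' + event.data);
};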

In conclusion

The APIs and standards of WebRTC can democratise and decentralise tools for content creation and communication—for telephony, gaming, video production, music making, news gathering and many other applications.
Technology doesn’t get much more disruptive than this.
We look forward to seeing what inventive developers make of WebRTC as it becomes widely implemented over the next few months. As blogger Phil Edholm put it, ‘Potentially, WebRTC and HTML5 could enable the same transformation for real-time communications that the original browser did for information.’

Learn more

For more information about support for APIs such as getUserMedia, see caniuse.com.
----------

Operators use WebRTC to advance IMS video services


Web Real-Time Communication, or WebRTC, is a software architecture recently launched by Google to let web browsers carry out real-time voice or video conversations. Unlike traditional multimedia communication based on native clients or browser plugins, WebRTC builds the core modules needed for multimedia communication (audio/video capture, encoding and enhancement; network transport; session control) into the browser itself, so that third-party application developers can obtain real-time audio/video communication capability through simple JavaScript API calls.
For traditional telecom operators, WebRTC brings both a challenge and an opportunity. On one hand, riding the browser's high market penetration and enormous user base, WebRTC can substantially change the ecosystem and the rules of the game for real-time multimedia communication, hitting both the operators' existing services and the multimedia services they plan to push on IMS networks. On the other hand, a strong combination of WebRTC and IMS could exploit the advantages WebRTC applications inherit as web apps (large-scale rollout, fast deployment, low maintenance cost) and turn them into a driving force for IMS: richer applications for IMS users, incentives for traditional users to migrate to the IMS network, and an effective conversion of IMS capability into business value. For these reasons, WebRTC's development is being watched closely by operators and by equipment vendors such as Ericsson and Cisco.
The WebRTC architecture has broad support
The WebRTC software architecture consists of two sets of APIs: the Web API and the Native API.
The Web API is a set of JavaScript APIs that the WebRTC project provides to third-party developers of multimedia communication applications. So that WebRTC applications can be "written once, run anywhere", the W3C has begun drafting WebRTC 1.0, which defines key interfaces such as the Network Stream API and the getUserMedia API. The Native API is a set of lower-level C++ interfaces underlying the Web API; integrators can wrap it in JavaScript for the browser to call, or use it directly to build native applications. Because the Native API interacts directly with hardware and the operating system, it differs across environments such as Windows, Linux and Android, and browsers such as Chrome, Firefox and Opera implement it in different ways.
Concretely, WebRTC adds new functional modules to the browser: a video engine, an audio engine, network transport and session control. The audio/video engines provide the overall pipeline from the capture devices (microphone, camera) to the network-side processing chain. To avoid patent disputes, the codecs are open formats such as iLBC, iSAC and VP8, with jitter buffering and audio/video enhancement provided alongside. For transport, WebRTC carries media over RTP/SRTP and uses ICE (Interactive Connectivity Establishment) for NAT traversal of the media streams. WebRTC clients follow the JSEP (JavaScript Session Establishment Protocol) draft, which specifies how the two sides exchange SDP and negotiate and control the media streams. JSEP's design hands media-level control to the browser and signalling-level control to the web application developer, completely decoupling the signalling state machine from the browser and keeping the protocol flexible. These capabilities are already integrated into Google's Chrome browser, and other browsers such as Firefox, Opera and IE10 have stated that they support, or will support, the main WebRTC features.
Deploying WebRTC in an IMS network
The following describes one way to deploy an end-to-end WebRTC audio/video application on a SIP-based IMS network. To limit complexity, it only considers interworking between WebRTC clients of the same kind, not with other SIP terminals or PSTN phones. As shown in the figure, the WebRTC client is a web application written in JavaScript running in the browser, connected to the Internet directly or through a private gateway. The service platform adds a WebRTC proxy server and a STUN (Session Traversal Utilities for NAT) + TURN (Traversal Using Relays around NAT) server. The SIP server keeps the IMS core network's existing configuration, unchanged. The WebRTC clients in the figure all sit behind NAT or firewalls. During a call, signalling and media travel along two separate paths.
1) WebRTC client
The WebRTC client is a web application running in the browser, written in JavaScript. Its core is a SIP stack that sends, receives and parses SIP signalling and maintains the SIP state machine. In this design the WebRTC client acts as a WebSocket client, connecting to the WebSocket proxy server over the WebSocket interface and carrying SIP messages as the payload of WebSocket messages, as sketched below.
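A rough illustration of that idea (the proxy address and the SIP payload here are invented for the example):

var ws = new WebSocket('ws://proxy.example.com:8080/sip', 'sip');

ws.onopen = function () {
  // a SIP request carried as the text payload of a WebSocket message
  ws.send('REGISTER sip:ims.example.com SIP/2.0\r\n' +
          'From: <sip:alice@ims.example.com>;tag=1234\r\n' +
          'To: <sip:alice@ims.example.com>\r\n' +
          'CSeq: 1 REGISTER\r\n\r\n');
};

ws.onmessage = function (event) {
  // SIP responses come back over the same connection
  console.log('SIP message received:\n' + event.data);
};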
2) WebSocket proxy server
The WebSocket protocol, part of the HTML5 effort, provides two-way communication between browser and server, bootstrapped over HTTP. It is compatible with the existing HTTP 1.1 protocol, upgrading the connection via the "Upgrade: websocket" header. This reuses HTTP's existing proxying, filtering and authentication machinery, greatly reducing the cost of the protocol stack. As shown in the figure, both parties connect over WebSocket to the WebSocket proxy server. Because WebSocket runs over TCP, NAT traversal is not an issue here. The proxy server listens for connections from WebRTC clients; once a connection is established, it receives the WebSocket-encapsulated SIP messages sent by the client, extracts the SIP message, and forwards it to the SIP server as UDP packets. The SIP server returns its responses as UDP packets to the proxy, which rewrites the SIP message's destination to the actual target address, re-encapsulates it in WebSocket, and sends it on to the receiver, enabling both client-to-client and client-to-server communication.
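As a toy illustration of the relaying idea (addresses are invented, and the address rewriting described above is omitted), a Node.js version might look like:

var WebSocket = require('ws');   // third-party 'ws' package
var dgram = require('dgram');

var wss = new WebSocket.Server({port: 8080});
wss.on('connection', function (ws) {
  var udp = dgram.createSocket('udp4');
  // browser -> SIP server: unwrap the WebSocket payload and forward it as UDP
  ws.on('message', function (data) {
    udp.send(Buffer.from(String(data)), 5060, 'sip.ims.example.com');
  });
  // SIP server -> browser: wrap the UDP payload back into a WebSocket message
  udp.on('message', function (msg) {
    ws.send(msg.toString());
  });
});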
3) STUN+TURN server
WebRTC media streams use ICE for NAT traversal, which requires support from STUN or TURN servers. The figure shows the TURN case, with both WebRTC clients behind symmetric NAT; the media streams must then be relayed through the TURN server to reach the other side.
4) SIP server
Here the IMS core network is abstracted into a single SIP server; in reality it consists of multiple network elements such as the CSCF and HSS, connected to the WebSocket server through a BAC element. The SIP server handles user authentication and call control but does not carry the media streams. Both parties must register with the SIP server in advance and send periodic keep-alives to stay online.
WebRTC is still at an early stage of its evolution, its specifications are still being revised, and this way of deploying WebRTC audio/video communication on an IMS network leaves much unaddressed: multiparty scenarios such as video conferencing, and interworking with existing SIP terminals, PSTN phones and other kinds of WebRTC clients. But as the standards and technology mature, the combination of WebRTC and IMS should only grow stronger. If traditional telecom operators can grasp the direction of Internet technology and make good use of the WebRTC opportunity, it will play a very positive role in advancing IMS services and opening up new lines of business.
---------

Google acquires GIPS

Google has reached an acquisition agreement with Global IP Solutions (GIPS), a Norwegian company headquartered in San Francisco whose core technology is voice and video over IP.

The offer represents a premium of 142.1% over GIPS's share price on 11 January this year, or 27.5% over its price on 14 May.

Google engineering director Rian Liebenberg said in a statement that GIPS's technology gives Google high-quality voice and video communication, which Google intends to use to strengthen its web development platform.

Although Google has not said how it will deploy or integrate GIPS, the technology can be expected to show up in many Google products, such as Google Voice, Gtalk, Android OS and Chrome OS.

As for GIPS's existing customers, CEO Emerick Woods says the company will continue to serve them; they include AOL, Nortel, Oracle, Samsung, WebEx and Yahoo.
------------------

Global IP Solutions

Google acquired Global IP Solutions and its real-time audio and video products and technology. The GIPS product suite (Voice Engine, Video Engine, Voice ConferencingEngine, Video ConferencingEngine, Voice MediationEngine and related components) is no longer available for sale to new customers.
GIPS technology is now available for licensing through the WebRTC project.
from http://www.gipscorp.com 
-----------------

Google open-sources the WebRTC real-time communication framework


Google has announced that it is opening the source code of the WebRTC framework to developers. WebRTC, a technology for real-time video and audio communication inside the browser, is the very technology Google obtained by acquiring Global IP Solutions a year ago.

Developers can visit https://webrtc.org for the WebRTC source code, specifications and tools.
----------------

WebRTC is a free, open project that provides browsers and mobile applications with Real-Time Communications (RTC) capabilities via simple APIs. The WebRTC components have been optimized to best serve this purpose.
Our mission: To enable rich, high-quality RTC applications to be developed for the browser, mobile platforms, and IoT devices, and allow them all to communicate via a common set of protocols.
The WebRTC initiative is a project supported by Google, Mozilla and Opera, amongst others. This page is maintained by the Google Chrome team.
New to WebRTC? Take a look at our codelab.
Lots more resources for getting started are available from webrtc.org/start.
---------------------------

Filegogo

A file transfer tool that runs in the browser, transferring files peer-to-peer over WebRTC.

Demo: send.22333.fun



Deploy on your own server

Build && Install

make

Run Development

Webapp

npm install

# frontend
# Default Listen port: 3000
# Auto Proxy port: 8080
npm run dev

Server

# Default Listen port: 8080
go run ./main.go server

Client

Run the CLI client. For example:

# send command
go run ./main.go send -s http://localhost:8080/6666 <file>

# recv command
go run ./main.go recv -s http://localhost:8080/6666 <file>

Config

Reference: iceServer config

Built-in turn server

# Enable Built-in turn server
[turn]

# if not set, a random user is generated
user = "filegogo:filegogo"

realm = "filegogo"
listen = "0.0.0.0:3478"

# Public ip
# e.g. when running on AWS or Aliyun
publicIP = "0.0.0.0"
relayMinPort = 49160
relayMaxPort = 49200

Use another iceServer

For example: coturn
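For reference, this is roughly how a browser-side client is pointed at such a TURN server, using the credentials from the built-in turn config above (the hostname is the demo site's; whether Filegogo wires it exactly this way is not shown here):

var pc = new RTCPeerConnection({
  iceServers: [{
    urls: 'turn:send.22333.fun:3478',  // the TURN listen address configured above
    username: 'filegogo',
    credential: 'filegogo'
  }]
});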

Test Deployment

# Test stun
turnutils_stunclient send.22333.fun

# Test turn
turnutils_uclient -u filegogo -w filegogo send.22333.fun -y

Package Manager Deployment

apt install coturn
# /etc/turnserver.conf

listening-ip={YOUR_IP_ADDRESS}
relay-ip={YOUR_IP_ADDRESS}

# Public ip
# if aws, aliyun
external-ip={YOUR_IP_ADDRESS}

fingerprint
lt-cred-mech
user=filegogo:filegogo
realm=filegogo
from https://github.com/a-wing/filegogo 
-------------------------------------------

Why WebRTC removed the BBR congestion control algorithm

WebRTC removed the BBR congestion control algorithm in Bug: webrtc:9883. The stated reason is fairly brief:


This was introduced on trial but turned out to perform badly for WebRTC
purposes and never used in production.


There has been some discussion online of BBR's removal from WebRTC, most of it tracing back to these two threads:

  1. Question about applying BBR on video streaming
  2. Better throughput estimation

They mention:

However, BBR was deprecated due to some “performance issues” and removed from the codebase.

Chinese write-ups, however, tend to translate "performance" here as 性能 (computational performance), which can mislead readers into thinking BBR had a resource-consumption problem; "behaviour" is the more accurate reading.

How BBR behaved

So what behaviour led WebRTC to remove BBR?

The author of thread 1 gives measured data:

In the first chart, the red line is the bandwidth estimate produced by BBR; the blue line is the corresponding target encoder bitrate (suggested by the adaptive encoding module); the green histogram-like line is the actual send bandwidth; and the horizontal dashed line is the true bandwidth.

The second and third charts show the corresponding round-trip time (RTT) and the pacing queue size; together they determine the latency between server and client.

According to the experimental results (above), when BBR overestimates the bandwidth, the adaptive encoding module raises the encoder's target bitrate. This is very different from bulk transfer, where an overestimate does not increase the total amount of data to be sent. The raised bitrate then causes congestion: once the data in flight is capped by the CWND, the remaining data backs up in the pacing queue, producing visible latency between server and client.

In other words, when BBR's estimate exceeds the actual bandwidth and is fed to the encoder, the encoded bitrate exceeds the real capacity, packets queue up in the pacer, and the RTT grows accordingly.

If the backlog exceeds the pacing queue's threshold, a burst of packets is dumped onto the network at once, worsening the congestion and causing heavy packet loss.

The (Chinese-language) article "why bbr is removed from webrtc" explores this phenomenon.

A Google developer offered this explanation:

One problem is slight bandwidth overestimation, as you observed. Applying a max filter makes even small overestimates fairly persistent. As your charts show, the problem is worst during STARTUP, because the bandwidth estimate is bounded by the send rate, and the send rate in STARTUP is significantly higher than the bandwidth estimate. We may have an as-yet-undiscovered bug in the bandwidth estimator, but overestimation is also a known possibility in the presence of ACK aggregation.

In our tests, flows were often application-limited, which can keep bad bandwidth estimates from STARTUP from expiring quickly; but given the standing queue in your test, that does not seem to be a factor here.

The last problem we observed was that when BBR is used on both directions of a WebRTC stream, the minimum round-trip time (min_rtt) estimate gets inflated, especially when exiting the STARTUP phase takes a long time.

See thread 1.

Another reason for WebRTC removing BBR is mentioned in thread 2:

I don’t think there are any plans for using BBR. One of the problems with BBR for real-time communications is that it alternates between probing the bandwidth and measuring the RTT. To measure the bottleneck bandwidth, we need to build network queues. To accurately measure the RTT, all network queues must be completely drained which requires sending almost nothing for at least one RTT. These variations in target send rate do not play nice with encoders and jitter buffers.


In short, BBR does not cooperate well with WebRTC's encoders and jitter buffers.

Thoughts

From the material available (@PixPark has not tested this), BBR estimates bandwidth more accurately than GCC and converges faster; those are its strengths. Ever since WebRTC brought BBR in, it has been discussed widely online, and many RTC developers in China pinned great hopes on it.

Given that BBR was an experimental feature and did not mesh well with WebRTC in practice, does that mean BBR has no place in RTC at all?

In my view, in scenarios with strict demands on bandwidth accuracy, such as a server that forwards or drops SVC encoding layers, BBR's accurate estimates could be used to forward downstream packets more precisely.

Another option is tuning BBR to your own requirements, for example setting the encoder bitrate to 80% of the probed bandwidth (possibly a bad idea).

A remaining question is how well BBR adapts when bandwidth swings widely, for example when walking along a corridor; that needs further exploration.

This post is not a deep treatment of BBR and leaves some questions open; the main goal is to give an overview of why WebRTC removed it. If you know this area well, please share your views.

References

Some reference articles on BBR and networking:

why bbr is removed from webrtc
The disappointing bbr2
bbr in real networks
Exposing the truth about bbr
Long fat networks: why is file transfer so slow
Network protocols: the future and performance of TCP

from https://pixpark.net/973d8371.html

 

 

 
 
 

 

 

 
