Much has been written about the relative merits of Adobe Flash vs. JavaScript for building rich internet applications, but there is one area where Flash is uniquely capable: sending and receiving audio and video without requiring any download, plug-in, or software installation.
Most people know Flash as the technology behind those often annoying animated intros on many Web sites. Anyone who has right-clicked on a video on YouTube, Comedy Central, or CNN also knows Flash as the ubiquitous video player which has all but replaced Apple QuickTime, Microsoft's Windows Media Player, and RealNetworks RealPlayer for playing video from a Web page. What most people don't know is that the same Flash Player also has built-in codecs that allow it to send audio and video as well as receive them. This is the technology used in Adobe Acrobat Connect, Convoq ASAP, AOL Userplane, and many others to enable two-way and multi-point audio and video conferencing.
Adobe has a very clever monetization strategy. They give away the Flash Player, which has made it ubiquitous, but in order to enable two-way a/v one must purchase the Flash Media Server (FMS). FMS comes in three flavors:
- Streaming ($995)
- Interactive ($4500)
- Developer (free)
The Streaming edition allows the Flash Player on the client to connect to either of two Adobe-provided streaming applications on the server. One of those applications streams pre-recorded video files and the other streams live video from the free Flash Media Encoder 2 or from another Flash Player. The Encoder can encode video using the On2 VP6 codec and audio using MP3. The Flash Player is restricted to the older Sorenson Spark video codec and the Nellymoser Asao audio codec. The latter is optimized for 8 kHz sampling, which makes it very efficient for voice but not so good for music. What all this means is that a developer can set up a Flash Media Streaming Server and start streaming video right away with one of the applications that come in the box, or use Adobe Flex to play the same video embedded in an application. With the Flash Media Interactive Server, the developer can write applications which run on the server that authenticate connection requests, communicate with any other back-end logic by sending and receiving XML documents, and provide the plumbing to connect audio and video from one end user to another.
In addition to streaming audio and video, Flash Media Server provides a number of other conveniences:
- Shared Objects - a mechanism for synchronizing data among multiple clients and servers. Whenever an application makes a change in one place, it is automatically available everywhere else - even across a network.
- Robust Connections - the Flash Player prefers to connect to the Flash Media Server using Adobe's proprietary RTMP protocol over port 1935, but it contains built-in code to try RTMP over ports 80 and 443, to tunnel RTMP over HTTP on port 80, and to send it via SSL over port 443. This mechanism can save you a lot of work if your users are behind a corporate firewall or even a proxy server doing stateful packet inspection. This is very popular with end users but may earn some scowls from their IT security people, who are paid to keep those same end users from trying new software.
- True cross-browser and cross-platform compatibility. Remember Write Once, Run Anywhere? Sun registered it as a trademark in 2001, but Flash is the only platform that delivers on that promise for audio and video.
- High performance. Since FMS doesn't do any processing of the streams but merely provides the plumbing, a single server can handle thousands of simultaneous streams.
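The fallback order described under Robust Connections can be sketched roughly as follows. This is an illustration in Python, not ActionScript and not Adobe's implementation: `candidate_urls`, `connect_with_fallback`, the `try_connect` callback, and the host and application names are all hypothetical, and the exact order in which the Flash Player tries each option is assumed from the description above.

```python
from typing import Callable, List, Optional


def candidate_urls(host: str, app: str) -> List[str]:
    """Build connection URLs in the (assumed) order the text describes."""
    return [
        f"rtmp://{host}:1935/{app}",  # plain RTMP on its default port
        f"rtmp://{host}:80/{app}",    # plain RTMP on the HTTP port
        f"rtmp://{host}:443/{app}",   # plain RTMP on the HTTPS port
        f"rtmpt://{host}:80/{app}",   # RTMP tunneled over HTTP
        f"rtmps://{host}:443/{app}",  # RTMP over SSL
    ]


def connect_with_fallback(
    host: str, app: str, try_connect: Callable[[str], bool]
) -> Optional[str]:
    """Return the first URL that try_connect accepts, or None if all fail.

    try_connect is a hypothetical stand-in for a real connection attempt.
    """
    for url in candidate_urls(host, app):
        if try_connect(url):
            return url
    return None


# Simulate a proxy that only lets HTTP-tunneled traffic through.
def behind_http_proxy(url: str) -> bool:
    return url.startswith("rtmpt://")


print(connect_with_fallback("media.example.com", "conference", behind_http_proxy))
```

In this simulation the three plain RTMP attempts fail and the tunneled connection on port 80 is the first to succeed, which is the situation a user behind a restrictive corporate proxy would be in.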
In spite of all the above-mentioned advantages, Flash is not the perfect solution. In order to keep the Flash Player small and the Flash Media Server throughput high, Adobe made a number of compromises that leave it at a disadvantage compared with more specialized solutions such as Skype:
- Audio and video are carried over TCP and not UDP. TCP is designed to reliably deliver high volumes of data traveling in one direction, such as file transfers. It also works reasonably well for web browsing since the user spends more time reading the page than interacting with the server. It even works just fine for audio and video if you have a high-quality network connection. Thus, you can put together a quick demo of a videoconferencing application and show it to your management, customers, or investors in your conference room. When you start shipping to customers whose "Broadband Internet" is a flaky DSL connection or an overloaded cable modem, you'll get complaints that the audio drops out or gets delayed, sometimes for tens of seconds. What's happening is that the connection is dropping packets. Because TCP delivers data in order, everything behind a lost packet waits until it is resent, and then it all arrives at once, resulting in silence followed by delays. This is the reason Skype and various SIP softphones use UDP and do their own real-time flow control. Fortunately, Adobe provides a lot of tools for monitoring and tweaking the buffering and bandwidth, but using them requires expertise and it will never be as good as Skype.
- No packet-loss makeup. One of the things that makes Skype sound so good is the software they licensed from GIPS to fill in for missing audio packets. They figured out that if an audio packet gets lost, it's better to give up on it and make up some plausible sound than to have the speaker go silent and then have the sound show up later.
- No acoustic echo cancellation. Windows has some really excellent built-in signal processing on the microphone input that subtracts the signal previously sent to the speaker. Without this, you speak into the microphone, your voice comes out of the speaker at the far end, it goes into the microphone at the far end, and it gets sent back to your ear. Since the delay can be anywhere from 100 msec to several seconds, it can be very distracting. Adobe chose not to take advantage of this feature of Windows, presumably for cross-platform compatibility. Instead, they offer a simpler solution which merely cuts the microphone gain when there is output present at the speaker. The result is that you really need to encourage your users to wear a headset.
- No processing at the server. This approach makes for a very efficient server, but at the expense of sending lots of data to the client. Each audio stream is sent unaltered to all of the subscribers to that stream. What this means is that if you have five people in a meeting, and they all have open microphones, five audio streams go up to the server and twenty come down, four streams to each of the five participants. This is not as bad as it sounds, since for usability reasons you don't really want more than five or so people talking at once, but you need to account for the bandwidth.
- Privacy and Security. The good news is that by being very careful about privacy and security, Adobe has earned a position of trust whereby most users will install the Flash Player and run Flash applications without worry. The bad news is that Flash applications have only limited access to the local file system and constrained access to devices. One particularly annoying feature is the "privacy dialog" which pops up and asks the user if he or she really wants to let your application use the microphone or camera. Not a big deal, but it can be confusing and is a reminder to your users that this is not a native application.
- No access to the stream. Adobe's a/v model is very simple. You publish a stream by sending a camera, microphone or a file to it. You subscribe to a stream by directing it to the screen or a speaker. The Flash Media Interactive Server does provide a method for recording streams, but there is no way to get your hands on the actual data.
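The stream arithmetic in the "No processing at the server" item can be worked through with a small calculation. This is only a sketch: `stream_load` and `StreamLoad` are names made up for the illustration, and the 16 kbps per-stream figure is a placeholder, not a measured bitrate for any Flash codec.

```python
from typing import NamedTuple, Optional


class StreamLoad(NamedTuple):
    upstream: int           # streams published to the server
    downstream: int         # streams the server sends back out
    server_kbps_in: float   # total inbound bandwidth at the server
    server_kbps_out: float  # total outbound bandwidth at the server


def stream_load(
    participants: int,
    speakers: Optional[int] = None,
    kbps_per_stream: float = 16.0,  # placeholder bitrate, an assumption
) -> StreamLoad:
    """Count streams when the server relays each one unaltered.

    Every published stream is forwarded to every participant except
    the one who published it.
    """
    if speakers is None:
        speakers = participants  # everyone has an open microphone
    if participants < 1 or not 0 <= speakers <= participants:
        raise ValueError("need 0 <= speakers <= participants, participants >= 1")
    upstream = speakers
    downstream = speakers * (participants - 1)
    return StreamLoad(
        upstream,
        downstream,
        upstream * kbps_per_stream,
        downstream * kbps_per_stream,
    )


load = stream_load(5)
print(load.upstream, "up,", load.downstream, "down")  # 5 up, 20 down
```

Because the outbound count grows as speakers times (participants - 1), muting idle microphones reduces the load considerably: with the same five participants but only two open microphones, the server sends eight streams instead of twenty.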
In summary, the Flash Player and Flash Media Server provide a way to build the most advanced cross-platform applications that can be delivered without requiring the user to download or install anything. You can build quite robust and scalable applications for playing stored media and broadcasting live streams. Interactive applications that depend on low-latency audio will not perform as well as their native counterparts but can be done if you are attentive to the details and aware of the limitations.