HD Voice Resources
Learn more about HD Voice with these educational resources.
The word codec is a mash-up of coder and decoder (or, if you prefer, compressor and decompressor): a piece of computer code that turns analog sound into compressed digital bits for use by computers and networks, and back again. There are literally hundreds of audio codecs available today, embedded in any device that plays sound, from a simple MP3 player to the hottest smartphones. Some are open source and free while others are proprietary and/or patented, requiring licensing fees.
Why are there so many different codecs? Over the years, people have created and optimized codecs for the specific environments they were going to be used in. The cellular community built codecs that optimized the use of radio frequency (RF) bandwidth, while others wanted adaptable bit-rate codecs suitable for a wired broadband environment that would adjust sound quality depending on how much bandwidth was available -- compress a little if there's a lot of bandwidth, crunch harder if there's less.
More recently, developers have been leveraging more efficient computer processors to develop better codecs. The tradeoff for using more CPU cycles is, of course, more power required to run them -- not an issue at a desktop, but definitely a concern for mobile devices.
A number of codecs are ITU (International Telecommunication Union) standards, formalized for international use and incorporation into devices. If a codec name starts with a G and a period, such as G.711 or G.722, it's an ITU standard.
You can't talk about HD voice codecs without first talking about baseline analog and digital voice quality. Established way back in 1972, G.711 is the standard for stock VoIP voice quality and equal to what you get out of a POTS analog phone call. It captures speech in a range of 3.4 kHz, has a sampling rate of 8 kHz, and needs 64 kbit/s of bandwidth to deliver a call.
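The 64 kbit/s figure falls out of simple arithmetic: samples per second multiplied by bits per sample. Here is a minimal Python sketch of that calculation. It assumes G.711 stores each sample in 8 bits (a detail not stated above), and the helper name is made up for illustration.

```python
def raw_bitrate_kbps(sample_rate_hz: int, bits_per_sample: int) -> float:
    """Bit rate of a single-channel stream of fixed-size samples, in kbit/s."""
    return sample_rate_hz * bits_per_sample / 1000


# G.711: 8 kHz sampling rate (from the text above).
# 8 bits per sample is an assumption added for this example.
g711_kbps = raw_bitrate_kbps(8_000, 8)
print(g711_kbps)  # 64.0
```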
G.722 is old school when it comes to HD voice, formalized back in 1988. It captures sound in a range of 7 kHz and samples audio at a rate of 16 kHz -- double that of G.711. The result is quality and clarity far above a POTS analog phone call. By spending more processing effort on compression, G.722 delivers roughly double the audio range of a G.711 phone session in the same amount of bandwidth -- 64 kbit/s.
You'll find G.722 built into pretty much every desktop VoIP handset built today (2010), regardless of manufacturer or model of phone -- yes, even the modest-looking $129 list price entry models support G.722. Patents on G.722 have expired, so there are no licensing fees, and the processing requirements are minimal on today's chips. At least one software shop (D2 Technologies) has implemented G.722 for the Android mobile operating system. Handset manufacturers who support G.722 include Aastra, ADTRAN, Allworx, AudioCodes, Avaya, Cisco, Panasonic, Polycom, Siemens and Snom.
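Two quick calculations help show why doubling the sampling rate matters. A sampled signal can only represent frequencies up to half its sampling rate (the Nyquist limit), and dividing the bit rate by the sampling rate gives the average number of bits spent on each sample. The Python sketch below uses only the sampling rates and bit rates quoted above; the function names are hypothetical.

```python
def nyquist_limit_hz(sample_rate_hz: int) -> float:
    """Highest frequency a given sampling rate can represent."""
    return sample_rate_hz / 2


def avg_bits_per_sample(bitrate_bps: int, sample_rate_hz: int) -> float:
    """Average bits the codec spends on each sample."""
    return bitrate_bps / sample_rate_hz


# (sampling rate in Hz, bit rate in bit/s), both taken from the text
codecs = {
    "G.711": (8_000, 64_000),
    "G.722": (16_000, 64_000),
}

for name, (rate_hz, bitrate_bps) in codecs.items():
    print(name, nyquist_limit_hz(rate_hz), avg_bits_per_sample(bitrate_bps, rate_hz))
```

The 16 kHz rate leaves room for G.722's 7 kHz audio range, and fitting twice as many samples into the same 64 kbit/s means each sample gets, on average, half as many bits. That is where the extra compression work goes.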
Coming strong out of Europe and the mobile community is AMR-WB, also known as G.722.2. Mobile operators wanted better sound quality delivered in less bandwidth, so AMR-WB should deliver G.722-level quality at around 24 kbit/s. France Telecom and Ericsson have been leaders in promoting AMR-WB for mobile HD voice -- in part, because they hold some of the patents in the standard -- and they would like to see AMR-WB appear in desktop phones and software clients so users can make end-to-end calls in AMR-WB, rather than having to translate (transcode) between G.722 and AMR-WB. You'll see more AMR-WB buzz for desktop handsets later in 2010 and into 2011.
SILK is Skype's "super wideband" voice codec. Optimized for real-time communications on the Internet, SILK is an adaptive bit-rate codec that supports multiple sampling rates ranging from 8 kHz narrowband to 24 kHz or more. If you have the CPU cycles and 40 kbit/s of bandwidth, SILK gives you the best performance possible. On a lower-powered machine and/or with less available bandwidth, SILK drops down and adjusts to the conditions involved. Unlike AMR-WB, SILK is available royalty-free. A few manufacturers, including AudioCodes, have discussed incorporating SILK into their products.
Finally, Global IP Solutions (GIPS) offers a proprietary wideband speech codec that has been incorporated into a large number of soft clients and applications, including AIM, Citrix Online, CommuniGate, Gizmo5, Google Talk, IBM Lotus, NimBuzz, QQ, WebEx, and Yahoo!
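To make the "drops down and adjusts" idea concrete, here is a toy Python sketch of how an adaptive codec might pick a sampling rate from the bandwidth it sees. It is not SILK's actual algorithm. Only the 40 kbit/s top tier and the 8 to 24 kHz range come from the description above; the intermediate tiers and thresholds are invented for illustration.

```python
# Hypothetical ladder of (minimum available kbit/s, sampling rate in kHz),
# ordered from best to worst. Only the 40 kbit/s / 24 kHz tier and the
# 8 kHz floor come from the text; the middle tiers are made up.
LADDER = [
    (40, 24),
    (25, 16),
    (15, 12),
    (0, 8),
]


def pick_sample_rate_khz(available_kbps: float) -> int:
    """Return the best sampling rate the available bandwidth can sustain."""
    for min_kbps, rate_khz in LADDER:
        if available_kbps >= min_kbps:
            return rate_khz
    return LADDER[-1][1]  # nothing matched: stay at the narrowband floor


print(pick_sample_rate_khz(64))  # plenty of bandwidth
print(pick_sample_rate_khz(10))  # constrained link
```

A real adaptive codec would also weigh CPU load and packet loss, and would re-evaluate continuously during the call rather than once.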
In order to have a successful HD voice call, both parties (or nearly all of them in a conference) need to use the same codec. If the two sides are using different HD codecs, either one side has to be transcoded -- translated -- into the other's codec or both sides have to shift to a mutually agreeable codec.
Transcoding already takes place in the VoIP world on a daily basis, with calls being compressed before being sent out long distance and translations taking place between the POTS network and VoIP transport. The issues with transcoding between HD codecs are that it takes more horsepower (processing cycles) than with vanilla VoIP/POTS networks and nobody is willing to say the end translation product is as good as a "pure" end-to-end HD voice call using a single codec.
If both sides can't find a mutually agreeable HD voice codec, they end up dropping down to the lowest common denominator -- G.711 -- which kills the primary point of using HD in the first place.
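The selection logic described here can be sketched in a few lines of Python. This is a simplified illustration rather than how any particular phone or signaling protocol does it: the function name and the example codec lists are hypothetical, and it assumes every endpoint can fall back to G.711.

```python
from typing import Iterable

FALLBACK = "G.711"  # assumption: every endpoint supports the baseline codec


def negotiate(caller_prefs: Iterable[str], callee_codecs: Iterable[str]) -> str:
    """Return the caller's most-preferred codec that the callee also supports.

    If the two sides share no codec, drop to the G.711 baseline.
    """
    supported = set(callee_codecs)
    for codec in caller_prefs:  # caller_prefs is ordered best-first
        if codec in supported:
            return codec
    return FALLBACK


# Hypothetical endpoints for illustration
soft_client = ["SILK", "G.722", "G.711"]
desk_phone = ["G.722", "G.711"]
mobile = ["AMR-WB", "G.711"]

print(negotiate(soft_client, desk_phone))  # shared HD codec, call stays HD
print(negotiate(desk_phone, mobile))       # no shared HD codec, baseline only
```

In the second case the only way to keep HD quality would be to place a transcoder between the two endpoints, with the processing cost and quality caveats noted above.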