How Does Video Conferencing Work?

You’ll need to know about video conferencing system components, data compression, data transfer, video conferencing standards and types of video conferencing to fully appreciate how video conferencing actually works.

What are the System Components?

The core of a video conferencing system consists of elements that enable the capture and transfer of video images and audio sounds. These elements are:

  • Video input – one or more video cameras or webcams, and possibly a digital whiteboard or other content source.
  • Audio input – microphones either centrally located or on individuals.
  • Video output – monitor, computer screen, television and/or projector.
  • Audio output – professional speakers, headphones or laptop computer speakers.
  • Codec – hardware- or software-based coder-decoder technology that digitizes and compresses the captured video and audio into packets for sending, and decompresses the data on the receiving end.
  • Echo cancellation software – removes the echo created when a microphone picks up the far end’s audio from the local speakers, so that conversation can flow in real time.
  • Network for data transfer – today most video conferencing is carried over a high-speed broadband Internet connection, using technology similar to VoIP (Voice over Internet Protocol), although LAN and occasionally ISDN connections are also used.

How Does Data Compression Work?

The camera and microphone capture analog video and audio signals from a video conference. These signals are continuous waves of varying amplitude and frequency that represent sounds, color shades, depth and brightness.

Transmitting this data without compression would require enormous bandwidth, so codecs (hardware or software) digitize and compress the data into packets before sending, and decompress it on arrival.
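To get a feel for why compression is needed, the arithmetic can be sketched in Python. The resolution, frame rate, color depth and target bitrate below are illustrative assumptions, not figures for any particular system.

```python
def raw_video_bitrate_bps(width, height, fps, bits_per_pixel=24):
    """Bits per second needed to send every pixel of every frame uncompressed."""
    return width * height * bits_per_pixel * fps


# Assumed example: 1280x720 picture, 30 frames per second, 24-bit color.
raw_bps = raw_video_bitrate_bps(1280, 720, 30)
print(f"Uncompressed: {raw_bps / 1_000_000:.1f} Mbit/s")  # 663.6 Mbit/s

# Assumed compressed target of 2 Mbit/s, chosen only for illustration.
target_bps = 2_000_000
print(f"Compression ratio needed: about {raw_bps / target_bps:.0f}:1")  # about 332:1
```

Under these assumptions a single uncompressed stream would need several hundred megabits per second, which is why the codec has to shrink it by a factor of hundreds before it goes onto the network.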

How Does the Data Transfer Work?

Once digitally compressed, the video and audio data can be transmitted over a digital network.

In most cases, a broadband Internet connection is the preferred network.

Data is sent to the other participant’s video conferencing system, where it is decompressed and converted back into video images and sound.
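The idea of sending compressed data as packets can be sketched as follows. This is a toy illustration only: the 4-byte header layout and the function names are hypothetical, and real conferencing systems typically carry media in RTP packets, which this sketch does not implement.

```python
import struct

# Hypothetical header: sequence number and total packet count, 2 bytes each.
# (A 2-byte count limits this toy format to 65,535 packets per frame.)
HEADER = struct.Struct("!HH")


def packetize(frame, payload_size=1200):
    """Split one compressed frame (bytes) into small numbered packets."""
    chunks = [frame[i:i + payload_size]
              for i in range(0, len(frame), payload_size)] or [b""]
    return [HEADER.pack(seq, len(chunks)) + chunk
            for seq, chunk in enumerate(chunks)]


def reassemble(packets):
    """Rebuild the frame on the receiving end, even if packets arrive out of order."""
    parts = {}
    total = 0
    for packet in packets:
        seq, total = HEADER.unpack(packet[:HEADER.size])
        parts[seq] = packet[HEADER.size:]
    if len(parts) != total:
        raise ValueError("some packets are missing")
    return b"".join(parts[seq] for seq in range(total))


# Stand-in for a compressed frame; a real one would come from the codec.
frame = bytes(range(256)) * 20
packets = packetize(frame)
received = list(reversed(packets))  # simulate out-of-order arrival
assert reassemble(received) == frame
```

The sequence numbers are what let the receiver put the data back in order before handing it to the decoder.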

What About Getting Through Firewalls?

Firewalls, designed to protect business networks from unauthorized access and malicious traffic, can block the transmission of video conferencing data. To support video conferencing, the firewall setup needs to:

  • recognize video conferencing signals
  • let that traffic pass through the firewall (or router) without disabling firewall protection for other traffic
  • handle substantial traffic to ensure high-quality video conferencing

Session Border Controllers (SBCs), generally a combination of hardware and software, are the standard equipment for getting video conference calls through a firewall.
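The first two requirements above, recognizing conferencing traffic and letting only that traffic through, can be sketched as a simple lookup. This is a minimal illustration, not how any particular firewall or SBC is implemented. The signaling ports are the well-known defaults for SIP and H.323; the media port range is a hypothetical value, since real products configure their own.

```python
# Well-known default signaling ports; deployments may use different ones.
SIGNALING_PORTS = {
    ("udp", 5060): "SIP signaling",
    ("tcp", 5060): "SIP signaling",
    ("tcp", 5061): "SIP signaling over TLS",
    ("tcp", 1720): "H.323 call signaling",
}

# Hypothetical media port range, for illustration only.
MEDIA_PORTS = range(50000, 50100)


def classify(protocol, dest_port):
    """Label a packet as conferencing traffic or as 'other'."""
    key = (protocol.lower(), dest_port)
    if key in SIGNALING_PORTS:
        return SIGNALING_PORTS[key]
    if key[0] == "udp" and dest_port in MEDIA_PORTS:
        return "media"
    return "other"  # stays subject to the firewall's normal rules


for packet in [("udp", 5060), ("tcp", 1720), ("udp", 50010), ("tcp", 23)]:
    print(packet, "->", classify(*packet))
```

Only traffic that matches a conferencing rule is treated specially; everything labeled "other" still goes through the usual firewall checks.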

What Role Do Standards Play?

Media Standards

Of course, video conferencing is only possible when the audio and video information is translated and transmitted using the same technology language, or standards, at every location. For video, the codec (coder-decoder technology to compress and decompress data) at each conferencing location uses the H.264 standard.

H.264 is a widely used video compression standard, found in applications and devices such as video conferencing, Blu-ray Disc players, iPods, and YouTube.


Over a decade ago, the International Telecommunication Union (ITU) developed the H.323 video conferencing standards and protocols to ensure interoperability and to facilitate support across networks.

As of 2009, the majority of the installed base of video conferencing equipment uses H.323, but Session Initiation Protocol (SIP) is rapidly being adopted as the standard for video because it works across many different forms of communication, such as voice, data, instant messaging, and Web 2.0-based applications.

How Do Different Types of Video Conferencing Work?

Video conferencing is either point-to-point for participants in 2 different locations or multi-point for 3 or more locations.

Point-to-Point Video Conferencing

Point-to-point video conferencing connects two locations anywhere in the world, for example an office in San Francisco and a conference room in Singapore.

Multi-Point Video Conferencing

Video conferences among three or more locations can be either centralized or decentralized.

Centralized Multi-Point Video Conferencing

In some multi-point conferences, a software or hardware bridge interconnects the three or more remote locations, much as in an audio conference call. This bridge, known as a multi-point bridge, multi-point control unit or multipoint conferencing unit (MCU), runs either on a remote server or embedded in the video conferencing system, and ties the locations together. Here’s how it works:

  • All audio and video data flow through the MCU’s “central processing center”.
  • The MCU then sends the information out to each location.
  • Audio is sent to and received from all locations simultaneously in full-duplex mode (everyone can talk and hear at the same time, as in a live, in-person conversation).
  • Video is broadcast differently, depending upon the software and system complexity.
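The audio side of the steps above can be sketched as follows. This is a minimal illustration that assumes the bridge mixes audio so that each location hears everyone except itself (often called "mix-minus"); the source does not say how a given MCU mixes, and the function names and sample values are made up for the example.

```python
def clamp16(value):
    """Keep a mixed sample inside the 16-bit signed audio range."""
    return max(-32768, min(32767, value))


def mix_minus(frames):
    """frames maps each location to a list of audio samples of equal length.

    Returns, for each location, the sum of all OTHER locations' samples.
    """
    if not frames:
        return {}
    length = len(next(iter(frames.values())))
    # Sum everything once, then subtract each location's own contribution.
    totals = [sum(samples[i] for samples in frames.values())
              for i in range(length)]
    return {
        name: [clamp16(totals[i] - samples[i]) for i in range(length)]
        for name, samples in frames.items()
    }


# Made-up sample values for three locations.
incoming = {"A": [100, 200], "B": [10, 20], "C": [1, 2]}
outgoing = mix_minus(incoming)
print(outgoing)  # {'A': [11, 22], 'B': [101, 202], 'C': [110, 220]}
```

Because all streams pass through one place, the bridge can build a separate mix for every location in a single pass.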

Decentralized Multi-Point Video Conferencing

Some video conferencing systems are capable of multi-point conferencing without any MCU.

Decentralized multi-point video conferencing, based on the H.323 standard, lets each location exchange video and audio directly with other locations.

This approach can afford higher-quality video and audio, because the streams are not relayed through a central point, as well as greater convenience (participants can make ad-hoc multi-point calls regardless of MCU availability). On the other hand, it requires increased network bandwidth, since every station transmits to every other station directly.
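The bandwidth trade-off can be made concrete by counting streams. The sketch below assumes a simplified model in which, with an MCU, each location sends one stream to the bridge and receives one back, while in the decentralized case each location sends a separate stream to every other location. Real systems may differ.

```python
def decentralized_streams(locations):
    """Every location sends directly to every other location."""
    return locations * (locations - 1)


def centralized_streams(locations):
    """Assumed model: one stream up to the MCU and one back down per location."""
    return 2 * locations


def uploads_per_location(locations, decentralized):
    """How many outgoing streams a single location must sustain."""
    return locations - 1 if decentralized else 1


for n in range(3, 7):
    print(f"{n} locations: "
          f"decentralized={decentralized_streams(n)} streams, "
          f"centralized={centralized_streams(n)} streams, "
          f"uploads per location={uploads_per_location(n, True)} vs "
          f"{uploads_per_location(n, False)}")
```

With 4 locations, for example, this model gives 12 streams without an MCU versus 8 with one, and the gap widens as more locations join.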