
Digital Video

Introduction to Digital Video


Video is the electronic signal or data that, when rendered through a display, appears as moving
images. Video is captured by a video camera whose sophistication ranges from those used in a
studio to cameras in a mobile phone. The principle of operation is the same; an image is captured
as light from the scene passes through a lens to be brought in focus on the imaging device.
Typically the imaging device is made up of an array of hundreds of thousands of individual sensors, each capturing one pixel of the image. In order to capture colour images the light hitting the sensor is split into Red, Green and Blue (RGB) using suitable filters, with each component having its own sensor, i.e. each pixel effectively has three sensors. Any colour can be represented as a combination of red, green
pixel effectively has three sensors. Any colour can be represented as a combination of red, green
and blue. When the sensors are exposed to light from the scene, by opening a shutter, the
intensity is recorded as an analogue electronic voltage. The amount of light can be adjusted by
changing the aperture size (how big the hole is that lets light in from the outside world) and/or the
shutter speed (how long the hole is open for). In low light conditions a large aperture and/or slow
shutter speed will be necessary. The image, captured as analogue RGB voltages for each pixel, is converted to a digital one by digitizing those voltages with an analogue-to-digital converter, typically using eight bits for each value.
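As a rough illustration of that last step, here is a minimal sketch (in Python, not from the notes) of how an analogue voltage might be quantized to an eight-bit value; the 0-to-1 voltage range is an assumed normalization:

def quantize_8bit(voltage, v_max=1.0):
    # Map an analogue voltage in [0, v_max] onto one of 2^8 = 256 integer levels.
    voltage = max(0.0, min(voltage, v_max))        # clip to the valid range
    return round(voltage / v_max * 255)

# One pixel captured as three analogue RGB voltages becomes 3 x 8 = 24 bits.
pixel_rgb_voltages = (0.80, 0.42, 0.13)            # hypothetical sensor readings
print(tuple(quantize_8bit(v) for v in pixel_rgb_voltages))   # (204, 107, 33)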
Up to this point the basic process is the same for a (digital) still camera and a video camera. The big
difference comes because the video camera has to take tens of pictures per second to capture
anything moving in the scene so that it will look like natural movement when played back on the
display device. For television, 25 pictures per second (30 in the US system) has traditionally been used, which is similar to the 24 used in cinema. For some low-end devices 12 pictures per second
is used. When filming fast moving scenes, such as sports, rates up to 60 pictures-per-second may
be used.
The video in standard definition television using the PAL system adopted in the UK and much of
Europe has a resolution of 720 x 576 pixels matching the resolution of the original analogue
system. Given this, the video bit-rate being generated by the camera is:
720 x 576 pixels x 3 colours x 8 bits x 25 pictures per second ≈ 248.8 Mbit/s. That's without any audio or the extra bits we'll need to indicate the beginning and end of a picture.
On a camera we need to store the video using whatever media is available: tape, DVD, memory card or hard drive. As an example, at the bit-rate above a 16 GByte memory card would be full in about eight and a half minutes; no feature-length movies here.
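To make the numbers concrete, a short Python check of the arithmetic above (it assumes 16 GByte means 16 x 10^9 bytes):

pixels = 720 * 576                 # standard-definition PAL resolution
bit_rate = pixels * 3 * 8 * 25     # 3 colours, 8 bits each, 25 pictures per second
print(bit_rate / 1e6)              # 248.832 Mbit/s

card_bits = 16e9 * 8               # a 16 GByte memory card, expressed in bits
print(card_bits / bit_rate / 60)   # about 8.6 minutes before the card is full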
What if we want to communicate the video, as television for example? Here we only need to know that our TV transmission systems, such as terrestrial, satellite and cable, are limited to 50 Mbit/s at most (assuming we only need to carry one program) to see that we have a problem. What about transmitting it using broadband? Even with an exceptionally fast connection we still won't have enough bits-per-second to carry our video.
Of course we know that video is transmitted via satellite, terrestrial and broadband very
successfully, so what happens to enable this? At first glance we can see that reducing the
resolution (that is, reducing the number of pixels) will lower the bit-rate (half the resolution => half the bit-rate). We could also reduce the picture rate (e.g. use 10 frames per second instead of 25) or we could reduce the number of bits used in our analogue-to-digital converter. However, all of
these would affect what we would see on the screen and how we perceive the video quality. We
need to reduce the bit-rate somehow without us noticing when we come to display the video.
Fortunately there are techniques for reducing the bit-rate whose effects we notice less in the viewed video. They rely on how we perceive the video image.
The first trick is to notice that we are less able to see detail in colour than we are in brightness. If we
have a series of lines where just the colour changes between them (no brightness change) the
lines need to be further apart than if they are black and white (just brightness change) in order to
see distinct lines rather than a continuous colour (or grey). In more technical terms, we can see greater resolution in luminance (brightness) than we can in colour. To make use of this in our
treatment of video, the RGB values can be transformed into YCrCb, or Luma, Chroma Red and Chroma Blue*. The Luma value carries the black and whiteness or brightness (luminance) and the
Chroma values carry the colour information (this is just a transformation of the values; we can always transform them back, and we do when it comes to display the video). The advantage of this
transformation is that we can now treat the Luma and Chroma values separately. We find that
provided we keep the same number of pixels showing black and whiteness (the Luma values) we
can have fewer pixels showing the coloured (Chroma) part. Typically we can get away with half as
many chroma pixels as luma both horizontally and vertically and not notice the difference when we
convert back to a full set of RGB values that a display will require. This way we can halve the number of bits representing each picture, and therefore the bit-rate, without really noticing.
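A minimal sketch of this subsampling, assuming the common BT.601 conversion weights and a random image purely so the samples can be counted:

import numpy as np

h, w = 576, 720
rgb = np.random.randint(0, 256, (h, w, 3)).astype(float)
r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]

# RGB -> luma (Y) plus two chroma components (Cb, Cr)
y  = 0.299 * r + 0.587 * g + 0.114 * b
cb = 0.564 * (b - y)
cr = 0.713 * (r - y)

# Keep every luma pixel, but only one Cb and one Cr value per 2x2 block of pixels.
cb_sub, cr_sub = cb[::2, ::2], cr[::2, ::2]

full = 3 * h * w                                # samples with no subsampling
sub  = y.size + cb_sub.size + cr_sub.size       # Y plus subsampled Cb and Cr
print(sub / full)                               # 0.5 - half the bits per picture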
Another step we can take is to lose spatial detail in each picture that we don't really notice. It
turns out that to resolve fine detail we need high contrast; so fine detail that doesn't have large
luma or chroma changes can be lost without us noticing. This is the principle of JPEG still image
compression that is also used in video. Teasing out the fine detail involves transforming our pixel
values into a form that identifies the fine detail. Fortunately there is a mathematical tool that does
just this: the Discrete Cosine Transform. Once transformed, approximations are made to the values representing fine detail; these often turn out to be zero, which can then be coded very efficiently. How close the approximations are is determined when the degree of compression or video quality is set. We can typically halve the number of bits per picture, and hence the bit-rate, using these techniques without noticeable degradation in picture quality.
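A sketch of that idea on a single 8x8 block, using SciPy's DCT (the smooth gradient stands in for typical low-detail image content, and the flat quantization step of 16 is an arbitrary illustrative choice):

import numpy as np
from scipy.fft import dctn, idctn

x = np.arange(8, dtype=float)
block = 100.0 + 10.0 * x[None, :] + 5.0 * x[:, None]   # a gentle brightness gradient

coeffs = dctn(block, norm='ortho')          # transform: fine detail -> high-frequency coefficients
step = 16.0
quantized = np.round(coeffs / step)         # coarse approximation of each coefficient
print(np.count_nonzero(quantized == 0))     # most of the 64 coefficients are now zero

# Decoder side: rescale and inverse-transform to recover an approximate block.
approx = idctn(quantized * step, norm='ortho')
print(np.abs(approx - block).max())         # error of only a few grey levels out of 255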
Video picture sequences are usually made up of consecutive pictures that have a lot of similarity.
Only a small amount of the image changes from one picture to the next. Parts of the image that
have moved will typically have the same pixel values as the corresponding pixels in the previous
and following pictures; just in a different position in the image. Where blocks of pixels are the same
it is not necessary to store or transmit the values again; instead, a position vector pointing to the
reference pixels can be coded. Typically blocks of 16x16 pixels are treated in this way giving the
potential for enormous bit savings. This yields three types of pictures in the compressed video:
Intra pictures (keyframes in editing), which have no reference to adjacent pictures and use just the still-image techniques described above; Predicted pictures, which use past pictures as reference; and Bidirectionally predicted pictures, which use both past and future pictures as reference.
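A minimal sketch of the block-matching search that produces such a vector (an exhaustive search over a small window; real encoders use much faster strategies, and the shifted random picture here just stands in for a moving scene):

import numpy as np

def best_motion_vector(prev, curr, top, left, size=16, search=8):
    # Find the offset (dy, dx) in the previous picture that best matches
    # the size x size block of the current picture at (top, left).
    block = curr[top:top + size, left:left + size]
    best_cost, best_vec = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > prev.shape[0] or x + size > prev.shape[1]:
                continue
            cost = np.abs(prev[y:y + size, x:x + size] - block).sum()
            if best_cost is None or cost < best_cost:
                best_cost, best_vec = cost, (dy, dx)
    return best_vec

prev = np.random.randint(0, 256, (64, 64)).astype(float)
curr = np.roll(prev, shift=(2, 3), axis=(0, 1))          # the whole picture moves down 2, right 3
print(best_motion_vector(prev, curr, top=24, left=24))   # (-2, -3): the block came from up 2, left 3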
Once all this has been done there is still scope for further reduction of bit-rate by noting that some
bit sequences are more likely than others and coding accordingly.
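A toy example of this last step, assuming a simple Huffman code built over symbol frequencies (here the symbols are characters; in a real coder they would be quantized coefficients, run lengths and motion vectors):

import heapq
from collections import Counter

def huffman_code(freqs):
    # Repeatedly merge the two least likely entries; frequent symbols end up
    # near the top of the tree and therefore get the shortest codes.
    heap = [[weight, [sym, ""]] for sym, weight in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        lo, hi = heapq.heappop(heap), heapq.heappop(heap)
        for pair in lo[1:]:
            pair[1] = "0" + pair[1]
        for pair in hi[1:]:
            pair[1] = "1" + pair[1]
        heapq.heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
    return dict(heap[0][1:])

print(huffman_code(Counter("aaaaaaabbbccd")))
# 'a' (most likely) gets a 1-bit code; 'c' and 'd' (least likely) get 3-bit codes.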
These techniques combined enable up to a 100-fold reduction in bit-rate whilst maintaining a
broadcast quality image.
This discussion started with a standard definition picture. The trend to have bigger TVs and watch
video up close (e.g. on a monitor) has meant there is a drive towards higher resolution images
(commonly known as High Definition) and even consumer camcorders and phones now have HD
recording capability. So-called full-HD in widescreen format has 1920x1080 pixels, five times the pixel count of standard definition. This drives the need for video compression even harder. Actual implementations typically employ refinements of the principles discussed above, for example in MPEG-4. Until recently, mobile phone screen resolution was at most about 480x320 pixels (e.g. the iPhone 3G and 3GS), with most screens somewhat below this; that gives a pixel count nearly three times lower than standard definition, which works in our favour for video delivery.

* This trick was actually used with analogue TV transmissions to allow colour transmissions simultaneously with the original monochrome. The Y component was used in monochrome receivers. Colour receivers could also decode the low-bandwidth chroma signals.
So through compression we can reduce the bit-rate required to store or transmit video to a level
that is commensurate with that available from existing communication channels and the capacity of storage media. On the other hand, it does make things substantially more complicated. Another
consequence is that we now require a very reliable (virtually error free) transmission or storage
channel to prevent the bare-bones video data from being corrupted and hence the displayed image
distorted.
Whilst the compression techniques used for broadcast TV are standardized (so that all TV and set-top-box manufacturers can build a working decoder; broadcast TV actually uses MPEG-2), that is not the case when it comes to computer-based video and IP-based distribution.
In the world of the PC there are a number of different compression techniques (codecs) and an
even greater number of ways of wrapping up the compressed data and mixing it (multiplexing) with
other streams such as audio and textual information; the so-called container format. Fortunately
most media players are able to handle the most common of these.
The Internet, and IP networks generally, are a relatively new medium for distributing video and are
significantly different to the broadcast transmission systems traditionally used for TV. In the latter,
dedicated, fixed bandwidth is available for a particular channel and the image quality is usually
very high. The programs themselves are usually of a high production quality and censorship of the content is implicit. Internet video can have greatly varying quality in terms of both image and
production.
Video can be delivered over IP networks in three main ways:
Download and Play: Here a video file must be downloaded in its entirety before it can be played. A
copy of the video file is stored locally to the media player. The video quality available is dependent
only on how the video was captured and encoded to create the file.
Progressive Download: Here the video file is opened by the media player whilst it is still being downloaded. However, the play-rate (fixed by the compressed video bit-rate) and the download rate (what is available from the communication path) are independent, as the sketch after this list illustrates. The file on the server has to exist before this can happen, so this approach is not suitable for streaming a live source.
Live Streaming: Here the video is transmitted at the same rate as it is rendered; there is no local storage (apart from a small amount of buffering). The communication bandwidth must be at least as big as the video bit-rate at all times. This technique usually requires special protocols and is necessary if the video stream is from a live feed*.
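The sketch promised above: with constant rates, how long must progressive download buffer before playback starts so that the download never falls behind? (All the numbers are made-up illustrations; live streaming has no such escape, since the channel must sustain the full video bit-rate throughout.)

video_bit_rate = 4e6        # the compressed video plays out at 4 Mbit/s
download_rate  = 3e6        # but the connection only delivers 3 Mbit/s
duration_s     = 600        # a ten-minute clip

total_bits = video_bit_rate * duration_s
startup_delay = total_bits / download_rate - duration_s
print(startup_delay)        # 200 seconds of buffering needed before pressing play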
Whereas in broadcast video all receivers receive the same signal (i.e. a single video stream is
transmitted to all receivers), an IP network is usually used in a unicast fashion: one stream per receiver, even if the stream is the same video. The network bandwidth required therefore increases at the same rate as the number of receivers. One way round this is to use multicasting, where a packet is only duplicated when necessary within the network (the routers have to support multicasting). Alternatively, a content delivery network may be used, where the video content is duplicated at the edges of the network, close to the end-user.

* Apple has recently introduced an alternative protocol called HTTP Live Streaming that is required to stream video to iPhones. In this, a file or stream (in MPEG-TS format) is broken up into small files by a segmenter. An index file keeps track of which files have been received and rendered so that the next file can be requested. Streams of different quality may be available, which can be chosen depending on the available bandwidth.
Apple reference:
http://developer.apple.com/library/ios/#documentation/NetworkingInternet/Conceptual/StreamingMediaGuide/Introduction/Introduction.html#//apple_ref/doc/uid/TP40008332-CH1-DontLinkElementID_29

Communication Technology Notes, R Germon, 2010
