
Laxmi Devi Institute of Engineering & Technology, Chikani, Alwar

IMPORTANT QUESTIONS    BRANCH: IT    Year: Final (4th)    SUBJECT: Multimedia System    SEM: VII

Q.1 Answer the following multiple choice questions (0.5*10 = 5)

(i) Which of the following is NOT a video file extension?
a. MP4  b. AVI  c. QT  d. JPG  e. MOV
(ii) What are the hardware devices in a multimedia system?
a. Connecting devices  b. I/O devices  c. Storage devices  d. Modem and ISDN
(iii) What are the authoring tools supported by a multimedia system?
a. Card or page based tools  b. Icon based and time based tools  c. VB and Authorware professional tools  d. All of the above
(iv) Which data format unique number is used for the violin and the flute?
a. 40 and 12  b. 40 and 73  c. 0 and 12  d. 0 and 73
(v) The product is revised based on the continuous feedback received from the client. This is considered in which phase?
a. Planning and costing  b. Designing and producing  c. Testing  d. Delivering  e. All of the above
(vi) Data compression is divided into how many parts?
a. 4  b. 2  c. 3  d. 5
(vii) Which is not a lossy method?
a. Entropy encoding  b. JPEG  c. H.261  d. MPEG  e. MP3
(viii) Image format is/are?
a. Captured image  b. Stored  c. Both a and b  d. Graphics
(ix) How many steps are there in image recognition?
a. 4  b. 5  c. 7  d. 6
(x) Multimedia building blocks are?
a. Text  b. Audio  c. Sound  d. Animation  e. All of the above
(xi) With reference to multimedia elements, pick the odd one out:
a. Graphics  b. Animation  c. Audio  d. Video  e. Voice script
(xii) Which is NOT a criterion for the classification of medium?
a. Presentation and Representation  b. Perception  c. Encoding  d. Storage  e. Information Exchange
(xiii) Which does not belong to the data stream characteristics for continuous media?
a. Strongly periodic  b. Weakly regular  c. Weakly periodic  d. Aperiodic
(xiv) Which reception mode is used when a device monitors all the MIDI channels, responds to all channel messages and plays several notes at a time?
a. Omni on / mono  b. Omni on / poly  c. Omni off / mono  d. Omni off / poly
(xv) A plan that outlines the required multimedia expertise is prepared. This is considered in which phase?
a. Planning and costing  b. Designing and producing  c. Testing  d. Delivering
Q.2 (i) What do you mean by multimedia? Give the advantages and disadvantages of multimedia.
Sol. What is Multimedia?
Multimedia = Multi + media. Multi = many; media = the medium or means by which information is stored, transmitted, presented or perceived.
By a simple definition, multimedia can be any combination of text, graphics, sound, animation and video used to effectively communicate ideas to users. Multimedia is any combination of text, graphic art, sound, animation and video delivered to you by computer or other electronic means. In other words, multimedia is the presentation of a (usually interactive) computer application, incorporating media elements such as text, graphics, video, animation and sound, on a computer.
Advantages and disadvantages of Multimedia
Advantages of multimedia:
Increases learning effectiveness.
Is more appealing than traditional, lecture-based learning methods.
Offers significant potential in improving personal communications, education and training efforts.
Reduces training costs.
Is easy to use.
Tailors information to the individual.
Provides high-quality video images and audio.
Offers system portability.
Frees the teacher from routine tasks.
Gathers information about the study results of the student.
Disadvantages of multimedia:
Expensive.
Not always easy to configure.
Requires special hardware.
Not always compatible.

(ii) Give some applications of multimedia in different areas with examples.
Sol. Multimedia Applications
Multimedia plays a major role in the following areas:
Instruction
Business: advertisements, training materials, presentations, customer support services
Entertainment: interactive games
Enabling technology: accessibility to web-based materials, teaching and learning for disabled children and adults
Fine arts and humanities: museum tours, art exhibitions, presentations of literature
Medicine: training, public awareness campaigns

Q.3 (i) Give a proper definition of Medium. Explain the criteria for the classification of media with examples.
Sol. A medium is a means for the distribution and presentation of information. A classification based on perception (text, audio, video) is the most appropriate for defining multimedia.

Criteria for the classification of media:
Perception
Presentation
Representation
Storage
Transmission
Information Exchange
(ii) Explain in brief the different categories of multimedia hardware.

Multimedia Hardware
The hardware required for multimedia can be classified into five categories:
1. Connecting devices: SCSI, USB, MCI, IDE.
2. Input devices and 3. Output devices: An input device is a hardware mechanism that transforms information in the external world for consumption by a computer. An output device is hardware used to communicate the results of data processing carried out by the user or the CPU.
4. Storage devices: Random Access Memory (RAM), Read-Only Memory (ROM), floppy and hard disks, Zip, Jaz, SyQuest and optical storage devices, Digital Versatile Disc (DVD), CD recorders.
5. Communicating devices: modems, ISDN, cable modems.

(iii) Explain in brief the different categories of multimedia software.
Sol. Multimedia Software - Familiar Tools
Word processors: Microsoft Word, WordPerfect
Spreadsheets: Excel
Databases: Q+E Database/VB
Presentation tools: PowerPoint


Q.4 (i) What do you mean by multimedia system? Explain main properties of a Multimedia system with examples. Sol. Multimedia System

Following the dictionary definitions, a multimedia system is any system that supports more than a single kind of media. This implies that any system processing text and images would be a multimedia system! Note that this definition is quantitative; a qualitative definition would be more appropriate: the kind of media supported should be considered, rather than the number of media.
A multimedia system is characterized by computer-controlled, integrated production, manipulation, storage and communication of independent information, which is encoded at least through a continuous (time-dependent) and a discrete (time-independent) medium.
Main properties of a multimedia system:
1. Combination of media
2. Media independence
3. Computer-supported integration
4. Communication systems

Multimedia Authoring Tools
A multimedia authoring tool is a program that helps you write multimedia applications. A multimedia authoring tool enables you to create a final application merely by linking together objects, such as a paragraph of text, an illustration, or a song. They are used exclusively for applications that present a mixture of textual, graphical, and audio data. The main categories are:
Card- or Page-based Tools
Icon-based Tools
Time-based Tools

Elemental Tools
Elemental tools help us work with the important basic elements of your project: its graphics, images, sound, text and moving pictures.

Q.5 (i) What is the significance of data streams in multimedia? Give a classification of data streams with examples.
Sol. Data Streams
A data stream is any sequence of individual packets transmitted in a time-dependent fashion. Packets can carry information of either continuous or discrete media.
Transmission modes:
Asynchronous - Packets can reach the receiver as fast as possible. This is suited for discrete media; additional time constraints must be imposed for continuous media.
Synchronous - Defines a maximum end-to-end delay. Packets can be received at an arbitrarily earlier time. For retrieving uncompressed video at a data rate of 140 Mbit/s with a maximal end-to-end delay of 1 second, the receiver must have temporary storage of 17.5 Mbytes.
Isochronous - Defines both a maximum and a minimum end-to-end delay, so the storage requirement at the receiver is reduced.
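To make the synchronous-mode figure above concrete, here is a small worked sketch (an illustrative calculation added to these notes, not part of the original text) that derives the 17.5 Mbyte buffer from the 140 Mbit/s data rate and the 1 second maximum end-to-end delay:

# In synchronous mode a packet may arrive up to max_delay earlier than it is
# needed, so the receiver must be able to buffer max_delay seconds of data.

def required_buffer_mbytes(data_rate_mbit_s: float, max_delay_s: float) -> float:
    bits = data_rate_mbit_s * 1_000_000 * max_delay_s
    return bits / 8 / 1_000_000  # bits -> bytes -> megabytes

if __name__ == "__main__":
    print(required_buffer_mbytes(140, 1.0))  # 17.5 Mbytes, as stated above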

(ii) Explain the data stream characteristics of continuous media. Give an example of each.
Sol. Characterizing continuous media streams:
Periodicity - Strongly periodic (PCM-coded speech in telephony behaves this way), weakly periodic, or aperiodic (cooperative applications with shared windows).
Variation of consecutive packet size - Strongly regular (uncompressed digital data transmission), weakly regular (the MPEG standard frame size ratio I:P:B is 10:1:2), or irregular.
Contiguous packets - Are consecutive packets transmitted directly one after another, or with a delay? This can be seen as the utilization of a resource.
Connected / unconnected data streams.

Q.6 (i) Explain Multimedia Building Blocks in detail. Sol.

Multimedia Building Blocks


Any multimedia application consists of any or all of the following components:
1. Text: Text and symbols are very important for communication in any medium. With the recent explosion of the Internet and the World Wide Web, text has become more important than ever. The Web is built on HTML (HyperText Markup Language), originally designed to display simple text documents on computer screens, with occasional graphic images thrown in as illustrations.
2. Audio: Sound is perhaps the most sensuous element of multimedia. It can provide the listening pleasure of music, the startling accent of special effects or the ambience of a mood-setting background.
3. Images: Images, whether represented in analog or digital form, play a vital role in multimedia. They are expressed in the form of still pictures, paintings or photographs taken through a digital camera.
4. Animation: Animation is the rapid display of a sequence of images of 2-D artwork or model positions in order to create an illusion of movement. It is an optical illusion of motion due to the phenomenon of persistence of vision, and can be created and demonstrated in a number of ways.
5. Video: Digital video has supplanted analog video as the method of choice for making video for multimedia use. Video in multimedia is used to portray real-time moving pictures in a multimedia project.

(ii) What are the basic tools for multimedia object generation? (Give an example of each.) Sol.
Basic Tools for Multimedia Objects
The basic tool set for building a multimedia project contains one or more authoring systems and various editing applications for text, images, sound, and motion video. A few additional applications are also useful for capturing images from the screen, translating file formats, and making multimedia production easier.
Text editing and word processing tools - Ex: Microsoft Word and WordPerfect
OCR software - Ex: Perceive
Image editing tools - Ex: Photoshop
Painting and drawing tools - Ex: DeskDraw, DeskPaint, Designer
Sound editing tools - Ex: SoundEdit Pro
Animation, video and digital movie tools - Ex: Animator Pro and Super Video Windows

Q.7 Answer the following short answer questions:

a. What is the Audio file format?

Sol. An audio file format is a file format for storing audio data on a computer system. It can be a raw bitstream, but it is usually a container format or an audio data format with a defined storage layer. The general approach towards storing digital audio is to sample the audio voltage which, on playback, would correspond to a certain level of signal in an individual channel, with a certain resolution (the number of bits per sample) at regular intervals (forming the sample rate). This data can then be stored uncompressed, or compressed to reduce the file size.
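As a rough illustration of the sampling parameters just described, the sketch below computes the uncompressed (PCM) storage a clip needs from its sample rate, bit depth, channel count and duration. The CD-quality numbers used here are common examples, not values taken from these notes:

# Uncompressed PCM size = sample rate x (bits per sample / 8) x channels x seconds.

def pcm_size_bytes(sample_rate_hz: int, bits_per_sample: int, channels: int, seconds: float) -> float:
    return sample_rate_hz * (bits_per_sample / 8) * channels * seconds

if __name__ == "__main__":
    # Example: CD-quality audio - 44,100 samples/s, 16 bits per sample, stereo, 60 s.
    size = pcm_size_bytes(44_100, 16, 2, 60)
    print(f"{size / 1_000_000:.1f} MB")  # about 10.6 MB per minute before compression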

A file format determines the application that is to be used for opening a file. Following is a list of different audio file formats and the software that can be used for opening a specific file:
1. *.AIF, *.SDII - Macintosh systems
2. *.SND - Macintosh systems
3. *.WAV - Windows systems
4. MIDI files - used by both Macintosh and Windows
5. *.WMA - Windows Media Player
6. *.MP3 - MP3 audio
7. *.RA - RealPlayer
8. *.VOC - VOC sound
9. AIFF - sound format for Macintosh sound files
10. *.OGG - Ogg Vorbis

b. What is animation?
Sol. Animation is the rapid display of a sequence of images of 2-D or 3-D artwork or model positions in order to create an illusion of movement. The effect is an optical illusion of motion due to the phenomenon of persistence of vision, and can be created and demonstrated in several ways.

c. What are motion dynamics and update dynamics?
Sol.
Motion Dynamics
With motion dynamics, objects can be moved and enabled with respect to a stationary observer. The object can also remain stationary while the viewpoint moves around it. In many cases both the object and the camera are moving. Example: a flight simulator.
Update Dynamics
Update dynamics is the actual change of the shape, color, or other properties of the object being viewed. For instance, a system can display the deformation of an in-flight airplane structure in response to the operator's manipulation of the many control mechanisms.

d. What is Dithering?
Sol. Dithering is the process of representing intermediate colors by patterns of tiny colored dots that simulate the desired color: the positioning of differently coloured pixels within an image that uses a 256-colour palette to simulate a colour that does not exist in the palette. A dithered image often looks 'noisy', or composed of scattered pixels. It is also a type of shading used in pixel art to add shadow or to blend two colours together. Where "x" is a pixel of one colour and "y" is a pixel of another colour, the following is one example of dithering:
xxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxx
xyxyxyxyxyxyxyxyxyxyxy
yxyxyxyxyxyxyxyxyxyxyx
xyxyxyxyxyxyxyxyxyxyxy
yyyyyyyyyyyyyyyyyyyyyy
yyyyyyyyyyyyyyyyyyyyyy

e. What is data compression?
Sol. Though you may not realise it, data compression affects most aspects of computing today. In fact, many websites use compression to reduce the amount of physical traffic they send and to save time. As a developer you're probably familiar with archive utilities that compress files into archives with one of these extensions: Ace, Rar, Zip, BZ2.
In computer science and information theory, data compression or source coding is the process of encoding information using fewer bits (or other information-bearing units) than an unencoded representation would use, through the use of specific encoding schemes.

OR
Q.7 (i) Why is MIDI used in a multimedia system? Explain MIDI interface components and MIDI reception modes. Sol.
MIDI (Musical Instrument Digital Interface) is a communication standard developed for electronic musical instruments and computers. MIDI files allow music and sound synthesizers from different manufacturers to communicate with each other by sending messages along cables connected to the devices.
MIDI is a standard agreed upon by manufacturers: a set of specifications they use in building their instruments so that the instruments of different manufacturers can, without difficulty, communicate musical information with one another.
A MIDI interface has two components:
1. Hardware
2. Data format
There are four MIDI reception modes:
Mode 1: Omni on / poly
Mode 2: Omni on / mono
Mode 3: Omni off / poly
Mode 4: Omni off / mono

(ii) Explain in brief MIDI devices. Sol.
The heart of any MIDI system is the MIDI synthesizer device. Most synthesizers have the following common components:
1. Sound generator
2. Microprocessor
3. Keyboard
4. Control channel
5. Auxiliary controllers
6. Memory
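As a hedged illustration of the MIDI data format mentioned above, the sketch below builds raw MIDI message bytes by hand: a Program Change selecting General MIDI program numbers (40 = violin, 73 = flute, matching Q.1 (iv)) and a Note On message. The helper functions are illustrative and use no external MIDI library.

# Raw MIDI 1.0 channel messages: a status byte (upper nibble = message type,
# lower nibble = channel) followed by one or two 7-bit data bytes.

def program_change(channel: int, program: int) -> bytes:
    """Program Change: status 0xC0 | channel, then the program number."""
    return bytes([0xC0 | (channel & 0x0F), program & 0x7F])

def note_on(channel: int, note: int, velocity: int) -> bytes:
    """Note On: status 0x90 | channel, then note number and velocity."""
    return bytes([0x90 | (channel & 0x0F), note & 0x7F, velocity & 0x7F])

if __name__ == "__main__":
    print(program_change(0, 40).hex())  # c028 -> select violin on channel 1
    print(program_change(1, 73).hex())  # c149 -> select flute on channel 2
    print(note_on(0, 60, 100).hex())    # 903c64 -> middle C, velocity 100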
Q.8 (i) What is multimedia information? Explain the various types of multimedia objects. Sol.
What is Multimedia Information?
Multimedia is the use of text, graphics, animation, pictures, video, and sound to present information. Since these media can now be integrated using a computer, there has been a virtual explosion of computer-based multimedia instructional applications, such as computer-based tutorials. These very diverse applications seem to share a common assumption: multimedia information helps people learn.
What are multimedia objects?
A multimedia object is something that can play music or a video clip, run a program, or view source code, or in some way electronically enhance the information contained in the article itself beyond the normal graphics.
Multimedia File Types
There are two major categories of multimedia objects. The first is a playable file of some sort, such as a video clip or a sound file. The second is a data set, where the author has provided raw data, possibly along with some sort of program to help viewers of the article manipulate and analyze the data. This category may include program source code and directions for its use.
Text Objects
Example text object formats: PostScript (.ps); Acrobat Portable Document Format (.pdf); LaTeX (.tex); plain ASCII text (.txt); PostScript compressed with gzip (.ps.gz); Microsoft Office Word (.doc); rich text format (.rtf); HyperText Markup Language (.html); HTML compressed with zip (.html.zip); run-length encoded text (.ARLE); tape archive file (.tar); Excel spreadsheets compressed/archived with WinArj (.arj) or WinRAR (.rar); Unicode text; Microsoft PowerPoint slides (.ppt); inline TCL/TK script (.tcl).

Audio/Sound Objects (several generated with the Unix programs "audiotool" and "sox")
Example audio formats: original audio/basic for NeXT and Sun (.au); 3-bit and 4-bit G.721 ADPCM (.adpcm); CD stereo format (.cd); DAT stereo format (.dat).

Image Objects (many generated with the Unix program "xv")
Example image formats: Graphics Interchange Format, still and animated (.gif); JPEG (.jpg, .jpe); JPEG 2000 (.jp2); Tagged Image File Format (.tif); Scalable Vector Graphics (.svg); portable bitmap/graymap/pixmap (.pbm/.pgm/.ppm); GEM Paint File (.img); Amiga image (.iff); DirectDraw Surface (.dds); ArtRage (.ptg); Flexible Image Transport System (.fts).

Video Objects
Example video formats: QuickTime movie (.mov, .qt); Windows Media Video (.wmv); MPEG video (.mpg); Flash Video (.flv); Video for Windows (.avi); Virtual Reality Modelling Language (.wrl, .wrz); RealPlayer video/audio (.ra, .ram); Macromedia Flash / Shockwave Flash (.swf); AutoDesk Animator (.flc).

(ii) Explain various stages of Multimedia projects with example. Sol.

Stages of Multimedia Application Development
A multimedia application is developed in stages, as all other software is. In multimedia application development a few stages have to be completed before other stages begin, and some stages may be skipped or combined with other stages. Following are the four basic stages of multimedia project development:
1. Planning and Costing: This is the first stage, which begins with an idea or a need. This idea can be further refined by outlining its messages and objectives. Before starting to develop the multimedia project, it is necessary to plan what writing skills, graphic art, music, video and other multimedia expertise will be required. It is also necessary to estimate the time needed to prepare all elements of multimedia and to prepare a budget accordingly. After preparing a budget, a prototype or proof of concept can be developed.
2. Designing and Producing: The next stage is to execute each of the planned tasks and create a finished product.
3. Testing: Testing a project ensures the product is free from bugs. Apart from bug elimination, another aspect of testing is to ensure that the multimedia application meets the objectives of the project. It is also necessary to test whether the multimedia project works properly on the intended delivery platforms and meets the needs of the clients.
4. Delivering: The final stage of multimedia application development is to package the project and deliver the completed project to the end user. This stage has several steps such as implementation, maintenance, shipping and marketing the product.

Q.9 (i) Explain different applications of multimedia with examples. Sol.

Applications of Multimedia
Multimedia finds its application in various areas including, but not limited to, advertisements, art, education, entertainment, engineering, medicine, mathematics, business, scientific research and spatial/temporal applications. A few application areas of multimedia are listed below.
Creative industries
Creative industries use multimedia for a variety of purposes ranging from fine arts, to entertainment, to commercial art, to journalism, to media and software services provided for any of the industries listed below. An individual multimedia designer may cover the spectrum throughout their career. Requests for their skills range from technical, to analytical, to creative.
Commercial
Much of the electronic old and new media utilized by commercial artists is multimedia. Exciting presentations are used to grab and keep attention in advertising. Industrial, business-to-business, and interoffice communications are often developed by creative services firms for advanced multimedia presentations beyond simple slide shows to sell ideas or liven up training. Commercial multimedia developers may be hired to design for governmental services and nonprofit services applications as well.
Entertainment and Fine Arts
In addition, multimedia is heavily used in the entertainment industry, especially to develop special effects in movies and animations. Multimedia games are a popular pastime and are software programs available either as CD-ROMs or online. Some video games also use multimedia features. Multimedia applications that allow users to actively participate instead of just sitting by as passive recipients of information are called Interactive Multimedia.
Education
In education, multimedia is used to produce computer-based training courses (popularly called CBTs) and reference books like encyclopedias and almanacs. A CBT lets the user go through a series of presentations, text about a particular topic, and associated illustrations in various information formats. Edutainment is an informal term used to describe combining education with entertainment, especially multimedia entertainment.
Engineering
Software engineers may use multimedia in computer simulations for anything from entertainment to training, such as military or industrial training. Multimedia for software interfaces is often done as a collaboration between creative professionals and software engineers.
Industry
In the industrial sector, multimedia is used as a way to help present information to shareholders, superiors and coworkers. Multimedia is also helpful for providing employee training, and for advertising and selling products all over the world via virtually unlimited web-based technologies.
Mathematical and Scientific Research
In mathematical and scientific research, multimedia is mainly used for modeling and simulation. For example, a scientist can look at a molecular model of a particular substance and manipulate it to arrive at a new substance. Representative research can be found in journals such as the Journal of Multimedia.
Medicine
In medicine, doctors can get trained by looking at a virtual surgery, or they can simulate how the human body is affected by diseases spread by viruses and bacteria and then develop techniques to prevent them.

Multimedia in Public Places
In hotels, railway stations, shopping malls, museums, and grocery stores, multimedia is becoming available at stand-alone terminals or kiosks to provide information and help. Such installations reduce demand on traditional information booths and personnel, add value, and can work around the clock, even in the middle of the night, when live help is off duty.

Examples of Multimedia Applications include:


_ World Wide Web
_ Multimedia authoring, e.g. Adobe/Macromedia Director
_ Hypermedia courseware
_ Video-on-demand
_ Interactive TV
_ Computer games
_ Virtual reality
_ Digital video editing and production systems
_ Multimedia database systems

(ii) What do you understand by digital audio? What kinds of applications require audio capability? Sol.

Digital audio is created when a sound wave is converted into numbers, a process referred to as digitizing. It is possible to digitize sound from a microphone, a synthesizer, existing tape recordings, live radio and television broadcasts, and popular CDs. You can digitize sounds from a natural source or from a prerecorded one. Digitized sound is sampled sound: every nth fraction of a second, a sample of sound is taken and stored as digital information in bits and bytes. The quality of this digital recording depends upon how often the samples are taken.
Digital audio uses pulse-code modulation and digital signals for sound reproduction. This includes analog-to-digital conversion (ADC), digital-to-analog conversion (DAC), storage, and transmission. In effect, the system commonly referred to as digital is in fact a discrete-time, discrete-level analog of a previous electrical analog. While modern systems can be quite subtle in their methods, the primary usefulness of a digital system is the ability to store, retrieve and transmit signals without any loss of quality.

AV capability
Audio-visual applications require different performance characteristics than are required of a hard disk drive used for regular, everyday computer use. Typical computer usage involves many requests for relatively small amounts of data. By contrast, AV applications - digital audio recording, video editing and streaming, CD writing, etc. - involve large block transfers of sequentially stored data. Their prime requirement is a steady, uninterrupted stream of data, so that any "dropout" in the analogue output is avoided.

Q.10 (i) What is image capturing? Explain the authoring tools for image capturing. Ans.
Image Capture is an application program that enables users to upload pictures from digital cameras or scanners which are either connected directly to the computer or to the network. It provides no organizational tools like iPhoto, but it is useful for collating pictures from a variety of sources with no need for drivers. It achieves this by "receiving the picture" as it is and, through a conversion process, downloading it onto your computer. It uses QuickTime Image Capture, so it doesn't need to understand anything about the actual camera to be able to do that. It was first introduced in Mac OS X version 10.0 (Cheetah). Image Capture is scriptable with AppleScript, and may be manipulated with Mac OS X v10.4 (Tiger)'s "Automator" application. As of Mac OS X 10.4, Image Capture's AppleScript dictionary does not open in Script Editor; as of Mac OS X 10.6, only the Image Capture Web Server opens in Script Editor.

Image-Editing Tools
Image-editing applications are specialized and powerful tools for enhancing and retouching existing bitmapped images. These applications also provide many of the features and tools of painting and drawing programs, and can be used to create images from scratch as well as images digitized from scanners, video frame-grabbers, digital cameras, clip art files, or original artwork files created with a painting or drawing package. Here are some features typical of image-editing applications and of interest to multimedia developers:
1. Multiple windows that provide views of more than one image at a time
2. Conversion of major image-data types and industry-standard file formats
3. Direct inputs of images from scanner and video sources
4. Employment of a virtual memory scheme that uses hard disk space as RAM for images that require large amounts of memory
5. Capable selection tools, such as rectangles, lassos, and magic wands, to select portions of a bitmap
6. Image and balance controls for brightness, contrast, and color balance
7. Good masking features
8. Multiple undo and restore features
9. Anti-aliasing capability, and sharpening and smoothing controls
10. Color-mapping controls for precise adjustment of color balance
11. Tools for retouching, blurring, sharpening, lightening, darkening, smudging, and tinting
12. Geometric transformations such as flip, skew, rotate, distort, and perspective changes
13. Ability to resample and resize an image
14. 24-bit color, 8- or 4-bit indexed color, 8-bit gray-scale, black-and-white, and customizable color palettes
15. Ability to create images from scratch, using line, rectangle, square, circle, ellipse, polygon, airbrush, paintbrush, pencil, and eraser tools, with customizable brush shapes and user-definable bucket and gradient fills
16. Multiple typefaces, styles, and sizes, and type manipulation and masking routines
17. Filters for special effects, such as crystallize, dry brush, emboss, facet, fresco, graphic pen, mosaic, pixelize, poster, ripple, smooth, splatter, stucco, twirl, watercolor, wave, and wind
18. Support for third-party special-effect plug-ins
19. Ability to design in layers that can be combined, hidden, and reordered
(ii) Explain the card and page based authoring tools.
Ans. Card-based or page-based tools:
1. The elements are organized as pages of a book or a stack of cards.
2. Card- or page-based authoring systems are best used when the bulk of your content consists of elements that can be viewed individually, like the pages of a book or cards in a card file.
3. The authoring system lets you link these pages or cards into organized sequences.
4. You can jump, on command, to any page.
5. You can play sound elements and launch animations and digital video.
In these authoring systems, elements are organized as pages of a book or a stack of cards. The authoring system lets you link these pages or cards into an organized sequence, and it also allows you to play sound elements and launch animations and digital videos. Page-based authoring systems are object-oriented: the objects are buttons, graphics and so on. Each object may contain a programming script that is activated when an event related to that object occurs. EX: Visual Basic

Q.11 (i) Explain the Shannon-Fano Algorithm. Ans.
Shannon-Fano Coding is a prefix-free, variable-to-variable code built by a top-down binary tree method:
(1) The set of source symbols is sorted in order of non-increasing probabilities.
(2) Each symbol is interpreted as the root of a tree.
(3) The list is divided into two groups, such that each group has nearly equal total probability.
(4) The codewords of the first group are appended with 0.
(5) The codewords of the second group are appended with 1.
(6) Steps (3)-(5) are repeated for each subgroup, until each subset contains only one node.
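The following is a minimal Python sketch of the top-down procedure above. The symbol probabilities are made up for illustration, and real implementations differ in how they break ties when splitting the list:

# Shannon-Fano coding sketch: sort symbols by non-increasing probability, then
# recursively split the list into two groups of (nearly) equal total probability,
# appending '0' to the first group and '1' to the second.

def shannon_fano(symbols):
    """symbols: list of (symbol, probability) pairs. Returns {symbol: codeword}."""
    codes = {s: "" for s, _ in symbols}

    def split(group):
        if len(group) <= 1:
            return
        total = sum(p for _, p in group)
        running, cut = 0.0, 1
        for i, (_, p) in enumerate(group[:-1], start=1):
            running += p
            cut = i
            if running >= total / 2:
                break
        first, second = group[:cut], group[cut:]
        for s, _ in first:
            codes[s] += "0"
        for s, _ in second:
            codes[s] += "1"
        split(first)
        split(second)

    split(sorted(symbols, key=lambda sp: sp[1], reverse=True))
    return codes

if __name__ == "__main__":
    # Illustrative probabilities; prints a prefix-free code such as A=00, B=01, ...
    print(shannon_fano([("A", 0.35), ("B", 0.17), ("C", 0.17), ("D", 0.16), ("E", 0.15)]))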

Q.11 (i) Explain the meaning of the terms Encoding and Compression. Does encoding always imply compression? If not, illustrate under what types of encoding this is not true. Ans.
Multimedia Encoding
Digitally sampled media such as graphics, video, audio and text. Raw samples are typically transformed into some other encoding. These encodings can have the less useful parts removed, e.g. the high-frequency bands of images and video. Different encodings are optimised for different forms of primary material, e.g. JPEG for photo-realistic scenes and LPC for speech.
In computer science, data compression or source coding is the process of encoding information using fewer bits, or information units, thanks to specific encoding schemes. For example, this article could be encoded with fewer bits if we accept the convention that the word "compression" is encoded as "CP!". Compression only works when both the sender and the receiver of the information message have agreed on the encoding scheme.
Does encoding always imply compression? If not, illustrate under what types of encoding this is not true.
Encoding usually implies compression, although there are codecs that can be used which are described as 'lossless'. Generally, if you are working with digital video and audio it is best to keep them either in their original state or encoded with a lossless codec right up until the point they are supposed to be written to their final output media. This means that if you want to create something for YouTube you should keep it in its original form right through editing and only compress it with a lossy codec at the point you want to upload it. Remember, though, that a lot of editing tools will automatically compress a video when you create the file if you are using transitions or effects; that is certainly true of many current editing tools.

(ii) What benefits are offered by compression schemes in designing Multimedia Systems? Ans.
DATA COMPRESSION: ADVANTAGES AND DISADVANTAGES
Advantages of data compression:
Less disk space (more data in reality) (*)
Faster writing and reading (*)
Faster file transfer
Variable dynamic range
Byte order independent
(*) To obtain these advantages, the compression and decompression must be carried out directly by the writing and reading programs, e.g. as part of a standard file format.
Disadvantages of data compression:
Added complication
Effect of errors in transmission
Slower for sophisticated methods (though simple methods can be faster for writing to disk)
"Unknown" byte/pixel relationship (+)
Need to decompress all previous data (+)

(ii) Explain the Huffman Algorithm. Ans.
Huffman Coding Algorithm (a bottom-up approach):
1. Initialization: put all symbols on a list sorted according to their frequency counts.
2. Repeat until the list has only one symbol left:
 1) From the list, pick the two symbols with the lowest frequency counts. Form a Huffman subtree that has these two symbols as child nodes and create a parent node.
 2) Assign the sum of the children's frequency counts to the parent and insert it into the list such that the order is maintained.
 3) Delete the children from the list.
3. Assign a codeword to each leaf based on the path from the root.
Equivalently, Huffman coding is a prefix-free, variable-to-variable code built by a bottom-up binary tree method:
(1) The set of source symbols is sorted in order of non-increasing probabilities.
(2) Each symbol is interpreted as the root of a tree.
(3) The two subtrees with the smallest probabilities are merged into a new subtree, whose root element is assigned the probability sum; the left subtree is marked with 1 and the right subtree with 0.
(4) Step (3) is repeated until a single tree remains, containing all symbols and having a probability sum of 1.
(5) The code of each leaf is given by the sequence of marks on the path from the root to the leaf.
In step (3) there can be several possible choices, and the resulting Huffman code can be different, but the average codeword length is the same.
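Below is a small Python sketch of the bottom-up construction described above; the frequency counts are illustrative and are not the ones from Table 15.1 of the text:

import heapq

# Huffman coding sketch: repeatedly merge the two lowest-frequency subtrees,
# then read the codes off the paths from the root to the leaves.

def huffman_codes(freqs):
    """freqs: dict symbol -> frequency count. Returns dict symbol -> codeword."""
    # Heap entries are (frequency, tie_breaker, tree); a tree is either a symbol
    # (leaf) or a (left, right) tuple (internal node).
    heap = [(f, i, s) for i, (s, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)  # lowest frequency
        f2, _, t2 = heapq.heappop(heap)  # second lowest
        heapq.heappush(heap, (f1 + f2, counter, (t1, t2)))
        counter += 1

    codes = {}

    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix or "0"  # single-symbol edge case

    _, _, root = heap[0]
    walk(root, "")
    return codes

if __name__ == "__main__":
    # Illustrative frequency counts; more frequent symbols receive shorter codes.
    print(huffman_codes({"A": 17, "B": 12, "C": 12, "D": 27, "E": 32}))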

Q.12 (i) Write about voice recognition applications. (5) Ans.
Voice Recognition allows a user to use his/her voice as an input device. Voice recognition may be used to dictate text into the computer or to give commands to the computer (such as opening application programs, pulling down menus, or saving work). Continuous speech voice recognition applications allow a user to dictate text fluently into the computer; these newer applications can recognize speech at up to 160 words per minute. While the accuracy of voice recognition has improved over the past few years, some users still experience problems with accuracy, either because of the way they speak or the nature of their voice.
What is voice recognition, and why is it useful in a virtual environment?
Voice recognition is "the technology by which sounds, words or phrases spoken by humans are converted into electrical signals, and these signals are transformed into coding patterns to which meaning has been assigned" [ADA90]. While the concept could more generally be called "sound recognition", we focus here on the human voice because we most often and most naturally use our voices to communicate our ideas to others in our immediate surroundings. In the context of a virtual environment, the user would presumably gain the greatest feeling of immersion, or being part of the simulation, if they could use their most common form of communication, the voice. The difficulty in using voice as an input to a computer simulation lies in the fundamental differences between human speech and the more traditional forms of computer input. While computer programs are commonly designed to produce a precise and well-defined response upon receiving the proper (and equally precise) input, the human voice and spoken words are anything but precise. Each human voice is different, and identical words can have different meanings if spoken with different inflections or in different contexts. Several approaches have been tried, with varying degrees of success, to overcome these difficulties.

Applications

Health care
In the health care domain, even in the wake of improving speech recognition technologies, medical transcriptionists (MTs) have not yet become obsolete. The services provided may be redistributed rather than replaced.

Military

High-performance fighter aircraft


Substantial efforts have been devoted in the last decade to the test and evaluation of speech recognition in fighter aircraft. Of particular note are the U.S. program in speech recognition for the Advanced Fighter Technology Integration (AFTI)/F-16 aircraft (F-16 VISTA), the program in France on installing speech recognition systems on Mirage aircraft, and programs in the UK dealing with a variety of aircraft platforms. In these programs, speech recognizers have been operated successfully in fighter aircraft with applications including: setting radio frequencies, commanding an autopilot system, setting steer-point coordinates and weapons release parameters, and controlling flight displays.

Helicopters
The problems of achieving high recognition accuracy under stress and noise pertain strongly to the helicopter environment as well as to the fighter environment. The acoustic noise problem is actually more severe in the helicopter environment, not only because of the high noise levels but also because the helicopter pilot generally does not wear a facemask, which would reduce acoustic noise in the microphone. Substantial test and evaluation programs have been carried out in the past decade in speech recognition systems applications in helicopters, notably by the U.S. Army Avionics Research and Development Activity (AVRADA) and by the Royal Aerospace Establishment (RAE) in the UK. Work in France has included speech recognition in the Puma helicopter.

Battle management
Battle management command centres generally require rapid access to and control of large, rapidly changing information databases. Commanders and system operators need to query these databases as conveniently as possible, in an eyes-busy environment where much of the information is presented in a display format. Human-machine interaction by voice has the potential to be very useful in these environments. A number of efforts have been undertaken to interface commercially available isolated-word recognizers into battle management environments. In one feasibility study, speech recognition equipment was tested in conjunction with an integrated information display for naval battle management applications. Users were very optimistic about the potential of the system, although capabilities were limited.

Training air traffic controllers


Training for air traffic controllers (ATC) represents an excellent application for speech recognition systems. Many ATC training systems currently require a person to act as a "pseudo-pilot", engaging in a voice dialog with the trainee controller, which simulates the dialog which the controller would have to conduct with pilots in a real ATC situation. Speech recognition and synthesis techniques offer the potential to eliminate the need for a person to act as pseudo-pilot, thus reducing training and support personnel.

Telephony and other domains
Further applications include: automatic translation; automotive speech recognition (e.g. Ford Sync); telematics (e.g. vehicle navigation systems); court reporting (real-time voice writing); hands-free computing (voice command recognition as a computer user interface); home automation; interactive voice response; mobile telephony, including mobile email; multimodal interaction; pronunciation evaluation in computer-aided language learning applications; robotics; video games (with Tom Clancy's EndWar and Lifeline as working examples); transcription (digital speech-to-text); speech-to-text (transcription of speech into mobile text messages); and air traffic control speech recognition.

(ii) What are the advantages and disadvantages of lossless compression? Compare and contrast these with lossy compression. Ans.
Lossless vs. lossy compression
Lossless compression loses no information: the output is an exact copy of the input. But it uses more space than lossy compression, that is, the compression ratio is not as good. Lossy compression compresses files to a smaller size, but some data is lost. It is the part of the music that is (supposed to be) inaudible to your ears, though experts say they can hear the lost part. One other problem with lossy compression is that repeated compression and expansion causes serious sound artifacts to appear, especially if different algorithms are used; and music can easily undergo three cycles of compression and expansion, for example once during digital audio transmission, once for storage on disk, and once for recording on CD.
The advantage of lossy methods over lossless methods is that in some cases a lossy method can produce a much smaller compressed file than any known lossless method, while still meeting the requirements of the application.
Lossy methods are most often used for compressing sound, images or videos. The compression ratio (that is, the size of the compressed file compared to that of the uncompressed file) of lossy video codecs is nearly always far superior to those of the audio and still-image equivalents. Audio can be compressed at 10:1 with no noticeable loss of quality, and video can be compressed immensely with little visible quality loss, e.g. 300:1. Lossily compressed still images are often compressed to 1/10th of their original size, as with audio, but the quality loss is more noticeable, especially on closer inspection.
Lossless compression algorithms usually exploit statistical redundancy in such a way as to represent the sender's data more concisely, but nevertheless perfectly. Lossless compression is possible because most real-world data has statistical redundancy. For example, in English text, the letter 'e' is much more common than the letter 'z', and the probability that the letter 'q' will be followed by the letter 'z' is very small.
Another kind of compression, called lossy data compression, is possible if some loss of fidelity is acceptable. For example, a person viewing a picture or television video scene might not notice if some of its finest details are removed or not represented perfectly. Similarly, two clips of audio may be perceived as the same by a listener even though one is missing details found in the other. Lossy data compression algorithms introduce relatively minor differences and represent the picture, video, or audio using fewer bits.
When a user acquires a lossily-compressed file (for example, to reduce download time), the retrieved file can be quite different from the original at the bit level while being indistinguishable to the human ear or eye for most practical purposes. Many methods focus on the idiosyncrasies of human anatomy, taking into account, for example, that the human eye can see only certain frequencies of light. The psycho-acoustic model describes how sound can be highly compressed without degrading the perceived quality of the sound. Flaws caused by lossy compression that are noticeable to the human eye or ear are known as compression artifacts.
Lossless compression schemes are reversible, so that the original data can be reconstructed, while lossy schemes accept some loss of data in order to achieve higher compression. However, lossless data compression algorithms will always fail to compress some files; indeed, any compression algorithm will necessarily fail to compress any data containing no discernible patterns. Attempts to compress data that has been compressed already will therefore usually result in an expansion, as will attempts to compress encrypted data.
In practice, lossy data compression will also come to a point where compressing again does not work, although an extremely lossy algorithm, which for example always removes the last byte of a file, will always compress a file up to the point where it is empty.

Q.13 What is data compression? Explain different compression techniques with suitable examples. Ans.
Data compression implies sending or storing a smaller number of bits. Although many methods are used for this purpose, in general these methods can be divided into two broad categories: lossless and lossy methods.

Figure: Data compression methods
LOSSLESS COMPRESSION
In lossless data compression, the integrity of the data is preserved. The original data and the data after compression and decompression are exactly the same because, in these methods, the compression and decompression algorithms are exact inverses of each other: no part of the data is lost in the process. Redundant data is removed in compression and added back during decompression. Lossless compression methods are normally used when we cannot afford to lose any data.
1. Run-length encoding
Run-length encoding is probably the simplest method of compression. It can be used to compress data made of any combination of symbols. It does not need to know the frequency of occurrence of symbols and can be very efficient if data is represented as 0s and 1s. The general idea behind this method is to replace consecutive repeating occurrences of a symbol by one occurrence of the symbol followed by the number of occurrences. The method can be even more efficient if the data uses only two symbols (for example 0 and 1) in its bit pattern and one symbol is more frequent than the other.
Figure: Run-length encoding example
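A tiny Python sketch of the idea (the symbol-then-count output format is just one common choice; the figures referenced in these notes may show a different layout):

from itertools import groupby

# Run-length encoding sketch: replace each run of a repeated symbol with
# one occurrence of the symbol followed by the length of the run.

def rle_encode(data: str) -> list:
    return [(symbol, len(list(run))) for symbol, run in groupby(data)]

def rle_decode(pairs: list) -> str:
    return "".join(symbol * count for symbol, count in pairs)

if __name__ == "__main__":
    encoded = rle_encode("AAAABBBCCDAA")
    print(encoded)              # [('A', 4), ('B', 3), ('C', 2), ('D', 1), ('A', 2)]
    print(rle_decode(encoded))  # AAAABBBCCDAA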

Figure: Run-length encoding for two symbols
2. Huffman coding
Huffman coding assigns shorter codes to symbols that occur more frequently and longer codes to those that occur less frequently. For example, imagine we have a text file that uses only five characters (A, B, C, D, E). Before we can assign bit patterns to each character, we assign each character a weight based on its frequency of use. In this example, assume that the frequency of the characters is as shown in Table 15.1.

Figure: Huffman coding
A character's code is found by starting at the root and following the branches that lead to that character. The code itself is the bit value of each branch on the path, taken in sequence.
Figure: Final tree and code
Encoding
Let us see how to encode text using the code for our five characters. Figure 15.6 shows the original and the encoded text.
Figure 15.6 Huffman encoding
Decoding
The recipient has a very easy job in decoding the data it receives. Figure 15.7 shows how decoding takes place.
Figure 15.7 Huffman decoding
3. Lempel Ziv encoding
Lempel Ziv (LZ) encoding is an example of a category of algorithms called dictionary-based encoding. The idea is to create a dictionary (a table) of strings used during the communication session. If both the sender and the receiver have a copy of the dictionary, then previously-encountered strings can be substituted by their index in the dictionary to reduce the amount of information transmitted.
Compression
In this phase there are two concurrent events: building an indexed dictionary and compressing a string of symbols. The algorithm extracts the smallest substring that cannot be found in the dictionary from the remaining uncompressed string. It then stores a copy of this substring in the dictionary as a new entry and assigns it an index value. Compression occurs when the substring, except for the last character, is replaced with the index found in the dictionary. The process then inserts the index and the last character of the substring into the compressed string.
Figure 15.8 An example of Lempel Ziv encoding
Decompression
Decompression is the inverse of the compression process. The process extracts the substrings from the compressed string and tries to replace the indexes with the corresponding entries in the dictionary, which is empty at first and built up gradually. The idea is that when an index is received, there is already an entry in the dictionary corresponding to that index.
Figure 15.9 An example of Lempel Ziv decoding
LOSSY COMPRESSION METHODS
Our eyes and ears cannot distinguish subtle changes. In such cases, we can use a lossy data compression method. These methods are cheaper: they take less time and space when it comes to sending millions of bits per second for images and video. Several methods have been developed using lossy compression techniques: JPEG (Joint Photographic Experts Group) encoding is used to compress pictures and graphics, MPEG (Moving Picture Experts Group) encoding is used to compress video, and MP3 (MPEG audio layer 3) is used for audio compression.
1. Image compression: JPEG encoding
As we know, an image can be represented by a two-dimensional array (table) of picture elements (pixels). A grayscale picture of 307,200 pixels is represented by 2,457,600 bits, and a color picture is represented by 7,372,800 bits. In JPEG, a grayscale picture is divided into blocks of 8 × 8 pixels to decrease the number of calculations because, as we will see shortly, the number of mathematical operations for each picture is the square of the number of units.

Figure 15.10 JPEG grayscale example, 640 × 480 pixels
The whole idea of JPEG is to change the picture into a linear (vector) set of numbers that reveals the redundancies. The redundancies (lack of changes) can then be removed using one of the lossless compression methods we studied previously. A simplified version of the process is shown in Figure 15.11.
Figure 15.11 The JPEG compression process
Discrete cosine transform (DCT)
In this step, each block of 64 pixels goes through a transformation called the discrete cosine transform (DCT). The transformation changes the 64 values so that the relative relationships between pixels are kept but the redundancies are revealed. The formula is given in Appendix G. P(x, y) defines one value in the block, while T(m, n) defines the value in the transformed block. To understand the nature of this transformation, let us show the result of the transformation for three cases.
Figure 15.12 Case 1: uniform grayscale
Figure 15.13 Case 2: two sections
Figure 15.14 Case 3: gradient grayscale
Quantization
After the T table is created, the values are quantized to reduce the number of bits needed for encoding. Quantization divides the number of bits by a constant and then drops the fraction. This reduces the required number of bits even more. In most implementations, a quantizing table (8 by 8) defines how to quantize each value. The divisor depends on the position of the value in the T table. This is done to optimize the number of bits and the number of 0s for each particular application.
Compression
After quantization the values are read from the table, and redundant 0s are removed. However, to cluster the 0s together, the process reads the table diagonally in a zigzag fashion rather than row by row or column by column. The reason is that if the picture does not have fine changes, the bottom right corner of the T table is all 0s. JPEG usually uses run-length encoding at the compression phase to compress the bit pattern resulting from the zigzag linearization.
Figure 15.15 Reading the table
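To illustrate the zigzag reading step (Figure 15.15), the sketch below generates the order in which the 64 values of an 8 × 8 block would be read. The traversal is the standard zigzag pattern and is assumed here rather than copied from the figure:

# Zigzag traversal sketch for an 8x8 block: walk the anti-diagonals (i + j constant),
# alternating direction, so that the (mostly zero) bottom-right values come last.

def zigzag_order(n: int = 8):
    order = []
    for s in range(2 * n - 1):  # s = i + j for each anti-diagonal
        diagonal = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        if s % 2 == 0:
            diagonal.reverse()  # even diagonals are read from bottom-left to top-right
        order.extend(diagonal)
    return order

if __name__ == "__main__":
    block = [[r * 8 + c for c in range(8)] for r in range(8)]  # dummy 8x8 values
    linear = [block[i][j] for i, j in zigzag_order()]
    print(linear[:10])  # first values in zigzag order: 0, 1, 8, 16, 9, 2, 3, 10, 17, 24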

2. Video compression: MPEG encoding
The Moving Picture Experts Group (MPEG) method is used to compress video. In principle, a motion picture is a rapid sequence of a set of frames in which each frame is a picture. In other words, a frame is a spatial combination of pixels, and a video is a temporal combination of frames that are sent one after another. Compressing video, then, means spatially compressing each frame and temporally compressing a set of frames.
Figure 15.16 MPEG frames
Spatial compression
The spatial compression of each frame is done with JPEG, or a modification of it. Each frame is a picture that can be independently compressed.
Temporal compression
In temporal compression, redundant frames are removed. When we watch television, for example, we receive 30 frames per second. However, most of the consecutive frames are almost the same. For example, in a static scene in which someone is talking, most frames are the same except for the segment around the speaker's lips, which changes from one frame to the next.
3. Audio compression
Audio compression can be used for speech or music. For speech we need to compress a 64 kHz digitized signal, while for music we need to compress a 1.411 MHz signal. Two categories of techniques are used for audio compression: predictive encoding and perceptual encoding.
Predictive encoding
In predictive encoding, the differences between samples are encoded instead of encoding all the sampled values. This type of compression is normally used for speech. Several standards have been defined, such as GSM (13 kbps), G.729 (8 kbps), and G.723.3 (6.4 or 5.3 kbps).
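To show the predictive-encoding idea in its simplest form, here is a hedged sketch that stores only the differences between consecutive samples; real speech codecs such as GSM or G.729 are far more elaborate than this:

# Predictive (delta) encoding sketch: store the first sample, then only the
# difference between each sample and its predecessor. Small differences can
# later be entropy-coded with fewer bits than the raw samples.

def delta_encode(samples):
    deltas, previous = [], 0
    for s in samples:
        deltas.append(s - previous)
        previous = s
    return deltas

def delta_decode(deltas):
    samples, value = [], 0
    for d in deltas:
        value += d
        samples.append(value)
    return samples

if __name__ == "__main__":
    samples = [100, 102, 105, 105, 104, 101]
    deltas = delta_encode(samples)
    print(deltas)                # [100, 2, 3, 0, -1, -3]
    print(delta_decode(deltas))  # [100, 102, 105, 105, 104, 101]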

Perceptual encoding: MP3
The most common compression technique used to create CD-quality audio is based on the perceptual encoding technique. This type of audio needs at least 1.411 Mbps, which cannot be sent over the Internet without compression. MP3 (MPEG audio layer 3) uses this technique.

Q.14 What are Dictionary Techniques? Explain LZ77 with a suitable example. Ans.

Q.15 Explain lossless compression techniques with suitable examples. Ans.

Q.16 What are Dictionary Techniques? Explain the different categories of dictionary-based coding with suitable examples.
Ans.

Dictionary Coding
Dictionary coding is different from Huffman coding and arithmetic coding. Both Huffman and arithmetic coding techniques are based on a statistical model (e.g., occurrence probabilities).

In dictionary-based data compression techniques, a symbol or a string of symbols generated from a source alphabet is represented by an index into a dictionary constructed from the source alphabet. A dictionary is a list of symbols and strings of symbols. There are many examples of this in our daily lives: the string "September" vs. "9", or a social security number vs. a person in the U.S.
Dictionary coding is widely used in text coding. Consider English text coding. The source alphabet includes the 26 English letters in both lower and upper case, numbers, various punctuation marks and the space bar. Huffman or arithmetic coding treats each symbol based on its occurrence probability; that is, the source is modeled as a memoryless source. It is well known, however, that this is not true in many applications. In text coding, structure or context plays a significant role: it is very likely that the letter "u" appears after the letter "q", and it is likely that the word "concerned" will appear after "As far as the weather is".
The strategy of dictionary coding is to build a dictionary that contains frequently occurring symbols and strings of symbols. When a symbol or a string is encountered and it is contained in the dictionary, it is encoded with an index into the dictionary. Otherwise, if it is not in the dictionary, the symbol or the string of symbols is encoded in a less efficient manner.
All dictionary schemes have an equivalent statistical scheme which achieves exactly the same compression. A statistical scheme using high-order context models may outperform dictionary coding (at high computational complexity), but dictionary coding is currently superior for fast speed and economy of memory. In the future, however, statistical schemes may be preferred [bell 1990].

Formulation of Dictionary Coding
To define dictionary coding in a precise manner [bell 1990], we denote a source alphabet by S. A dictionary consisting of two elements is defined as D = (P, C), where P is a finite set of phrases generated from S, and C is a coding function mapping P onto a set of codewords. The set P is said to be complete if any input string can be represented by a series of phrases chosen from P. The coding function C is said to obey the prefix property if there is no codeword that is a prefix of any other codeword. For practical usage, i.e., for reversible compression of any input text, the phrase set P must be complete and the coding function C must satisfy the prefix property.

Categorization of Dictionary-Based Coding Techniques

The heart of dictionary coding is the formulation of the dictionary. A successfully built dictionary results in data compression; the opposite case may lead to data expansion. According to the ways in which dictionaries are constructed, dictionary coding techniques can be classified as static or adaptive.

1. Static Dictionary Coding. In static dictionary coding, a fixed dictionary is produced before the coding process and is used at both the transmitting and receiving ends [bell 1990].

This is possible when sufficient knowledge about the source alphabet and the related strings of symbols (phrases) is available in advance. The merit of the static approach is its simplicity; its drawbacks are relatively lower coding efficiency and less flexibility compared with adaptive dictionary techniques. An example of a static algorithm is digram coding, a simple and fast coding technique. The dictionary contains all source symbols and some frequently used pairs of symbols (digrams). In encoding, two symbols are checked at a time to see whether they form a pair that is in the dictionary. If so, they are replaced by the index of that pair, and the next pair of symbols is encoded in the next step. If not, the index of the first symbol is used to encode the first symbol; the second symbol is then combined with the third symbol to form a new pair, which is encoded in the next step. Digram coding can be straightforwardly extended to n-grams; in the extension, the size of the dictionary increases and so does its coding efficiency. A minimal sketch of digram coding is shown below.
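The sketch below illustrates the digram scheme just described; the tiny hand-built dictionary and the input string are made up for illustration and are not taken from the text.

/* Minimal sketch of static digram coding.  The dictionary is hand-built for
 * illustration: the single symbols a, b, c plus two frequently used pairs. */
#include <stdio.h>
#include <string.h>

static const char *dictionary[] = { "a", "b", "c", "ab", "ca" };
#define DICT_SIZE (sizeof dictionary / sizeof dictionary[0])

static int find_pair(const char *s)           /* index of a two-symbol entry, or -1 */
{
    for (int i = 0; i < (int)DICT_SIZE; i++)
        if (strlen(dictionary[i]) == 2 && strncmp(dictionary[i], s, 2) == 0)
            return i;
    return -1;
}

static int find_single(char c)                /* single symbols are always present */
{
    for (int i = 0; i < (int)DICT_SIZE; i++)
        if (strlen(dictionary[i]) == 1 && dictionary[i][0] == c)
            return i;
    return -1;
}

int main(void)
{
    const char *text = "abcabca";
    size_t i = 0, n = strlen(text);

    while (i < n) {
        int idx = (i + 1 < n) ? find_pair(text + i) : -1;
        if (idx >= 0) {                       /* the pair is in the dictionary */
            printf("%d ", idx);
            i += 2;
        } else {                              /* fall back to the first symbol */
            printf("%d ", find_single(text[i]));
            i += 1;
        }
    }
    printf("\n");                             /* prints: 3 4 1 4 */
    return 0;
}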

2. Adaptive Dictionary Coding

In adaptive dictionary coding, a completely defined dictionary does not exist prior to the encoding process and the dictionary is not fixed. At the beginning of coding only an initial dictionary exists, and it adapts itself to the input during the coding process. All adaptive dictionary coding algorithms can be traced back to two original works by Ziv and Lempel [ziv 1977, ziv 1978]. The algorithms based on [ziv 1977] are referred to as the LZ77 algorithms; those based on [ziv 1978] are the LZ78 algorithms.

Parsing Strategy

Once we have a dictionary, we need to examine the input text and find a string of symbols that matches an item in the dictionary; the index of that item is then encoded. This process of segmenting the input text into disjoint strings (whose union equals the input text) for coding is referred to as parsing. Obviously, the way to segment the input text into strings is not unique. In terms of the highest coding efficiency, optimal parsing is essentially a shortest-path problem [bell 1990]. In practice, however, a method called greedy parsing is used in all the LZ77 and LZ78 algorithms.

With greedy parsing, the encoder searches at each coding step for the longest string of symbols in the input that matches an item in the dictionary. Greedy parsing may not be optimal, but it is simple to implement.

Sliding Window (LZ77) Algorithms

The dictionary used is actually a portion of the input text that has been recently encoded. The text that needs to be encoded is compared with the strings of symbols in the dictionary.

The longest matched string in the dictionary is characterized by a pointer (sometimes called a token), which is represented by a triple of data items. Note that this triple functions as an index into the dictionary. In this way, a variable-length string of symbols is mapped to a fixed-length pointer. There is a sliding window in the LZ77 algorithms. The window consists of two parts: a search buffer and a look-ahead buffer. The search buffer contains the portion of the text stream that has recently been encoded, which serves as the dictionary. The look-ahead buffer contains the text to be encoded next. The window slides through the input text stream from beginning to end during the entire encoding process. The size of the search buffer is much larger than that of the look-ahead buffer: the sliding window as a whole is usually on the order of a few thousand symbols, while the look-ahead buffer is on the order of several tens to one hundred symbols.

Example: Encoding

Summary of the LZ77 Approach


Decoding: Now let us see how the decoder recovers the string baccbaccgi from these three triples. In Figure 6.5, for each part, the last previously encoded symbol c prior to the receiving of the three triples is shaded. The first symbol in the look-ahead buffer is the symbol or the beginning of a string of symbols to be encoded at the moment. Let us call it the symbol s. In encoding, the search pointer moves to the left, away from the symbol s, to find a match of the symbol s in the search buffer. When there are multiple matches, the match that produces the longest matched string is chosen. The match is denoted by a triple <i, j, k>.

The first item in the triple, i, is the offset, which is the distance between the pointer pointing to the symbol giving the maximum match and the symbol s. The second item, j, is the length of the matched string. The third item, k, is the codeword of the symbol following the matched string in the look-ahead buffer. At the very beginning, the content of the search buffer can be arbitrarily selected. For instance, the symbols in the search buffer may all be the space symbol. Denote the size of the search buffer by SB, the size of the look-ahead buffer by L and the size of the source alphabet by A. Assume that the natural binary code is used. Then we see that the LZ77 approach encodes variable-length strings of symbols with fixed-length codewords.

The number of bits needed to encode the length of a matched string is determined by log2(SB + L), because the search for the maximum match can extend into the look-ahead buffer. The decoding process is simpler than the encoding process, since no comparison is involved in decoding. The most recently encoded symbols in the search buffer serve as the dictionary used in the LZ77 approach. The merit of doing so is that the dictionary adapts to the input text well. The limitation of the approach is that if the distance between repeated patterns in the input text stream is larger than the size of the search buffer, the approach cannot utilize that structure to compress the text. A window of moderate size, say SB + L = 8192, can compress a variety of texts well; several reasons for this are analyzed in [bell 1990]. Many variations have been made to improve the coding efficiency of the LZ77 approach.
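The sketch below illustrates greedy LZ77 parsing as just described. The buffer sizes SB and LA, the helper emit(), and the choice to start with an empty search buffer are all illustrative assumptions, so the triples it prints differ from those in the textbook figure, which assumes a pre-filled search buffer.

/* Minimal sketch of greedy LZ77 parsing: at each step, find the longest
 * match for the look-ahead text in the recently encoded text and emit a
 * triple <offset, length, next symbol>. */
#include <stdio.h>
#include <string.h>

#define SB 16   /* search buffer size (real coders use thousands of symbols) */
#define LA 8    /* look-ahead buffer size                                    */

static void emit(int offset, int length, char next)
{
    printf("<%d, %d, %c>\n", offset, length, next);
}

static void lz77_encode(const char *text)
{
    size_t n = strlen(text), pos = 0;

    while (pos < n) {
        size_t window = pos < SB ? pos : SB;      /* how far back we may search */
        size_t best_len = 0, best_off = 0;

        for (size_t off = 1; off <= window; off++) {
            size_t len = 0;
            size_t limit = (n - pos - 1 < LA - 1) ? n - pos - 1 : LA - 1;
            /* the match may run into the look-ahead buffer, so we keep
             * comparing text[pos - off + len] against text[pos + len]       */
            while (len < limit && text[pos - off + len] == text[pos + len])
                len++;
            if (len > best_len) { best_len = len; best_off = off; }
        }
        emit((int)best_off, (int)best_len, text[pos + best_len]);
        pos += best_len + 1;   /* matched string plus the explicit next symbol */
    }
}

int main(void)
{
    lz77_encode("baccbaccgi");   /* string used in the decoding example above */
    return 0;
}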

LZ78 Algorithms

Limitations of LZ77:

If the distance between two repeated patterns is larger than the size of the search buffer, the LZ77 algorithms cannot work efficiently. The fixed sizes of the two buffers also imply that a matched string cannot be longer than the sum of the sizes of the two buffers, which places another limitation on coding efficiency. Increasing the sizes of the search buffer and the look-ahead buffer would seemingly resolve these problems, but a closer look reveals that doing so also increases the number of bits required to encode the offset and the matched-string length, as well as the processing complexity.

Table 6.4 An encoding example using the LZ78 algorithm

Index   Double        Encoded symbols
1       < 0, C(b) >   b
2       < 0, C(a) >   a
3       < 0, C(c) >   c
4       < 3, 1 >      cb
5       < 2, 3 >      ac
6       < 3, 2 >      ca
7       < 4, 3 >      cbc
8       < 2, 1 >      ab
9       < 3, 3 >      cc
10      < 1, 1 >      bb
11      < 5, 3 >      acc

LZ78:
No sliding window is used. The encoded text itself is used as a dictionary which, potentially, does not have a fixed size. Each time a pointer (token) is issued, the encoded string is included in the dictionary. Once a preset limit to the dictionary size has been reached, either the dictionary is fixed for the future (if the coding efficiency is good) or it is reset to zero, i.e., it must be restarted. Instead of the triples used in LZ77, only doubles are used in LZ78: specifically, only the position of the pointer to the matched string and the symbol following the matched string need to be encoded.

Example 6.3 Encoding:


Consider the text stream baccbaccacbcabccbbacc. As Table 6.4 shows, in general the entries in the dictionary become longer and longer as the encoding proceeds: first, entries with single symbols come out; later, more and more entries with two symbols show up; after that, more and more entries with three symbols appear. This means that the coding efficiency is increasing.

Decoding: Since the decoder knows the rule applied in the encoding, it can reconstruct the dictionary and decode the input text stream from the received doubles.
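A minimal LZ78 encoder along the lines described above is sketched below. The dictionary is a plain array of strings rather than the trie a real coder would use, and the trailing symbol is printed as a character, whereas Table 6.4 writes it as the dictionary index of that single symbol; the helper names are illustrative.

/* Minimal sketch of LZ78 encoding with < index, symbol > doubles. */
#include <stdio.h>
#include <string.h>

#define MAX_ENTRIES 256
#define MAX_PHRASE  64

static char dict[MAX_ENTRIES][MAX_PHRASE];   /* dict[0] is the empty phrase */
static int  dict_size = 1;

static int find_phrase(const char *phrase)
{
    for (int i = 1; i < dict_size; i++)
        if (strcmp(dict[i], phrase) == 0)
            return i;
    return 0;
}

static void lz78_encode(const char *text)
{
    char phrase[MAX_PHRASE] = "";
    int last_index = 0;

    for (size_t i = 0; text[i] != '\0'; i++) {
        size_t len = strlen(phrase);
        phrase[len] = text[i];
        phrase[len + 1] = '\0';

        int idx = find_phrase(phrase);
        if (idx != 0) {               /* still matching an existing entry      */
            last_index = idx;
        } else {                      /* new phrase: emit a double, add entry  */
            printf("< %d, %c >\n", last_index, text[i]);
            if (dict_size < MAX_ENTRIES)
                strcpy(dict[dict_size++], phrase);
            phrase[0] = '\0';
            last_index = 0;
        }
    }
    if (phrase[0] != '\0')            /* leftover phrase at end of input       */
        printf("< %d, - >\n", last_index);
}

int main(void)
{
    lz78_encode("baccbaccacbcabccbbacc");   /* reproduces the rows of Table 6.4 */
    return 0;
}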

LZW Algorithm

Both the LZ77 and LZ78 approaches, when published in 1977 and 1978, respectively, were theory-oriented. The effective and practical improvement over LZ78 in [welch 1984] brought much attention to the LZ dictionary coding techniques; the resulting algorithm is referred to as the LZW algorithm. It removed the second item in the double (the index of the symbol following the longest matched string) and hence enhanced coding efficiency. In other words, LZW sends only dictionary indexes to the decoder. Consider the following input text stream: accbadaccbaccbacc, with source alphabet S = {a, b, c, d}.

Encoding:

Table 6.5 An example of dictionary coding using the LZW algorithm

Index   Entry   Input Symbols          Encoded Index
1       a       (initial dictionary)
2       b
3       c
4       d
5       ac      a                      1
6       cc      c                      3
7       cb      c                      3
8       ba      b                      2
9       ad      a                      1
10      da      d                      4
11      acc     a, c                   5
12      cba     c, b                   7
13      accb    a, c, c                11
14      bac     b, a                   8
15      -       c, c (end of input)    6

(Each row from index 5 on shows the dictionary entry created when the input symbols on that row were encoded.)

Decoding:

Initially, the decoder has the same dictionary (the top four rows of Table 6.5) as the encoder. Once the first index 1 comes, the decoder decodes a symbol a. The second index is 3, which indicates that the next symbol is c. From the rule applied in encoding, the decoder knows further that a new entry ac has been added to the dictionary with an index 5. The next index is 3; it is known that the next symbol is also c, and that the string cc has been added into the dictionary as the sixth entry. In this way, the decoder reconstructs the dictionary and decodes the input text stream.
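The following sketch reproduces the encoded indexes of Table 6.5. The naive linear-search dictionary and the helper names are illustrative; a real LZW coder would use a hash table or trie and would transmit the indexes in a fixed or growing bit width.

/* Minimal sketch of LZW encoding: only dictionary indexes are transmitted.
 * The initial dictionary holds the source alphabet {a, b, c, d}. */
#include <stdio.h>
#include <string.h>

#define MAX_ENTRIES 256
#define MAX_PHRASE  64

static char dict[MAX_ENTRIES][MAX_PHRASE];
static int  dict_size = 0;

static int find_phrase(const char *phrase)
{
    for (int i = 0; i < dict_size; i++)
        if (strcmp(dict[i], phrase) == 0)
            return i + 1;             /* 1-based indexes, as in Table 6.5 */
    return 0;
}

static void add_phrase(const char *phrase)
{
    if (dict_size < MAX_ENTRIES)
        strcpy(dict[dict_size++], phrase);
}

static void lzw_encode(const char *text)
{
    char current[MAX_PHRASE] = "", extended[MAX_PHRASE];

    for (size_t i = 0; text[i] != '\0'; i++) {
        snprintf(extended, sizeof extended, "%s%c", current, text[i]);
        if (find_phrase(extended)) {
            strcpy(current, extended);            /* keep growing the match    */
        } else {
            printf("%d ", find_phrase(current));  /* emit index of longest match */
            add_phrase(extended);                 /* longest match + next symbol */
            current[0] = text[i];                 /* restart from the new symbol */
            current[1] = '\0';
        }
    }
    if (current[0] != '\0')
        printf("%d", find_phrase(current));       /* flush the final match     */
    printf("\n");
}

int main(void)
{
    add_phrase("a"); add_phrase("b"); add_phrase("c"); add_phrase("d");
    lzw_encode("accbadaccbaccbacc");   /* prints: 1 3 3 2 1 4 5 7 11 8 6 */
    return 0;
}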

Q.17 Explain finite context modeling in detail.

Ans.

Finite context modeling


Finite context models assign a probability to a symbol based on the frequencies of the symbols that appeared before the current symbol to be coded. We use the previous symbols because, when we code a symbol based on them, we can be sure that the decoder has the same information (the previous symbols have already been transmitted). Finite context modeling is based on the fact that a symbol that has appeared recently is likely to appear again in the near future: the more times it appears, the higher the probability that it appears again, so every time it is seen we increment its probability. This article does not deal with coders; you can choose among static Huffman coding, arithmetic coding, a variant of arithmetic coding called a range coder, or many others. Arithmetic coding approaches the entropy, so from this point on we will use the entropy as the reference (assuming that we are using arithmetic coding). Say we have a message like "aaabbc" and a finite context model which assigns a probability of symbol_frequency/total_frequency to every symbol. Then 'a' gets a probability of 3/6 (a code length of 1 bit), 'b' gets 2/6 (1.59 bits) and 'c' gets 1/6 (2.59 bits).

In this example the original length of the symbol 'c' was 8 bits (we assume it was a byte), but our model assigned it a code length of 2.59 bits, so the redundancy was 8 - 2.59 = 5.41 bits. We can generalize this: for any model, the redundancy of a given symbol is original_code_length - code_length. Note that the probability 1/6 for 'c' is the expected probability of the next symbol being a 'c'. Notice from the previous example that every time a symbol appears we increment its frequency, and 6 is the total number of symbols seen (the total frequency); from these two values we get the probability of a given symbol. When we implement this simple finite context model we have a table with Z entries (the alphabet size; in the case of bytes, 256) which is initialised to 0, and when a symbol appears we increment its value, something like ++table[symbol]. Now you may wonder whether we can do better than the entropy; the answer is no. The compressed length of the message is (1*3) + (1.59*2) + (2.59*1) = 8.77 bits. Let us try to give a higher probability to the most probable symbol: say 3/4 for 'a', which leaves 1/6 for 'b' and 1/12 for 'c', giving code lengths of roughly 0.42, 2.59 and 3.59 bits. The final length is then (0.42*3) + (2.59*2) + (3.59*1) = 10.03 bits. We have coded one symbol (the most probable one) with fewer bits, but the resulting output is bigger, so on average we cannot do better than the entropy. This can be proved mathematically with the Kraft inequality; for further reading see [1]. Of course we can build better models, which give higher probabilities to the symbols actually being coded, and thus achieve higher compression. Finite context models are also called finite Markov models (Markov (1856-1922) was a Russian mathematician), because they assume the input file to be a Markovian sequence: every symbol depends only on the finite preceding sequence of symbols. Models for messages of infinite length also exist, but in most cases they are not practical; see [4]. The term Markov model is also used to refer to finite state models, but that falls outside the scope of this article.
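The arithmetic of the "aaabbc" example can be checked with a minimal sketch of the order-0 model; it only computes the ideal (entropy) code length and does not perform any actual arithmetic coding.

/* Minimal sketch of an order-0 finite context model for the "aaabbc"
 * example: each symbol's probability is its frequency divided by the total
 * number of symbols, and its ideal code length is -log2(probability). */
#include <math.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    const char *message = "aaabbc";
    unsigned long table[256] = {0};       /* frequency table, one entry per byte */
    size_t total = strlen(message);
    double compressed_bits = 0.0;

    for (size_t i = 0; i < total; i++)
        ++table[(unsigned char)message[i]];

    for (size_t i = 0; i < total; i++) {
        double p = (double)table[(unsigned char)message[i]] / (double)total;
        compressed_bits += -log2(p);      /* ideal code length for this symbol */
    }

    /* For "aaabbc": 3*1.00 + 2*1.59 + 1*2.59 = 8.77 bits, as in the text. */
    printf("original: %zu bits, model: %.2f bits\n", total * 8, compressed_bits);
    return 0;
}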
Q.18 Explain speech coding. What are the modern speech compression techniques? Ans. Speech coding is the application of data compression to

digital audio signals containing speech. Speech coding uses speech-specific parameter estimation based on audio signal processing techniques to model the speech signal, combined with generic data compression algorithms to represent the resulting modeled parameters in a compact bitstream. The two most important applications of speech coding are mobile telephony and Voice over IP. The techniques used in speech coding are similar to those in audio data compression and audio coding, where knowledge of psychoacoustics is used to transmit only data that is relevant to the human auditory system. For example, in narrowband speech coding, only information in the frequency band 400 Hz to 3500 Hz is transmitted, but the reconstructed signal is still adequate for intelligibility. Speech coding differs from other forms of audio coding in that speech is a much simpler signal than most other audio signals, and there is a lot more statistical information available about the properties of speech. As a result, some auditory information which is relevant in audio coding can be unnecessary in the speech coding context. In speech coding, the most important criterion is preservation of intelligibility and "pleasantness" of speech with a constrained amount of transmitted data. It should be emphasised that the intelligibility of speech includes, besides the actual literal content, also speaker identity, emotions, intonation, timbre and so on, which are all important for perfect intelligibility. The more abstract concept of pleasantness of degraded speech is a different property than intelligibility, since it is possible that degraded speech is completely intelligible yet subjectively annoying to the listener. In addition, most speech applications require low coding delay, as long coding delays interfere with speech interaction.

Sample companding viewed as a form of speech coding


From this viewpoint, the A-law and μ-law algorithms (G.711) used in traditional PCM digital telephony can be seen as a very early precursor of speech encoding, requiring only 8 bits per sample but giving effectively 12 bits of resolution. Although this would generate unacceptable distortion in a music signal, the peaky nature of speech waveforms, combined with the simple frequency structure of speech as a periodic waveform with a single fundamental frequency and occasional added noise bursts, makes these very simple instantaneous compression algorithms acceptable for speech. A wide variety of other algorithms were tried at the time, mostly variants on delta modulation, but after careful consideration the A-law/μ-law algorithms were chosen by the designers of the early digital telephony systems. At the time of their design, their 33% bandwidth reduction for a very low complexity made them an excellent engineering compromise. Their audio performance remains acceptable, and there has been no need to replace them in the stationary phone network. In 2008, the G.711.1 codec, which has a scalable structure, was standardized by the ITU-T; its input sampling rate is 16 kHz.

Modern speech compression

Much of the later work in speech compression was motivated by military research into digital communications for secure military radios, where very low data rates were required to allow effective operation in a hostile radio environment. At the same time, far more processing power was available, in the form of VLSI integrated circuits, than was available for earlier compression techniques. As a result, modern speech compression algorithms could use far more complex techniques than were available in the 1960s to achieve far higher compression ratios. These techniques were available through the open research literature to be used for civilian applications, allowing the creation of digital mobile phone networks with substantially higher channel capacities than the analog systems that preceded them.

The most common speech coding scheme is Code Excited Linear Prediction (CELP) coding, which is used, for example, in the GSM standard. In CELP, the modelling is divided into two stages: a linear predictive stage that models the spectral envelope, and a codebook-based model of the residual of the linear predictive model. In addition to the actual speech coding of the signal, it is often necessary to use channel coding for transmission, to avoid losses due to transmission errors. Usually, speech coding and channel coding methods have to be chosen in pairs, with the more important bits in the speech data stream protected by more robust channel coding, in order to get the best overall coding results. The Speex project is an attempt to create a free software speech coder, unencumbered by patent restrictions.

Major subfields:

Wide-band speech coding
  o AMR-WB for WCDMA networks
  o VMR-WB for CDMA2000 networks
  o G.722, G.722.1, Speex and others for VoIP and videoconferencing
Narrow-band speech coding
  o FNBDT for military applications
  o SMV for CDMA networks
  o Full Rate, Half Rate, EFR, AMR for GSM networks
  o G.723.1, G.726, G.728, G.729, iLBC and others for VoIP or videoconferencing

Q.19 Explain speech compression and digital audio concepts in detail with suitable examples.

Ans.

Speech Compression

Manipulation of sound by computers is a relatively new development. It has been possible since the birth of digital computers, but only in the last five years or so has inexpensive hardware brought this capability to the average user's desktop. Now the ability to play digitized sound is expected to be an integral part of the multimedia revolution. The use of multimedia focuses the issue of data compression for most users. Computer graphics in particular quickly take up all available disk space. Digitized audio is far less voracious in its storage requirements, but even so it can quickly swallow up all free space on the average user's hard disk. Fortunately for computer users, the world of telephony has used digitized audio since the 1960s, and extensive research has been done on effective methods of encoding and compressing audio data. The world's telecommunications companies were intensely aware of the cost of transmission bandwidth and made efforts to reduce expenses in this area. Computer users today benefit from much of this research. This chapter looks first at some of the basic concepts involved in using digital audio, including the software and hardware in today's generation of computers. Next, it looks at how well conventional lossless compression techniques work on digitized voice. Finally, it explores some lossy techniques.

Digital Audio Concepts

For modern computers to manipulate sound, they first have to convert it to a digital format. The sound samples can then be processed, transmitted, and converted back to analog format, where they can finally be received by the human ear. Digitization of sound began in earnest in the early 1960s. Like much of our early computer technology, credit for development lies with AT&T, which at that time had a regulated monopoly on long-distance service in the United States. In 1962, AT&T established the first commercial digital telephone link, a T1 interoffice trunk in Chicago. In the short space of thirty years, the long-distance network in the United States has converted almost entirely from analog to digital transmission. Virtually all new switching equipment installed by telephone companies today is digital, although analog switching is still found in older installations and in the smaller PBX and key systems installed in businesses. Of course, the final subscriber loop between the telephone company and the end user is still persistently analog. Digital audio is now coming of age in the highly visible consumer electronics arena as well. The digital compact disk has nearly completed its displacement of analog LP records. It remains to be seen whether digital audio tape will do the same thing to analog cassette tape, but it seems likely that some day most recorded music will be distributed in digital format.

Fundamentals
While this book cannot give a complete course in digital signal processing, it certainly has room to cover a few basic concepts involved in digital sound. Figure 10.1 shows a typical audio waveform as it might be displayed on an oscilloscope. The X axis in this diagram represents time. The Y axis represents a voltage

measured at an input device, typically a microphone. The microphone attempts to faithfully reproduce changes in air pressure caused by sound waves traveling through it. Some human ears can hear sounds at frequencies as high as 20,000Hz and nearly as low as DC. The dynamic range of our hearing is so wide that we have to employ a logarithmic scale of measurement, the decibel, to reasonably accommodate it. This presents a unique set of requirements for digitization. A waveform like the one shown in Figure 10.1 is typical of an audio sample. It isn't a nice, clean sine wave that has a regular period and can be described by a simple mathematical function. Instead, it is a combination of various frequencies at different amplitudes and phases; combined, they produce something that looks fairly irregular and is not easy to characterize.

Figure 10.1 A typical audio waveform.

This particular snapshot shows about 5 milliseconds (ms) of output. Notice that the largest recognizable components of the waveform appear to have a period of roughly two milliseconds. This corresponds to a frequency of about 500Hz, a fairly characteristic frequency found in speech or music. The first step in working with digital audio is sampling. Sampling consists of taking measurements of the input signal at regular times, converting them to an appropriate scale, and storing them. Figure 10.2 shows the same waveform sampled at an 8KHz rate. This means that 8,000 times per second a measurement is taken of the voltage level of the input signal. The measurement points are marked with an x on the waveform.

Figure 10.2 A typical audio waveform being sampled at 8KHz.

In most computer systems, this first step of digitization is done with an analog-to-digital converter (ADC). The ADC takes a given voltage and scales it to an appropriate digital measurement. An eight-bit ADC, for example, might have a full-scale input voltage of 500 millivolts (mv); it would output an eight-bit value of 255 if the input voltage were 500mv and zero if the input voltage were zero. A voltage between these values would be scaled to fit in the linear range of zero to 255. Since audio signals are AC in nature, the range is usually adjusted so that a zero-voltage signal falls in the middle of the range. For the previous example, the range would be adjusted to between -250mv and +250mv, and outputs from the eight-bit ADC would range from -128 to +127. The stored sample points then represent a series of voltages that were measured at the input of the ADC. Figure 10.3 shows the representation of those voltages overlaid with the input AC signal. Note that since the sample points in this case occur many times more frequently than the period of the waveform, the digital samples themselves trace the analog signal very accurately.

Figure 10.3 Sample voltages overlaid with the input AC signal

Now that the sound has been digitized, it can be stored via computer using any number of technologies, ranging from fast storage, such as main processor RAM, to off-line slow storage on magnetic tape. The actual speed of the storage medium is relatively unimportant with digital sound, since the data rate needed to accurately store the sound is relatively low compared with what most digital media can handle. Eventually, the sound needs to be played back. This is done via another electronic component that is the converse of the ADC: the digital-to-analog converter (DAC). The DAC is responsible for taking a digital value and converting it to a corresponding analog signal. To be effective, the conversion process needs to be the mirror image of that performed when converting the analog signal to digital. While the exact voltages produced at the output of the DAC do not need to be identical to those seen at the input, they do need to be proportional to one another so that one waveform corresponds to the other. In addition, the samples need to be output at exactly the same rate at which they were read in; any deviation here will cause the output frequencies to be shifted up or down from the input, generally not a good thing. Figure 10.4 shows the output of the DAC when given the same set of samples produced in Figure 10.2. At first glance, it seems that this is a radically different waveform: all the nice, smooth shapes shown in the earlier figures are gone, replaced by this stair-step, rectangular, artificial-looking creation.

Figure 10.4 DAC output

Fortunately, Figure 10.4 is not that far removed from Figure 10.1. Mathematically, the sharp jumps that occur when we move from sample to sample represent high-frequency components in the output signal. These can (and must) be eliminated from the signal by means of a low-pass filter that lies between the output of the DAC and the final amplification stage of the audio output. A low-pass filter is a network of electrical components designed to let frequencies below a certain value pass through unhindered while attenuating frequencies above that point. An ideal low-pass filter used with the samples shown here would completely stop any frequency above 4KHz and let frequencies below 4KHz pass with no attenuation. In practice, low-pass filters don't work perfectly, but even a low-budget filter can take Figure 10.4 and create a nearly indistinguishable copy of Figure 10.1. Without the filter, the sound sample will still be intelligible, but it will be filled with distracting high-frequency noise that is part of the reproduction process. Figure 10.5 shows the same signal when the sampling rate has been stepped up to a much higher rate. This increase in sampling rate clearly does a more accurate job of reproducing the signal. The next section discusses how variations in these parameters affect the output signal.

Figure 10.5 Sampling at a much higher rate.

Sampling Variables

When an audio waveform is sampled, two important variables affect the quality of the reproduction: the sample rate and the sample resolution. Both are important factors, but they play different roles in determining the level of distortion produced when a sample is played back. The sample resolution is simply a measure of how accurately the digital sample can measure the voltage it is recording. When the input range is -500mv to +500mv, for example, an eight-bit ADC can resolve the input signal down to only about 4mv, so an input signal of 2mv will either get rounded up to 4mv or down to 0mv. This is called a quantization error.
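A minimal sketch of this quantization step, assuming the -500mv to +500mv eight-bit example above, is shown below; the sample voltages are made up for illustration.

/* Minimal sketch of 8-bit quantization of a +/-500 mv signal and the
 * resulting quantization error.  Step size and samples are illustrative. */
#include <stdio.h>
#include <math.h>

int main(void)
{
    const double full_scale_mv = 500.0;                      /* +/-500 mv range  */
    const double step_mv = (2.0 * full_scale_mv) / 256.0;    /* ~3.9 mv per code */
    double samples_mv[] = { 2.0, -127.3, 333.4, 499.9 };

    for (int i = 0; i < 4; i++) {
        int code = (int)lround(samples_mv[i] / step_mv);     /* -128..127 codes  */
        if (code > 127)  code = 127;
        if (code < -128) code = -128;
        double reconstructed = code * step_mv;
        printf("%8.1f mv -> code %4d -> %8.1f mv (error %+.2f mv)\n",
               samples_mv[i], code, reconstructed, reconstructed - samples_mv[i]);
    }
    return 0;
}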

Figure 10.6 shows the results of quantization error when sampling a waveform. In some cases the sample point has a larger magnitude than the audio signal, but in other places it has less. When the digitized signal is played back through a DAC, the output waveform will closely track the sample points, resulting in a certain amount of distortion.

Figure 10.6 Quantization error when sampling a waveform

It might seem that eight bits should be enough to accurately record audio data, but this may not be the case because of the large dynamic range of audio the human ear can detect. If our 500mv range example were used, we might find that our input signal magnitudes range from 1mv to 500mv in a single recording session. The crash of drums in an orchestra could push the ADC to its limits, while a delicate violin solo may never go outside 5mv. If the minimum digital resolution is only 5mv, a very noticeable level of distortion will be introduced during this part of a recording session. The sampling rate plays a different role in determining the quality of digital sound reproduction. One classic law in digital signal processing, formulated by Harry Nyquist, states that to accurately reproduce a signal of frequency f, the sampling rate has to be greater than 2*f. This is commonly called the Nyquist rate. The audio signal in Figure 10.7 is being measured at a considerably slower rate than that shown in the previous examples, with noticeably negative consequences: at several places in the waveform it is not even sampled a single time during an excursion above or below the center line.

Figure 10.7 A slower sampling rate.

Figure 10.8 shows the waveform we could expect after playing back the digitized samples stored from Figure 10.7. Clearly, after the digitized output is filtered, the resulting waveform differs quite a bit from that shown in the previous figure. What has happened is that the high-frequency components of the waveform have been lost by the slower sampling rate, letting only the low-frequency parts of the sample through.

Figure 10.8 The waveform after playing back digitized samples.

The human ear hears sound up to 20KHz, which implies that we need to sample audio at 40KHz or better to achieve good reproduction. In fact, the sampling rate used for digital reproduction of music via compact disk or digital audio tape is 44KHz, using sixteen-bit samples, and the quality of sound achieved at this rate is generally acknowledged to be superior. This does not mean that all digital recordings have to be done at 44KHz rates. Virtually every digital phone system in the world uses an 8KHz sampling rate to record human speech, with generally good results. This means that the phone system is unable to pass any signal with a frequency of 4KHz or higher. This clearly does not render the system useless: millions of long-distance calls over digital lines are made every day. The average speech signal is composed of many different frequencies, and even if everything above 4KHz is discarded, most of the speech energy still makes it through the system. Our ears detect this loss as a lower-fidelity signal, but they still understand it quite well.

The ultimate test of all this is how the audio output sounds to our ears. It is difficult to quantify sound quality in strictly mathematical terms, so when discussing audio output it is always best to temper judgments with real listening trials.

PC-Based Sound
Some exotic work in digital signal processing has been going on for years, but it usually involved expensive special-purpose peripherals far out of reach of the average computer installation. Early desktop computers did not really push the state of the art in sound reproduction. The original IBM and Apple computers both had built-in speakers as standard equipment, but they gave the programmer only a single bit with which to control the speaker. This meant the speaker could generally be used only to emit beeps and buzzes, not true digitized sound. In the early 1980s, however, many computer manufacturers saw that a true digitized sound capability could be added to their computers at a relatively low cost. Apple was the most prominent, adding an eight-bit DAC to the Macintosh, which opened the door to the use of true digitized audio. Most desktop computers today are IBM-compatible ISA computers based on Intel's 80x86 CPU chips. Unfortunately for sound enthusiasts, IBM has not yet elected to add sound capability to the PC, but third-party solutions are relatively inexpensive. The sound samples used in this book have been created and manipulated using the Sound Blaster card, manufactured by Creative Labs, but several other cards on the market can play digitized sound samples, and any of these can be used, provided file-format conversion utilities exist. The next generation of digitized sound on the desktop is now here. Many of today's consumer machines can digitize and play back 44KHz sixteen-bit CD-quality sound data. Only a few

years ago, this capability seemed a bit unusual. The exotic black cube from NeXT Computer seemed to presage the future when it was first introduced, incorporating a digital signal processor (DSP) chip as a co-processor; the intent was to offload work, such as manipulating digitized audio, from the main CPU. For a while, other manufacturers followed this design, for example Apple with its AV line of Macintosh computers. Today, the vast majority of PC-compatible machines sold in retail consumer outlets come equipped with sound cards and CD-ROM drives; by one count, over 75 percent. More recently, it seems that the pendulum may swing back in the other direction, as a new largesse of processing power in the CPU will allow its deployment for audio and video processing in addition to its regular duties. Intel is promulgating such a configuration with its P6 processor, which has cycles to spare that can be used for compressing and decompressing audio and video on the fly, even while crunching numbers in a spreadsheet. Regardless of how it is done, the multimedia capabilities of today's machines only highlight the need for data compression, since they fill up a hard disk faster than ever before. The explosive growth of the Internet and the World Wide Web, which allows multimedia-enriched distributed documents, also increases the need for compression, because the bandwidth of communications links is not increasing as fast as the processing power of the host computers. The files distributed with this book are raw sound files: pure binary recordings of eight-bit input data. Virtually all sound software on desktop machines today expects more than that for a sound file, but many software packages have utilities to convert raw sound files to a particular format. The Sound Blaster, for example, includes an executable program called VOC-HDR.EXE that prepends a header to a raw sound file. The sound samples here were all sampled at 11KHz, a commonly used rate for medium-fidelity digital recording. By supplying sound data only, the code here can concentrate on compression, without worries about additional superfluous data in

the file. A full-fledged sound-file compression package by necessity needs to support the dozens of different file formats in existence, but that is mostly a matter of implementation details. Some sound capability resources are available for a relatively small investment. Many on-line services, such as CompuServe, America Online, GEnie, and BIX, have active forums for audio manipulation. There are also active forums on the Internet, such as Web sites and Usenet newsgroups, focusing on digital audio. Freeware and shareware utility programs available in these forums do a passable job of playing sound out of the PC speaker; other programs convert sound files between various formats. It wouldn't be feasible to try to list specific examples here, but it should be relatively simple to find this type of software. In addition, third-party sound cards are available for a relatively low investment.

Q.20 What is lossless compression of sound, and what is lossy compression?
Ans.

Lossless Compression of Sound

The original applications for sound compression could not take advantage of lossless data-compression techniques. One characteristic of all the compression techniques discussed so far in this book is that the amount of compression they achieve on a given data set is not known in advance; in some cases, the compression program can actually cause the data to expand, taking up more space than it occupied before. In the 1960s, telecommunications researchers were trying to find ways to put more conversations on digital trunk lines, particularly on expensive lines such as undersea cables or satellite links. Unlike disk space, which is somewhat flexible, these links have a fixed total bandwidth. A single telephone conversation might be allocated a 64Kbps slot on one of these channels; if it suddenly needed 100Kbps because the compression code hit a rough spot, there would be a major problem. These early researchers were attempting to divide a 64Kbps channel into two 32Kbps channels to get two for the price of one. This required compression techniques that would consistently compress data by 50 percent, even if it meant losing some resolution. Today, when trying to compress sound on disk for multimedia applications, we are in a slightly better position. We store and retrieve data from fixed disks, a more flexible medium for our work. If our files are compressed by 95 percent in some cases and -10 percent in others, it will not really cause any trouble.

Problems and Results

How much can we compress voice files using conventional lossless techniques? To answer this question, a set of six short sound files was created, ranging in length from about one second to about seven seconds. To determine how compressible these files were, they were packed into an archive using ARJ 2.10, a shareware compression program that generally compresses as well as or better than any other general-purpose program. The ARJ results showed that voice files did in fact compress relatively well. The six sample raw sound files gave the following results:

Filename       Original   Compressed   Ratio
SAMPLE-1.RAW   50777      33036        35%
SAMPLE-2.RAW   12033      8796         27%
SAMPLE-3.RAW   73091      59527        19%
SAMPLE-4.RAW   23702      9418         60%
SAMPLE-5.RAW   27411      19037        30%
SAMPLE-6.RAW   15913      12771        20%

These compression results look relatively promising. All the files were compressible to some extent, and some were reduced to less than half their original size. This level of compression is undoubtedly useful and may well be enough for some applications. ARJ.EXE performs two sorts of compression on an input data stream. First, it does an LZSS type of windowed string matching on the stream. The output from LZSS is, of course, a stream of tokens referring to either individual characters or matched strings. ARJ, like LHArc, takes LZSS a step further by performing Huffman compression on the output stream. Compressing these sound files using just LZSS compression and simple order-0 Huffman coding might tell us a little about what kind of redundancy is in these voice files. To check the results, the files were compressed again with the LZSS program from Chapter 8 and the HUFF program from Chapter 3. The results of these experiments are shown in the following table.

Filename       ARJ Ratio   LZSS Ratio   HUFF Ratio
SAMPLE-1.RAW   35%         23%          26%
SAMPLE-2.RAW   27%         5%           30%
SAMPLE-3.RAW   19%         3%           17%
SAMPLE-4.RAW   60%         25%          27%
SAMPLE-5.RAW   30%         15%          32%
SAMPLE-6.RAW   20%         2%           18%

The table shows that in every case, we perform more compression with simple order-0 Huffman coding than we do with LZSS dictionary compression. Since LZSS is normally a much more powerful compression technique, this is a telling result. What LZSS takes advantage of when compressing is repeated strings of characters in the file. Order-0 Huffman coding just takes advantage of overall frequency differences for individual sequences. What we see in these sound files is some overall frequency difference between the various codes that make up the files, but not as many repeated strings as we might normally expect. A look at snapshots of these sound files reveals some of the character of the data we are trying to compress. Figure 10.9 shows a section of about 600 sample points from SAMPLE-3.RAW. In this case, the sound samples are only taking up about 30 percent of the possible range allocated for them by the hardware. While individual samples can range from +127 to -128, in this snapshot they run only from about +30 to -30. By only using a portion of the available bandwidth, a sound file automatically makes itself a good candidate for Huffman compression. The sample shown in Figure 10.9 can probably be compressed by about 30 percent by just squeezing the samples down from eight bits to six or so bits. This is, in effect, what the Huffman coder does.

Figure 10.9 Sample points from SAMPLE-3.RAW.

Looking for repeated sequences in a sample such as this is less fruitful. We can certainly see a pattern in the waveform, but it is somewhat irregular, and it is not likely to produce many repeated patterns of even length 2. If we keep sampling long enough, random chance dictates that repeated strings will recur, but the compression will be much less than in a data or program file.

Figure 10.10 shows a sound sample that is a much more difficult candidate for compression. Unlike Figure 10.9, this sound sample utilizes nearly the entire dynamic range of the ADC, so an order-0 Huffman encoder will be much less effective. Likewise, the chances of finding repeated patterns with an LZSS algorithm diminish considerably here. This is the type of file that gives us only a few percentage points of compression.


Figure 10.10 A sound sample that is difficult to compress.

Of course, even when looking at a busy sample like this, the human eye picks out patterns. The peaks and valleys of the waveform occur at somewhat regular intervals, telling us that sinusoidal waveforms are present in the signal. Unfortunately, our existing compression algorithms aren't equipped to find this type of redundancy in an input waveform. To do better, we need to move to a new frontier: lossy compression.

Lossy Compression

The very word lossy implies that when using this type of compression we are going to give up a certain amount of precision. This would certainly not be acceptable when compressing the data or text files we use on our computers. We could probably compress the M&T Books annual financial statement if we rounded all figures off to the nearest million dollars, for example, but the accounting department would definitely have a problem working with the books after that. By digitizing sound samples, however, we have in effect already given up quite a bit of precision. For example, the sound samples used in this chapter were all recorded at 11KHz, which means that we have thrown away the entire portion of every sample above 5.5KHz in frequency. We are also using only eight-bit samples, so we are introducing a significant amount of distortion in the form of quantization error. All these factors are taken into account when designing the hardware and software for digitization. Instead of trying to perfectly reproduce analog phenomena, we make compromises that give us reproduction that is satisfactory for our purposes. Likewise, when we look at lossy compression, we once again accept a certain loss in fidelity. The signal we get after going through a compression/expansion cycle will not be identical to the original, but we can adjust the parameters to get better or worse fidelity, and likewise better or worse compression. Lossy compression is not necessarily an end in itself. We frequently use lossy compression in a two-phase process: a lossy stage followed by a lossless stage. One nice thing about lossy compression is that it frequently smooths out the data, which makes it even more suitable for lossless compression. So we get an extra, unexpected benefit from lossy compression, above and beyond the compression itself.

Q.17 Explain silence compression and companding in detail.

Ans. Silence compression on sound files is the equivalent of run-length encoding on normal data files. In this case, however, the runs we encode are sequences of relative silence in a sound file. This is a lossy technique, because we replace the sequences of relative silence with absolute silence. Figure 10.11 shows a typical sound sample that has a long sequence of silence: the first two thirds of it is composed of silence. Note that though we call it silence, there are actually very small blips in the waveform; these are normal background noise and can be considered inconsequential.

Figure 10.11 A typical sound sample with a long sequence of silence.

A compression program for a sample like this needs to work with a few parameters. First, it needs a threshold value for what can be considered silence. With our eight-bit samples, for example, 80H is considered pure silence, and we might want to consider any sample value within a range of plus or minus three of 80H to be silence. Second, it needs a way to encode a run of silence. The sample program creates a special SILENCE_CODE with a value of FF used to encode silence; the SILENCE_CODE is followed by a single byte that indicates how many consecutive silence codes there are. Third, it needs a parameter that gives a threshold for recognizing the start of a run of silence. We wouldn't want to start encoding silence after seeing just a single byte of silence; it doesn't even become economical until three bytes of silence are seen, and we may want to experiment with even higher values to see how they affect the fidelity of the recording. Finally, we need another parameter that indicates how many consecutive non-silence codes need to be seen in the input stream before we declare the silence run to be over. Setting this parameter to a value greater than one filters out anomalous spikes in the input data, which can also cut back on noise in the recording. The code that implements this silence compression (SILENCE.C) incorporates a starting threshold of four and a stop threshold of two, so we have to see four consecutive silence codes before we consider a run started. SILENCE.C by definition spends a lot of time looking ahead at upcoming input data. For example, to see whether a silence run has really started, the program must look at the next four input values. To simplify this, the program keeps a look-ahead buffer full of input data; it never directly examines the upcoming data read in via getc(), but instead looks at the bytes read into the buffer. This makes it easy to write functions that determine whether a silence run has started or ended. Just how effective silence compression can be at compressing files is shown in the following table. As expected, files without much silence in them were not greatly affected, but files that contained significant gaps were compressed quite a bit.

File Name      Raw Size   Compressed Size   Compression
SAMPLE-1.RAW   50777      37769             26%
SAMPLE-2.RAW   12033      11657             3%
SAMPLE-3.RAW   73091      73072             0%
SAMPLE-4.RAW   13852      10962             21%
SAMPLE-5.RAW   27411      22865             17%
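A minimal sketch of the silence run-length scheme just described is shown below. It is not the book's SILENCE.C: the look-ahead buffer and stop threshold are simplified away, and a real coder would also have to escape literal FF samples; but it uses the same silence threshold, start threshold, and escape code.

/* Minimal sketch of silence run-length compression: samples within +/-3 of
 * 80H count as silence, a run must reach four samples before it is encoded,
 * and a run is written as the escape byte FFH followed by a count byte. */
#include <stdio.h>
#include <stdlib.h>

#define SILENCE_CODE   0xFF
#define SILENCE_CENTER 0x80
#define SILENCE_RANGE  3
#define START_RUN      4

static int is_silent(int sample)
{
    return abs(sample - SILENCE_CENTER) <= SILENCE_RANGE;
}

static void compress(FILE *in, FILE *out)
{
    int c, run = 0;
    unsigned char held[255];                    /* original values of a pending run */

    while ((c = getc(in)) != EOF) {
        if (is_silent(c) && run < 255) {
            held[run++] = (unsigned char)c;     /* hold a possible run              */
        } else {
            if (run >= START_RUN) {             /* long enough: emit escape + count */
                putc(SILENCE_CODE, out);
                putc(run, out);
            } else {                            /* too short: pass samples through  */
                for (int i = 0; i < run; i++)
                    putc(held[i], out);
            }
            run = 0;
            if (is_silent(c))
                held[run++] = (unsigned char)c;
            else
                putc(c, out);
        }
    }
    if (run >= START_RUN) { putc(SILENCE_CODE, out); putc(run, out); }
    else for (int i = 0; i < run; i++) putc(held[i], out);
}

int main(void)
{
    compress(stdin, stdout);
    return 0;
}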

The final question to ask about silence detection is how it affects the fidelity of input files. The best way to answer that is to take the sample files, compress them, then expand them into new files. The expanded files should differ only from the originals in that strings of characters near the silence value of 80H should all have been arbitrarily made exactly 80H, changing slightly noisy silence to pure silence. In most cases, it is not possible to tell the sound samples that have been through a compression/expansion cycle from the originals. By tinkering with the parameters, it is possible to start erasing

significant sections of speech, but that obviously means the parameters are not set correctly. All in all, when applied correctly, silence compression provides an excellent way to squeeze redundancy out of sound files.

Companding

Silence compression can be a good way to remove redundant information from sound files, but in some cases it may be ineffective. In the preceding examples, SAMPLE-3.RAW had so few silent samples that it was only reduced by a few bytes out of 73K. This situation is somewhat analogous to using run-length encoding on standard text or data files: it will sometimes produce great gains, but it is not particularly reliable. In the early 1960s, telecommunications researchers were looking for a method of data compression that could always reduce the number of bits in a sound sample. Customer satisfaction tests showed that it took about thirteen bits of resolution in the DAC, sampled at 8,000Hz, to provide an acceptable voice connection, but it seemed likely that much of that resolution was going to waste. We need thirteen bits of resolution in a phone conversation because of the large dynamic range of the human voice. To accommodate a loud speaker, the voltage input range of the converter has to be set at a fairly high level. The problem is that the input voltage from a very soft voice is several orders of magnitude lower than this. If the ADC had only eight bits of resolution, it would only detect input signals close to 1 percent of the magnitude of the highest input, which proved unacceptable. It turns out, however, that the thirteen bits of resolution needed to pick up the voice of the quietest speaker is overkill for the loudest speaker. If our microphone input for a loud speaker is in the neighborhood of 100mv, we might only need one millivolt of resolution to provide good sound reproduction; the thirteen-bit ADC might be giving 200-microvolt resolution, which turns out to be more than is necessary. The telecommunications industry solved this using a non-linear, matched set of ADCs and DACs. The normal ADC equipment used in desktop computers (and most electronic equipment) uses a linear conversion scheme in which each increase in a code value corresponds to a uniform increase in input/output voltage. This arrangement is shown in Figure 10.12.

Figure 10.12 A linear conversion scheme in which each increase in a code value corresponds to a uniform increase in input/output voltage.

Using a linear conversion scheme such as this, when we go from code 0 to code 1, the output voltage from the DAC might change from 0mv to 1mv. Likewise, going from code 100 to code 101 will change the DAC output voltage from 100mv to 101mv. The system used in our telecommunications equipment today uses a companding codec, jargon for a compressing/expanding coder/decoder. The codec is essentially a chip that combines several functions, including those of the DAC, ADC, and input and output filters; we are concerned with the DAC and ADC. The codec used in virtually all modern digital telephone equipment does not use a standard linear function when converting codes to voltages and voltages to codes. Instead, it uses an exponential function that changes the size of the voltage step between codes as the codes grow larger. Figure 10.13 shows an example of what this curve looks like. The resolution for smaller code values is much finer than at the extremes of the range. For example, the difference between a code of zero and a code of one might be 1mv, while the difference between code 100 and code 101 could be 10mv.

Figure 10.13 An exponential function that changes the size of voltage steps.

The exponential curve defined for telecommunications codecs gives an effective range of thirteen bits out of a codec that only uses eight-bit samples. We can do the same thing with our eight-bit sound files by squeezing the eight-bit samples into a smaller number of codes. Our eight-bit sound files can be considered approximately seven-bit samples with a single sign bit indicating whether the output voltage is positive or negative. This gives us a range running from zero to 127 to encode for the output of our non-linear compression function. If we assume that we will have N codes to express the range of zero to 127, we can develop a transfer function for each code using the following equation:
output = 127.0 * ( pow( 2.0, code / N ) - 1.0 )

In other words, we calculate the output by raising 2 to the code/N power. The value of code/N ranges from zero for code 0 up to one for code N, resulting in an output range that runs from zero to 127, with a decidedly non-linear look. An example of how this might work can be seen if we use eight codes to cover the range zero to 127, which in effect compresses seven bits to three. The output value produced by each input code is shown in the following table.

Transforming three bits to seven

Input Code   Output Value
0            0
1            13
2            28
3            44
4            62
5            81
6            103
7            127
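The table above can be reproduced with a few lines of code from the transfer function, assuming N = 7 so that the eight codes 0 through 7 span the output range 0 to 127.

/* Minimal sketch of the companding transfer function above with N = 7,
 * printing the "three bits to seven" table. */
#include <math.h>
#include <stdio.h>

int main(void)
{
    const double N = 7.0;

    printf("Input Code   Output Value\n");
    for (int code = 0; code <= 7; code++) {
        int output = (int)(127.0 * (pow(2.0, code / N) - 1.0) + 0.5);  /* rounded */
        printf("%10d   %12d\n", code, output);
    }
    return 0;
}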

As was mentioned before, lossy compression is frequently used as a front end to a lossless compressor. In the case of COMPAND.C, this is a very effective strategy. After the files have been processed, far fewer codes are present in the output file, which makes string matching more likely, such as that used by LZSS compressors. By compressing a file by 50 percent using the companding strategy, then by applying LZSS compression, we can frequently achieve upwards of 90 percent compression on sound samples.

Other Techniques
This chapter covered some of the simpler techniques used to compress sound samples. As the amount of available processing power goes up, far more complicated algorithms are being applied. One of the most common compression algorithms in use today has been sanctioned by the CCITT in its recommendation G.721. The G.721 algorithm uses Adaptive Differential Pulse Code Modulation (ADPCM) to encode digital signals at 16Kbps or 32Kbps. This algorithm is commonly performed by digital signal processors, and it is generally applied to data that has already been digitized using standard codes.


The ADPCM algorithm combines two techniques. The first, delta pulse code modulation, encodes sound signals by measuring the difference between two consecutive samples rather than their absolute values. The quantization level adapts itself to the changing input signal, so the size of the encoded step changes as the input signal changes: when the signal moves from a high voltage to a low voltage at a steep rate, the encoded step value will be high; if a quiet input signal is being encoded, the step value will be low. This becomes complicated because the ADPCM algorithm requires that the transmitter predict in advance where the input signal is headed; if this prediction is not made accurately, it is not possible to make good judgments about the size of the step defined by each code. The process of predicting where a waveform is headed occupies most of the processor's time. To compress sound samples to even lower bit rates, even more sophisticated techniques, such as Linear Predictive Coding (LPC), are used. Human speech can be compressed and replayed in a recognizable state at rates as low as 2,400 bits per second using LPC. LPC attempts to compress human speech by modeling the vocal tract that produces the speech: instead of storing thousands of samples per second, LPC attempts to determine just a few parameters that model the process used to create the sound. The success or failure of LPC hinges on the ability of the compressor to execute millions of instructions per second during the compression process. Processes such as LPC and ADPCM represent the type of algorithms that will be used more and more frequently on the desktop. Unfortunately, the complexity of these algorithms is far beyond the scope of a sample program in this chapter.
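The toy sketch below only illustrates the two ideas named above (coding the difference between consecutive samples and adapting the step size to the signal); it is not the G.721 algorithm, and all constants and sample values are made up.

/* Toy illustration of adaptive delta (difference) encoding.  Not G.721. */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int samples[] = { 0, 4, 12, 30, 60, 58, 40, 10, 8, 9 };
    int n = sizeof samples / sizeof samples[0];
    int predicted = 0, step = 4;

    for (int i = 0; i < n; i++) {
        int diff = samples[i] - predicted;
        int code = diff / step;                  /* quantized difference          */
        predicted += code * step;                /* decoder can track the same    */
        /* grow the step when the signal moves fast, shrink it when it is quiet */
        if (abs(code) > 1 && step < 64) step *= 2;
        else if (code == 0 && step > 1) step /= 2;
        printf("sample %3d  code %3d  step %2d  reconstructed %3d\n",
               samples[i], code, step, predicted);
    }
    return 0;
}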

Q.21 Explain Image file formats.


Ans.

Image file formats


Image file formats are standardized means of organizing and storing digital images. Image files are composed of either pixel or vector (geometric) data that are rasterized to pixels when displayed (with few exceptions) in a vector graphic display. The pixels that constitute an image are ordered as a grid (columns and rows); each pixel consists of numbers representing magnitudes of brightness and color.

Image file sizes


Image file size, expressed as the number of bytes, increases with the number of pixels composing an image and the colour depth of the pixels. The greater the number of rows and columns, the greater the image resolution, and the larger the file. Also, each pixel of an image increases in size when its colour depth increases: an 8-bit pixel (1 byte) stores 256 colors, while a 24-bit pixel (3 bytes) stores 16 million colors, the latter known as truecolor. Image compression uses algorithms to decrease the size of a file. High resolution cameras produce large image files, ranging from hundreds of kilobytes to megabytes, per the camera's resolution and the image-storage format capacity. High resolution digital cameras record 12 megapixel (1 MP = 1,000,000 pixels) images, or more, in truecolor. For example, consider an image recorded by a 12 MP camera: since each pixel uses 3 bytes to record truecolor, the uncompressed image would occupy 36,000,000 bytes of memory, a great amount of digital storage for one image, given that cameras must record and store many images to be practical. Faced with large file sizes, both within the camera and on a storage disc, image file formats were developed to store such large images. An overview of the major graphic file formats follows below.
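The storage arithmetic in the example above can be written out in a few lines of C; the 12-megapixel figure and 3 bytes per pixel simply repeat the numbers from the example.

#include <stdio.h>

int main( void )
{
    long pixels = 12000000L;      /* 12 megapixels                    */
    int  bytes_per_pixel = 3;     /* 24-bit truecolor, 3 bytes/pixel  */
    long size = pixels * bytes_per_pixel;

    printf( "Uncompressed size: %ld bytes (about %.1f MB)\n",
            size, size / 1000000.0 );
    return 0;
}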

Image file compression


There are two types of image file compression algorithms: lossless and lossy. Lossless compression algorithms reduce file size without losing image quality, though they do not compress to as small a file as a lossy compression algorithm. When image quality is valued above file size, lossless algorithms are typically chosen. Lossy compression algorithms take advantage of the inherent limitations of the human eye and discard invisible information. Most lossy compression algorithms allow for variable quality levels (compression), and as these levels are increased, file size is reduced. At the highest compression levels, image deterioration becomes noticeable as "compression artifacting".

Major graphic file formats

Including proprietary types, there are hundreds of image file types. The PNG, JPEG, and GIF formats are most often used to display images on the Internet. These graphic formats are listed and briefly described below, separated into the two main families of graphics: raster and vector. In addition to straight image formats, metafile formats are portable formats which can include both raster and vector information. Examples are application-independent formats such as WMF and EMF. The metafile format is an intermediate format. Most Windows applications open metafiles and then save them in their own native format. Page description language refers to formats used to describe the layout of a printed page containing text, objects and images. Examples are PostScript, PDF and PCL.

Raster formats
These formats store images as bitmaps (also known as pixmaps). For a description of the technology aside from the format, see Raster graphics.

JPEG/JFIF
JPEG (Joint Photographic Experts Group) is a compression method; JPEG-compressed images are usually stored in the JFIF (JPEG File Interchange Format) file format. JPEG compression is (in most cases) lossy compression. The JPEG/JFIF filename extension in DOS is JPG (other operating systems may use JPEG). Nearly every digital camera can save images in the JPEG/JFIF format, which supports 8 bits per color (red, green, blue) for a 24-bit total, producing relatively small files. When not too great, the compression does not noticeably detract from the image's quality, but JPEG files suffer generational degradation when repeatedly edited and saved. The JPEG/JFIF format also is used as the image compression algorithm in many Adobe PDF files.

JPEG 2000
JPEG 2000 is a compression standard enabling both lossless and lossy storage. The compression methods used are different from the ones in standard JFIF/JPEG; they improve quality and compression ratios, but also require more computational power to process. JPEG 2000 also adds features that are missing in JPEG. It is not nearly as common as JPEG, but it is used currently in

professional movie editing and distribution (e.g., some digital cinemas use JPEG 2000 for individual movie frames).

RAW
RAW refers to a family of raw image formats that are options available on some digital cameras. These formats usually use a lossless or nearly lossless compression, and produce file sizes much smaller than the TIFF formats of full-size processed images from the same cameras. Although there is a standard raw image format (ISO 12234-2, TIFF/EP), the raw formats used by most cameras are not standardized or documented, and differ among camera manufacturers. Many graphic programs and image editors may not accept some or all of them, and some older ones have been effectively orphaned already. Adobe's Digital Negative (DNG) specification is an attempt at standardizing a raw image format to be used by cameras, or for archival storage of image data converted from undocumented raw image formats, and is used by several niche and minority camera manufacturers including Pentax, Leica, and Samsung. The raw image formats of more than 230 camera models, including those from manufacturers with the largest market shares such as Canon, Nikon, Sony, and Olympus, can be converted to DNG.[2] DNG was based on ISO 12234-2, TIFF/EP, and ISO's revision of TIFF/EP is reported to be adding Adobe's modifications and developments made for DNG into profile 2 of the new version of the standard. As far as video cameras are concerned, ARRI's Arriflex D-20 and D-21 cameras provide raw 3K-resolution sensor data with a Bayer pattern as still images (one per frame) in a proprietary format (.ari file extension). Red Digital Cinema Camera Company, with its Mysterium sensor family of still and video cameras, uses its proprietary raw format called REDCODE (.R3D extension), which stores still as well as audio+video information in one lossy-compressed file.

Exif
The Exif (Exchangeable image file format) format is a file standard similar to the JFIF format with TIFF extensions; it is incorporated in the JPEG-writing software used in most cameras. Its purpose is to record and to standardize the exchange of images with image metadata between digital cameras and editing and viewing software. The metadata are recorded for individual images and include such things as camera settings, time and date, shutter speed, exposure, image size, compression, name of camera, color information, etc. When images are viewed or edited by image editing software, all of this image information can be displayed.

TIFF
The TIFF (Tagged Image File Format) format is a flexible format that normally saves 8 bits or 16 bits per color (red, green, blue) for 24-bit and 48-bit totals, respectively, usually using either the TIFF or TIF filename extension. TIFF's flexibility can be both an advantage and a disadvantage, since a reader that reads every type of TIFF file does not exist. TIFFs can be lossy or lossless; some offer relatively good lossless compression for bilevel (black & white) images. Some digital cameras can save in TIFF format, using the LZW compression algorithm for lossless storage. TIFF image format is not widely supported by web browsers. TIFF remains widely accepted as a photograph file standard in the printing business. TIFF can handle device-specific color spaces, such as the CMYK defined by a particular set of printing press inks. OCR (Optical Character Recognition) software packages commonly generate some (often monochromatic) form of TIFF image for scanned text pages.

PNG
The PNG (Portable Network Graphics) file format was created as the free, open-source successor to the GIF. The PNG file format supports truecolor (16 million colors) while the GIF supports only 256 colors. The PNG file excels when the image has large, uniformly colored areas. The lossless PNG format is best suited for editing pictures, and the lossy formats, like JPG, are best for the final distribution of photographic images, because in this case JPG files are usually smaller than PNG files. Many older browsers do not support the PNG file format; however, with Mozilla Firefox or Internet Explorer 7, all contemporary web browsers now support all common uses of the PNG format, including full 8-bit translucency (Internet Explorer 7 may display odd colors on translucent images ONLY when combined with IE's opacity filter). The Adam7 interlacing allows an early preview, even when only a small percentage of the image data has been transmitted. PNG provides a patent-free replacement for GIF and can also replace many common uses of TIFF. Indexed-color, grayscale, and truecolor images are supported, plus an optional alpha channel. PNG is designed to work well in online viewing applications, such as the World Wide Web, so it is fully streamable with a progressive display option. PNG is robust, providing both full file integrity checking and simple detection of common transmission errors. Also, PNG can store gamma and chromaticity data for improved color matching on heterogeneous platforms. Some programs do not handle PNG gamma correctly, which can cause the images to be saved or displayed darker than they should be.[3]

Animated formats derived from PNG are MNG and APNG. The latter is supported by Firefox and Opera and is backwards compatible with PNG.

GIF
GIF (Graphics Interchange Format) is limited to an 8-bit palette, or 256 colors. This makes the GIF format suitable for storing graphics with relatively few colors such as simple diagrams, shapes, logos and cartoon style images. The GIF format supports animation and is still widely used to provide image animation effects. It also uses a lossless compression that is more effective when large areas have a single color, and ineffective for detailed images or dithered images.

BMP
The BMP file format (Windows bitmap) handles graphics files within the Microsoft Windows OS. Typically, BMP files are uncompressed, hence they are large; the advantage is their simplicity and wide acceptance in Windows programs.

PPM, PGM, PBM, PNM


Netpbm format is a family including the portable pixmap file format (PPM), the portable graymap file format (PGM) and the portable bitmap file format (PBM). These are either pure ASCII files or raw binary files with an ASCII header that provide very basic functionality and serve as a lowest-common-denominator for converting pixmap, graymap, or bitmap files between different platforms. Several applications refer to them collectively as PNM format (Portable Any Map).

WEBP
WebP is a new image format that uses lossy compression. It was designed by Google to reduce image file size to speed up web page loading: its principal purpose is to supersede JPEG as the primary format for photographs on the web. WebP is based on VP8's intra-frame coding and uses a container based on RIFF.

Others
Other image file formats of raster type include:

JPEG XR (New JPEG standard based on Microsoft HD Photo)
TGA (TARGA)
ILBM (InterLeaved BitMap)
PCX (Personal Computer eXchange)
ECW (Enhanced Compression Wavelet)
IMG (ERDAS IMAGINE Image)
SID (multiresolution seamless image database, MrSID)
CD5 (Chasys Draw Image)
FITS (Flexible Image Transport System)
PGF (Progressive Graphics File)
XCF (eXperimental Computing Facility format, native GIMP format)
PSD (Adobe PhotoShop Document)
PSP (Corel Paint Shop Pro)

Vector formats
See also: Vector graphics

As opposed to the raster image formats above (where the data describes the characteristics of each individual pixel), vector image formats contain a geometric description which can be rendered smoothly at any desired display size. Vector file formats can contain bitmap data as well. 3D graphic file formats are technically vector formats with pixel data texture mapping on the surface of a vector virtual object, warped to match the angle of the viewing perspective. At some point, all vector graphics must be rasterized in order to be displayed on digital monitors. However, vector images can be displayed with analog CRT technology such as that used in some electronic test equipment, medical monitors, radar displays, laser shows and early video games. Plotters are printers that use vector data rather than pixel data to draw graphics.

CGM
CGM (Computer Graphics Metafile) is a file format for 2D vector graphics, raster graphics, and text, and is defined by ISO/IEC 8632. All graphical elements can be specified in a textual source file that can be compiled into a binary file or one of two text representations. CGM provides a means of graphics data interchange for computer representation of 2D graphical information independent from any particular application, system, platform, or device. It has been adopted to some extent in the areas of technical illustration and professional design, but has largely been superseded by formats such as SVG and DXF.

SVG
SVG (Scalable Vector Graphics) is an open standard created and developed by the World Wide Web Consortium to address the need (and attempts of several corporations) for a versatile, scriptable and all-purpose vector format for the web and otherwise. The SVG format does not have a compression scheme of its own, but due to the textual nature of XML, an SVG graphic can be compressed using a program such as gzip. Because of its scripting potential, SVG is a key component in web applications: interactive web pages that look and act like applications.

MPO
Also known as a Multi-Picture Object or Multi-Picture Format, the MPO file format was first used in the FinePix REAL 3D W1 camera, made by FujiFilm. The format is proposed as an open standard by CIPA (Camera & Imaging Products Association) as CIPA DC-007-2009. It contains multiple JPEG images with respective thumbnails and metadata.

Others
Other image file formats of vector type include:

AI (Adobe Illustrator)
CDR (CorelDRAW)
EPS (Encapsulated PostScript)
ODG (OpenDocument Graphics)
PDF (Portable Document Format)
PGML (Precision Graphics Markup Language)
SWF (Shockwave Flash)
VML (Vector Markup Language)
WMF / EMF (Windows Metafile / Enhanced Metafile)
XAR (Xara X, Xara Xtreme, Xara Photo & Graphic Designer, Xara Designer Pro)
XPS (XML Paper Specification)

3D Formats

PNS
The PNG Stereo (.pns) format consists of a side-by-side image based on PNG (Portable Network Graphics).

JPS
The JPEG Stereo (.jps) format consists of a side-by-side image format based on JPEG.

Q.22 What do you mean by multiple monitors in a multimedia system?

Ans. Multiple monitors

Multi-head video cards - Getting started with multiple monitors is fast and easy! Most operating systems out today support multiple monitors, so you can simply purchase additional screens along with multi-head video cards to drive them and you're ready to go. Alternative solutions to multiple stand-alone monitors include the option of maximizing your space with arm supports or taking advantage of leading-edge design and power in today's multiple monitor systems, which provide for greater image continuity and in certain cases a robust multimedia system.

Multiple-head video cards come in 2-head and 4-head versions. This means that a single video card has either 2 or 4 outputs which feed out to 2 or 4 screens. Basically, the video card takes the single image being generated by the software program and divides the image into 2 or 4 parts. For example, in Windows, a dual-head video card takes the desktop image and stretches the desktop across two monitors. Once the user has determined whether the two monitors are side by side or one on top of the other, the user can move applications around within the two monitors as if the two screens were one large monitor. Multiple monitor video cards vary dramatically in performance, just as they do in the single-output variety. Which card you select will be determined by what you wish to do with the system, i.e. flight simulation, 2D graphics, 3D graphics, or motion video; each has its own level of demand on the video card. Multiple monitor computing is now supported by every major operating software developer in the personal computing industry. Each of the major forces behind graphic computing, including Apple, Microsoft, Sun, Linux, and Unix, offers multiple monitor support ranging from 10 to 15 monitors with their standard products. Multiple monitor computing is driven by the operating software, the video card and, in the case of games and motion video, by the software program. Regardless of whether your computer is a Dell, Gateway, IBM, HP, Sun, Macintosh, Micron or just a bunch of components thrown together, as long as you are running an operating system and video card that support multiple monitors you will have no problem getting up and going. Basically, any computer made in the last 4 years should be multiple monitor ready. All you need is video power and screens.

The number of screens possible is limited by two factors. The first is the operating system. For example, Windows 98 supports up to 9 monitors, and XP now supports up to 10. The other limiting factor in multiple monitor computing is the number of slots available on the computer bus. For example, if you are using quad-head video cards (video cards with 4 outputs to support 4 monitors), the maximum number would be 4 times the number of slots you have available for video cards. You can use any type of screen in such a multiple monitor system, just as you would with a single monitor system. You can run 4 LCD flat panels, 4 CRT monitors, or even 4 large screen TVs with just one video card or adaptor. In addition, you can configure the monitors in a single row, a dual row, or however you want. Which portion of the overall image each monitor displays is determined by the operating software or the video card software. For instance, by opening 'Control Panel' in Windows and clicking on Display, you can move each monitor around so that screen 1 moves from the far left to the far right, and so on.

CPUs and multiple monitors - we will be short and to the point. Any computer running any of the latest operating software is going to support multiple monitors. What you do need to worry about however is the speed at which your computer operates. Running lots of programs on multiple screens is taxing on your CPU, especially if you are running motion video. If you are running programs like Excel, Illustrator, Word, and so on don't worry about it. If on the other hand you are running video editing software where you are running several video files at a time with sound then get all the CPU power you can get because you are going to need it.

Q.23 Explain the Image compression and JPEG with detail.

Ans. Image compression

Lossy and lossless image compression

Image compression may be lossy or lossless. Lossless compression is preferred for archival purposes and often for medical imaging, technical drawings, clip art, or comics. This is because lossy compression methods, especially when used at low bit rates, introduce compression artifacts. Lossy methods are especially suitable for natural images such as photographs in applications where minor (sometimes imperceptible) loss of fidelity is acceptable to achieve a substantial reduction in bit rate. The lossy compression that produces imperceptible differences may be called visually lossless.

Methods for lossless image compression are:

Run-length encoding - used as the default method in PCX and as one of the possible methods in BMP, TGA and TIFF
DPCM and Predictive Coding
Entropy encoding
Adaptive dictionary algorithms such as LZW - used in GIF and TIFF
Deflation - used in PNG, MNG and TIFF
Chain codes

Methods for lossy compression:

Reducing the color space to the most common colors in the image. The selected colors are specified in the color palette in the header of the compressed image. Each pixel just references the index of a color in the color palette. This method can be combined with dithering to avoid posterization.
Chroma subsampling. This takes advantage of the fact that the human eye perceives spatial changes of brightness more sharply than those of color, by averaging or dropping some of the chrominance information in the image.
Transform coding. This is the most commonly used method. A Fourier-related transform such as the DCT or the wavelet transform is applied, followed by quantization and entropy coding.
Fractal compression.
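As a tiny illustration of the run-length encoding entry in the lossless list above, the following sketch run-length encodes one scanline of pixel values. The (count, value) output layout is a simplification and does not match the exact byte layout used by PCX, BMP or TGA.

#include <stdio.h>

int main( void )
{
    unsigned char line[] = { 7, 7, 7, 7, 9, 9, 3, 3, 3, 3, 3, 0 };
    int n = sizeof( line );
    int i = 0;

    while ( i < n ) {
        /* Count how many identical values follow the current one. */
        int run = 1;
        while ( i + run < n && line[ i + run ] == line[ i ] )
            run++;
        printf( "(count=%d, value=%d) ", run, line[ i ] );
        i += run;
    }
    printf( "\n" );
    return 0;
}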

A Standard That Works: JPEG


In the late 1970s and early 1980s, research began on new types of image compression that promised to greatly outperform the more conventional compression techniques discussed earlier. By the late 1980s, this work was beginning to find commercial applications for image processing on desktop systems, mostly in the form of add-on coprocessor cards for UNIX and Macintosh workstations. These cards were able to perform lossy compression on images at ratios of as much as 95 percent without visible degradation of the image quality. Other forces at this time combined to start development of an international standard that would encompass these new varieties of compression. There are clear advantages to all parties if standards allow for easy interchange of graphical formats. The main concern regarding early standardization is the possibility that it would constrain further innovation. The two standardization groups involved, the CCITT and the ISO, worked actively to get input from both industry and academic groups concerned with image compression, and they seem to have avoided the potentially negative consequences of their actions. The standards group created by these two organizations is the Joint Photographic Experts Group (JPEG). The JPEG standard was developed over the course of several years, and is now firmly entrenched as the leading format for lossy graphics compression. The JPEG specification consists of several parts, including a specification for both lossless and lossy encoding. The lossless compression uses the predictive/adaptive model described earlier in this chapter, with a Huffman code output stage, which produces good compression of images without the loss of any resolution. The most interesting part of the JPEG specification is its work on a lossy compression technique. The rest of this chapter discusses

the basics of this technique, with sample code to illustrate its components.

JPEG Compression
The JPEG lossy compression algorithm operates in three successive stages. These three steps combine to form a powerful compressor, capable of compressing continuous tone images to less than 10 percent of their original size, while losing little, if any, of their original fidelity.

The Discrete Cosine Transform


The key to the compression process discussed here is a mathematical transformation known as the Discrete Cosine Transform (DCT). The DCT is in a class of mathematical operations that includes the well-known Fast Fourier Transform (FFT), as well as many others. The basic operation performed by these transforms is to take a signal and transform it from one type of representation to another. This transformation is done frequently when analyzing digital audio samples using the FFT. When we collect a set of sample points from an incoming audio signal, we end up with the representation of a signal in the time domain. That is, we have a collection of points that show what the voltage level was for the input signal at each point in time. The FFT transforms the set of sample points into a set of frequency values that describes exactly the same signal. Figure 11.4 shows the classic time domain representation of an analog signal. This particular signal is composed of three different sine waves added together to form a single, slightly more complicated waveform. Each of the sample points represents the

relative voltage or amplitude of the signal at a specific point in time.

Figure 11.4 The classic time domain representation of an analog signal.

Figure 11.5 shows what happens to the same set of data points after FFT processing. In the time-domain representation of the signal, each of the points on the X axis represents a different point in time, and each of the points on the Y axis represents a specific magnitude of the signal. After processing the data points with an FFT, the X axis no longer has the same meaning. Now, each point on the X axis represents a specific frequency, and the Y axis represents the magnitude of that frequency.

The DCT is closely related to the Fourier Transform, and produces a similar result. It takes a set of points from the spatial domain and transforms them into an identical representation in the frequency domain; however, we are going to introduce an additional complication in this particular instance. Instead of a two-dimensional signal plotted on an X and Y axis, the DCT will operate on a three-dimensional signal plotted on an X, Y, and Z axis. In this case, the signal is a graphical image. The X and Y axes are the two dimensions of the screen. The amplitude of the signal in this case is simply the value of a pixel at a particular point on the screen. For the examples used in this chapter, that is an eight-bit value used to represent a grey-scale value. So a graphical image displayed on the screen can be thought of as a complex three-dimensional signal, with the value on the Z axis denoted by the color on the screen at a given point. This is the spatial representation of the signal. The DCT can be used to convert spatial information into frequency or spectral information, with the X and Y axes representing frequencies of the signal in two different dimensions. And like the FFT, there is an Inverse DCT (IDCT) function that can convert the spectral representation of the signal back to a spatial one.

Figure 11.5 Data points after FFT processing.

Given that interpretation of the output of the FFT, Figure 11.5 makes immediate sense. It says that the signal displayed in the earlier figure can also be represented as the sum of three different frequencies of what appears to be identical magnitude. Given this information, it should be just as easy to construct the signal as it would be with Figure 11.4. Another important point to make about this type of transformation function is that the function is reversible. In principle, the same set of points shown in Figure 11.5 can be processed through an inverse FFT function, and the points shown in Figure 11.4 should result. The two transformation cycles are essentially lossless, except for loss of precision resulting from rounding and truncation errors.
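To illustrate the reversibility just described, here is a small sketch (not taken from the chapter's sample code) that runs a one-dimensional DCT (type II) forward and then inverts it; apart from floating-point rounding, the original samples come back unchanged. The input values are arbitrary.

#include <stdio.h>
#include <math.h>

#define N 8
#define PI 3.14159265358979323846

int main( void )
{
    double x[N] = { 52, 55, 61, 66, 70, 61, 64, 73 };   /* arbitrary samples */
    double X[N], y[N];
    int n, k;

    /* Forward DCT-II: spatial samples -> frequency coefficients. */
    for ( k = 0 ; k < N ; k++ ) {
        X[k] = 0.0;
        for ( n = 0 ; n < N ; n++ )
            X[k] += x[n] * cos( PI / N * ( n + 0.5 ) * k );
    }

    /* Inverse transform: frequency coefficients -> spatial samples. */
    for ( n = 0 ; n < N ; n++ ) {
        y[n] = X[0] / N;
        for ( k = 1 ; k < N ; k++ )
            y[n] += 2.0 / N * X[k] * cos( PI / N * ( n + 0.5 ) * k );
    }

    for ( n = 0 ; n < N ; n++ )
        printf( "original %6.2f  reconstructed %6.2f\n", x[n], y[n] );
    return 0;
}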

Q.24 What is JPEG compression? How does JPEG work? What are the advantages and disadvantages? Where is JPEG compression used?

Ans. JPEG compression


JPEG stands for Joint Photographic Experts Group, which is a standardization committee. It also stands for the compression algorithm that was invented by this committee.

There are two JPEG compression algorithms: the oldest one is simply referred to as JPEG within this page. The newer JPEG 2000 algorithm is discussed on a separate page. Please note that you have to make a distinction between the JPEG compression algorithm, which is discussed on this page, and the corresponding JFIF file format, which many people refer to as JPEG files and which is discussed in the file format section. JPEG is a lossy compression algorithm that has been conceived to reduce the file size of natural, photographic-like true-color images as much as possible without affecting the quality of the image as experienced by the human sensory engine. We perceive small changes in brightness more readily than we do small changes in color. It is this aspect of our perception that JPEG compression exploits in an effort to reduce the file size.

How JPEG works

The JPEG algorithm performs its compression in four phases. First, the JPEG algorithm cuts up an image into separate blocks of 8x8 pixels. The compression algorithm is calculated for each separate block, which explains why these blocks or groups of blocks become visible when too much compression is applied. Humans are more sensitive to changes in hue (chrominance) than changes in brightness (luminance). The JPEG algorithm is based on this difference in perception. It does not analyse RGB or CMYK color values; instead the image data are first converted to a luminance/chrominance color space, such as YUV. This allows for separate compression of these two factors. Since luminance is more important than chrominance for our visual system, the algorithm retains more of the luminance in the compressed file. The next step in the compression process is to apply a Discrete Cosine Transform (DCT) to the entire block. DCT is a complex process that is let loose on each individual pixel. It replaces the actual color data for each pixel with values that are relative to the average of the entire matrix that is being analysed. This operation does not compress the file; it simply replaces 8x8 pixel values with an 8x8 matrix of DCT coefficients. Once this is done, the actual compression can start. First the compression software looks at the JPEG image quality the user requested (e.g. Photoshop settings like low quality, medium quality, etc.) and calculates two tables of quantization constants, one for luminance and one for chrominance. Once these tables have been constructed, the constants from the two tables are used to quantize the DCT coefficients. Each DCT coefficient is divided by its corresponding constant in the quantization table and rounded off to the nearest integer. The result of quantizing the DCT coefficients is that smaller, unimportant coefficients will be replaced by zeros and larger coefficients will lose precision. It is this rounding-off that causes a loss in image quality. The resulting data are a list of streamlined DCT coefficients. The last step in the process is to compress these coefficients using either a Huffman or arithmetic encoding scheme. Usually Huffman encoding is used. This is a second (lossless) compression that is applied.

Advantages
By putting 2 compression algorithms on top of each other, JPEG achieves remarkable compression ratios. Even for prepress use, you can easily compress a file to one fifth of its original size. For web publishing or e-mail exchange, even better ratios of up to 20-to-1 can be achieved. JPEG decompression is supported in PostScript level 2 and 3 RIPs. This means that smaller files can be sent across the network to the RIP, which frees the sending station faster, minimizes overhead on the print server and speeds up the RIP.

Disadvantages
The downside of JPEG compression is that the algorithm is only designed for continuous tone images (remember that the P in JPEG stands for Photographic). JPEG does not lend itself to images with sharp changes in tone. There are some typical types of images where JPEG should be avoided:

images that have had a mask and shadow effect added to them in applications like Photoshop
screendumps or diagrams
blends created in Photoshop
images containing 256 (or fewer) colors
images generated by CAD-CAM software or 3D applications like Maya or Bryce
images that lack one or more of the process colors. Sometimes images are created that use, for instance, only the magenta and black plate. If such an image is compressed using JPEG compression, you may see artefacts show up on the cyan and yellow plate.

Because of its lossy nature, JPEG should only be used during the production stage of prepress (making PostScript or PDF, imposing, proofing, outputting). During the creation process, when images are still being edited, cropped and colour corrected, each new SAVE command leads to extra loss of image quality when JPEG is used. I have seen older level 2 RIPs stumble over JPEG decompression from time to time. Simply resending the file solved the problem.

Where is JPEG compression used?
JPEG compression can be used in a variety of file formats:

EPS files
EPS DCS files
JFIF files
PDF files

Q.25 What is JPEG compression? What is the JPEG Standard? Explain the DCT and quantization process. Where is JPEG compression used?

Ans. JPEG Compression

In computing, JPEG (pronounced JAY-peg) is a commonly used method of lossy compression for digital photographic images. The degree of compression can be adjusted, allowing a selectable tradeoff between storage size and image quality. JPEG typically achieves 10:1 compression with little perceptible loss in image quality. JPEG compression is used in a number of image file formats. JPEG/Exif is the most common image format used by digital cameras and other photographic image capture devices; along with JPEG/JFIF, it is the most common format for storing and transmitting photographic images on the World Wide Web. These format variations are often not distinguished, and are simply called JPEG. The term "JPEG" is an acronym for the Joint Photographic Experts Group, which created the standard. The MIME media type for JPEG is image/jpeg (defined in RFC 1341).

The JPEG standard

The name "JPEG" stands for Joint Photographic Experts Group, the name of the committee that created the JPEG standard and also other standards. It is one of two sub-groups of ISO/IEC Joint Technical Committee 1, Subcommittee 29, Working Group 1 (ISO/IEC JTC 1/SC 29/WG 1), titled as Coding of still pictures.

[1][2][3] The group was organized in 1986,[4] issuing the first JPEG standard in 1992, which was approved in September 1992 as ITU-T Recommendation T.81[5] and in 1994 as ISO/IEC 10918-1. The JPEG standard specifies the codec, which defines how an image is compressed into a stream of bytes and decompressed back into an image, but not the file format used to contain that stream.[6] The Exif and JFIF standards define the commonly used formats for interchange of JPEG-compressed images. JPEG standards are formally named as Information technology: Digital compression and coding of continuous-tone still images; ISO/IEC 10918 consists of several parts.

Discrete cosine transform

Before computing the DCT of the 8x8 block, its values are shifted from a positive range to one centered around zero. For an 8-bit image, each entry in the original block falls in the range [0, 255]. The mid-point of the range (in this case, the value 128) is subtracted from each entry to produce a data range that is centered around zero, so that the modified range is [-128, 127]. This step reduces the dynamic range requirements in the DCT processing stage that follows. (Aside from the difference in dynamic range within the DCT stage, this step is mathematically equivalent to subtracting 1024 from the DC coefficient after performing the transform, which may be a better way to perform the operation on some architectures since it involves performing only one subtraction rather than 64 of them.) This step produces a block of values in the range [-128, 127].

Next, each 8x8 block of each component (Y, Cb, Cr) is converted to a frequency-domain representation, using a normalized, two-dimensional type-II discrete cosine transform (DCT). As an example, consider one such 8x8 8-bit sub-image, shown in 8-bit grayscale.

The next step is to take the two-dimensional DCT, which is given by:

G_{u,v} = \frac{1}{4}\,\alpha(u)\,\alpha(v) \sum_{x=0}^{7} \sum_{y=0}^{7} g_{x,y} \cos\left[\frac{(2x+1)u\pi}{16}\right] \cos\left[\frac{(2y+1)v\pi}{16}\right]

where

u is the horizontal spatial frequency, for the integers 0 <= u < 8;
v is the vertical spatial frequency, for the integers 0 <= v < 8;
\alpha(u) is a normalizing function, equal to 1/\sqrt{2} for u = 0 and to 1 otherwise;
g_{x,y} is the pixel value at coordinates (x, y);
G_{u,v} is the DCT coefficient at coordinates (u, v).

The DCT transforms an 8x8 block of input values into a linear combination of 64 patterns. The patterns are referred to as the two-dimensional DCT basis functions, and the output values are referred to as transform coefficients. The horizontal index is u and the vertical index is v.

If we perform this transformation on our matrix above, we get an 8x8 matrix of DCT coefficients (rounded to the nearest two digits beyond the decimal point). Note the top-left corner entry with the rather large magnitude. This is the DC coefficient; the remaining 63 coefficients are called the AC coefficients. The advantage of the DCT is its tendency to aggregate most of the signal in one corner of the result, as may be seen above. The quantization step to follow accentuates this effect while simultaneously reducing the overall size of the DCT coefficients, resulting in a signal that is easy to compress efficiently in the entropy stage. The DCT temporarily increases the bit-depth of the data, since the DCT coefficients of an 8-bit/component image take up to 11 or more bits (depending on fidelity of the DCT calculation) to store. This may force the codec to temporarily use 16-bit bins to hold these coefficients, doubling the size of the image representation at this point; they are typically reduced back to 8-bit values by the quantization step. The temporary increase in size at this stage is not a performance concern for most JPEG implementations, because typically only a very small part of the image is stored in full DCT

form at any given time during the image encoding or decoding process.

Quantization

The human eye is good at seeing small differences in brightness over a relatively large area, but not so good at distinguishing the exact strength of a high frequency brightness variation. This allows one to greatly reduce the amount of information in the high frequency components. This is done by simply dividing each component in the frequency domain by a constant for that component, and then rounding to the nearest integer. This rounding operation is the only lossy operation in the whole process if the DCT computation is performed with sufficiently high precision. As a result of this, it is typically the case that many of the higher frequency components are rounded to zero, and many of the rest become small positive or negative numbers, which take many fewer bits to represent. A typical quantization matrix, as specified in the original JPEG Standard (the luminance table), is as follows:

16 11 10 16  24  40  51  61
12 12 14 19  26  58  60  55
14 13 16 24  40  57  69  56
14 17 22 29  51  87  80  62
18 22 37 56  68 109 103  77
24 35 55 64  81 104 113  92
49 64 78 87 103 121 120 101
72 92 95 98 112 100 103  99

The quantized DCT coefficients are computed with

B_{j,k} = \mathrm{round}\left(\frac{G_{j,k}}{Q_{j,k}}\right) \quad \text{for } j,k = 0, 1, \ldots, 7

where G is the unquantized DCT coefficients, Q is the quantization matrix above, and B is the quantized DCT coefficients. Using this quantization matrix with the DCT coefficient matrix from above results in a block in which most of the high-frequency coefficients are rounded to zero. For example, using 415 (the DC coefficient) and rounding to the nearest integer, the coefficient is divided by the corresponding quantization constant 16, giving approximately 25.9, which rounds to 26.
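The following sketch strings the three steps just described together for one block: level shift, forward DCT as given by the formula above, and quantization against the luminance table. The 8x8 input block is made up for illustration; a real encoder would of course work on actual image samples.

#include <stdio.h>
#include <math.h>

#define PI 3.14159265358979323846

/* Standard JPEG luminance quantization table (ISO/IEC 10918-1, Annex K). */
static const int Q[8][8] = {
    { 16, 11, 10, 16,  24,  40,  51,  61 },
    { 12, 12, 14, 19,  26,  58,  60,  55 },
    { 14, 13, 16, 24,  40,  57,  69,  56 },
    { 14, 17, 22, 29,  51,  87,  80,  62 },
    { 18, 22, 37, 56,  68, 109, 103,  77 },
    { 24, 35, 55, 64,  81, 104, 113,  92 },
    { 49, 64, 78, 87, 103, 121, 120, 101 },
    { 72, 92, 95, 98, 112, 100, 103,  99 }
};

static double alpha( int u ) { return u == 0 ? 1.0 / sqrt( 2.0 ) : 1.0; }

int main( void )
{
    int pixel[8][8], u, v, x, y;

    /* A made-up 8x8 block of 8-bit luminance samples (a smooth gradient). */
    for ( x = 0 ; x < 8 ; x++ )
        for ( y = 0 ; y < 8 ; y++ )
            pixel[x][y] = 100 + 4 * x + 2 * y;

    for ( u = 0 ; u < 8 ; u++ ) {
        for ( v = 0 ; v < 8 ; v++ ) {
            double sum = 0.0;
            for ( x = 0 ; x < 8 ; x++ )
                for ( y = 0 ; y < 8 ; y++ )
                    sum += ( pixel[x][y] - 128 )                /* level shift */
                         * cos( ( 2 * x + 1 ) * u * PI / 16.0 )
                         * cos( ( 2 * y + 1 ) * v * PI / 16.0 );
            double G = 0.25 * alpha( u ) * alpha( v ) * sum;    /* DCT coeff.  */
            long   B = lround( G / Q[u][v] );                   /* quantize    */
            printf( "%5ld", B );
        }
        printf( "\n" );
    }
    return 0;
}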

Coding

The final step in the JPEG process is coding the quantized images. The JPEG coding phase combines three different steps to compress the image. The first changes the DC coefficient at 0,0 from an absolute value to a relative value. Since adjacent blocks in an image exhibit a high degree of correlation, coding the DC element as the difference from the previous DC element typically produces a very small number. Next, the coefficients of the image are arranged in the zig-zag sequence. Then they are encoded using two different mechanisms. The first is run-length encoding of zero values. The second is what JPEG calls Entropy Coding. This involves sending out the coefficient codes, using either Huffman codes or arithmetic coding depending on the choice of the implementer.

Q.26 Explain the Zig-Zag Scanning pattern and Zig-Zag sequence.


Ans. Zig-Zag Scanning Patterns

The zig-zag scanning pattern for run-length coding of the quantized DCT coefficients was established in the original MPEG standard. The same pattern is used for luminance and for chrominance. A modified (alternate) pattern, more suitable for coding of some interlaced picture blocks, was added in the MPEG-2 standard. A bit in the picture layer header, if set, selects the alternate scan. The patterns are represented below, in which the upper left corner is the DC term.

zig-zag scan order:
 0  1  5  6 14 15 27 28
 2  4  7 13 16 26 29 42
 3  8 12 17 25 30 41 43
 9 11 18 24 31 40 44 53
10 19 23 32 39 45 52 54
20 22 33 38 46 51 55 60
21 34 37 47 50 56 59 61
35 36 48 49 57 58 62 63

alternate scan order:
 0  4  6 20 22 36 38 52
 1  5  7 21 23 37 39 53
 2  8 19 24 34 40 50 54
 3  9 18 25 35 41 51 55
10 17 26 30 42 46 56 60
11 16 27 31 43 47 57 61
12 15 28 32 44 48 58 62
13 14 29 33 45 49 59 63

The Zig-Zag Sequence

One reason the JPEG algorithm compresses so effectively is that a large number of coefficients in the DCT image are truncated to zero values during the coefficient quantization stage. So many values are set to zero that the JPEG committee elected to handle zero values differently from other coefficient values. Instead of relying on Huffman or arithmetic coding to compress the zero values, they are coded using a Run-Length Encoding (RLE) algorithm. A simple code is developed that gives a count of consecutive zero values in the image. Since over half of the coefficients are quantized to zero in many images, this gives an opportunity for outstanding compression. One way to increase the length of runs is to reorder the coefficients in the zig-zag sequence. Instead of compressing the coefficients in row-major order, as a programmer would probably do, the JPEG algorithm moves through the block along diagonal paths, selecting what should be the highest-value elements first, and working its way toward the values likely to be lowest. The actual path of the zig-zag sequence is shown in Figure 11.12. In the code used in this chapter, the diagonal sequences of quantization steps follow exactly the same lines, so the zig-zag sequence should be optimal for our purposes.

Figure 11.12 The path of the zig-zag sequence.

Implementing the zig-zag sequence in C is probably done best using a simple lookup table. In our sample code for this chapter, the sequence is coded as part of a structure that can be accessed sequentially to determine which row and column to encode:

struct zigzag {
    int row;
    int col;
} ZigZag[ N * N ] = {
    {0, 0},
    {0, 1}, {1, 0},
    {2, 0}, {1, 1}, {0, 2},
    {0, 3}, {1, 2}, {2, 1}, {3, 0},
    {4, 0}, {3, 1}, {2, 2}, {1, 3}, {0, 4},
    {0, 5}, {1, 4}, {2, 3}, {3, 2}, {4, 1}, {5, 0},
    {6, 0}, {5, 1}, {4, 2}, {3, 3}, {2, 4}, {1, 5}, {0, 6},
    {0, 7}, {1, 6}, {2, 5}, {3, 4}, {4, 3}, {5, 2}, {6, 1}, {7, 0},
    {7, 1}, {6, 2}, {5, 3}, {4, 4}, {3, 5}, {2, 6}, {1, 7},
    {2, 7}, {3, 6}, {4, 5}, {5, 4}, {6, 3}, {7, 2},
    {7, 3}, {6, 4}, {5, 5}, {4, 6}, {3, 7},
    {4, 7}, {5, 6}, {6, 5}, {7, 4},
    {7, 5}, {6, 6}, {5, 7},
    {6, 7}, {7, 6},
    {7, 7}
};

The C code that sends each of the DCT results to the compressor follows. Note that instead of directly looking up each result, we instead determine which row and column to use next by looking it up in the zig-zag structure. We then encode the element determined by the row and column from the zig-zag structure.

for ( i = 0 ; i < ( N * N ) ; i++ ) {
    /* Walk the block in zig-zag order, quantize, and emit each value. */
    row = ZigZag[ i ].row;
    col = ZigZag[ i ].col;
    result = DCT[ row ][ col ] / Quantum[ row ][ col ];
    OutputCode( output_file, ROUND( result ) );
}

Entropy Encoding

After converting the DC element to a difference from the last block, then reordering the DCT block in the zig-zag sequence, the JPEG algorithm outputs the elements using an entropy encoding mechanism. The output has RLE built into it as an integral part of the coding mechanism. Basically, the output of the entropy encoder consists of a sequence of three tokens, repeated until the block is complete. The three tokens look like this:

Run Length: The number of consecutive zeros that preceded the current element in the DCT output matrix.
Bit Count: The number of bits to follow in the amplitude number.
Amplitude: The amplitude of the DCT coefficient.
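To make the Bit Count token concrete, here is a small sketch (not the book's actual code) that computes the category for a coefficient amplitude; it reproduces the amplitude table that follows.

#include <stdio.h>
#include <stdlib.h>

/* Number of bits needed to represent |amplitude| as a variable-length
 * integer: 1 and -1 need 1 bit, 2..3 and -3..-2 need 2 bits, and so on. */
static int bit_count( int amplitude )
{
    int magnitude = abs( amplitude );
    int bits = 0;

    while ( magnitude > 0 ) {
        bits++;
        magnitude >>= 1;
    }
    return bits;
}

int main( void )
{
    int samples[] = { 1, -3, 7, -15, 100, -1023 };
    int i;

    for ( i = 0 ; i < 6 ; i++ )
        printf( "amplitude %5d -> bit count %d\n",
                samples[ i ], bit_count( samples[ i ] ) );
    return 0;
}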

The coding sequence used in this chapter's test program is a combination of Run-Length Encoding and variable-length integer coding. The run-length and bit-count values are combined to form a code that is output. The bit count refers to the number of bits used to encode the amplitude as a variable-length integer. The variable-length integer coding scheme takes advantage of the fact that the DCT output should consist of mostly smaller numbers, which we want to encode with smaller numbers of bits. The bit counts and the amplitudes they encode follow.

Bit Count   Amplitudes
1           -1, 1
2           -3 to -2, 2 to 3
3           -7 to -4, 4 to 7
4           -15 to -8, 8 to 15
5           -31 to -16, 16 to 31
6           -63 to -32, 32 to 63
7           -127 to -64, 64 to 127
8           -255 to -128, 128 to 255
9           -511 to -256, 256 to 511
10          -1023 to -512, 512 to 1023

Note that each bit count encodes a symmetrical set of high and low values. The values skipped over in the middle are encoded with a lower bit count from higher up in the table. While this form of variable-bit coding is not quite as efficient as Huffman coding, it works fairly well, particularly if the data performs as expected, which means smaller values dominate and larger values are rare.

Q.27 What do you mean by Multimedia Database? What are the contents of an MMDB? How are MMDBs designed? Also explain different types of multimedia databases and the difficulties involved with multimedia databases.

Ans. Multimedia Database

Multimedia data typically means digital images, audio, video, animation and graphics together with text data. The acquisition, generation, storage and processing of multimedia data in computers, and its transmission over networks, have grown tremendously in the recent past. This astonishing growth is made possible by three factors. Firstly, personal computer usage has become widespread and computational power has increased; technological advancements have also produced high-resolution devices that can capture and display multimedia data (digital cameras, scanners, monitors, and printers), as well as high-density storage devices. Secondly, high-speed data communication networks are available nowadays; the Web has proliferated widely and software for manipulating multimedia data is now available. Lastly, some existing applications and future applications need to live with multimedia data. This trend is expected to go up in the days to come. Multimedia data are blessed with a number of exciting features. They can provide more effective dissemination of information in science, engineering, medicine, modern biology, and social

sciences. It also facilitates the development of new paradigms in distance learning, and interactive personal and group entertainment. The huge amount of data in different multimedia-related applications warranted the use of databases, as databases provide consistency, concurrency, integrity, security and availability of data. From a user perspective, databases provide functionalities for the easy manipulation, query and retrieval of highly relevant information from huge collections of stored data. MultiMedia Databases (MMDBs) have to cope with the increased usage of a large volume of multimedia data being used in various software applications. The applications include digital libraries, manufacturing and retailing, art and entertainment, journalism and so on. Some inherent qualities of multimedia data have both direct and indirect influence on the design and development of a multimedia database. MMDBs are supposed to provide almost all the functionalities a traditional database provides. Apart from those, a MMDB has to provide some new and enhanced functionalities and features. MMDBs are required to provide unified frameworks for storing, processing, retrieving, transmitting and presenting a variety of media data types in a wide variety of formats. At the same time, they must adhere to numerical constraints that are normally not found in traditional databases. A multimedia database is a database that hosts one or more primary media file types such as .txt (documents), .jpg (images), .swf (videos), .mp3 (audio), etc. These loosely fall into three main categories:

Static media (time-independent, i.e. images and handwriting)
Dynamic media (time-dependent, i.e. video and sound bytes)
Dimensional media (i.e. 3D games or computer-aided drafting programs - CAD)

All primary media files are stored in binary strings of zeros and ones, and are encoded according to file type. The term "data" is typically referenced from the computer point of view, whereas the term "multimedia" is referenced from the user point of view.

Contents of MMDB
An MMDB needs to manage several different types of information pertaining to the actual multimedia data. They are:

Media data - This is the actual data representing images, audio and video that are captured, digitized, processed, compressed and stored.

Media format data - This contains information pertaining to the format of the media data after it goes through the acquisition, processing, and encoding phases. For instance, this consists of information such as the sampling rate, resolution, frame rate, encoding scheme, etc.

Media keyword data - This contains the keyword descriptions, usually relating to the generation of the media data. For example, for a video, this might include the date, time, and place of recording, the person who recorded it, the scene that is recorded, etc. This is also called content descriptive data.

Media feature data - This contains the features derived from the media data. A feature characterizes the media contents. For example, this could contain information about the distribution of colors, the kinds of textures and the different shapes present in an image. This is also referred to as content dependent data.

The last three types are called meta data as they describe several different aspects of the media data. The media keyword data and media feature data are used as indices for searching purposes. The media format data is used to present the retrieved information.
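The four kinds of information above can be pictured as a record kept alongside each stored media object. Here is a minimal sketch in C; the field names are hypothetical and purely illustrative.

#include <stdio.h>
#include <stddef.h>

struct media_entry {
    /* Media data: location and length of the encoded object itself. */
    const char *storage_path;
    size_t      byte_length;

    /* Media format data: how the object was sampled and encoded. */
    int         width, height;        /* image/video resolution       */
    int         sample_rate_hz;       /* audio/video sampling rate    */
    const char *encoding;             /* e.g. "JPEG", "MPEG-1"        */

    /* Media keyword data: content-descriptive metadata. */
    const char *recorded_on;          /* date and time of recording   */
    const char *keywords;             /* free-text keyword list       */

    /* Media feature data: content-dependent features used for indexing. */
    double      color_histogram[16];  /* coarse color distribution    */
};

int main( void )
{
    struct media_entry clip = { "media/clip01.mpg", 4500000,
                                352, 288, 25, "MPEG-1",
                                "2004-07-01 10:30", "lecture, campus",
                                { 0.0 } };
    printf( "%s (%s, %dx%d)\n", clip.storage_path, clip.encoding,
            clip.width, clip.height );
    return 0;
}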

Designing MMDBs

Many inherent characteristics of multimedia data have direct and indirect impacts on the design of MMDBs. These include: the huge size of MMDBs, temporal nature, richness of content, complexity of representation and subjective interpretation. The major challenges in designing multimedia databases arise from several requirements they need to satisfy, such as the following:

Manage different types of input, output, and storage devices. Data input can be from a variety of devices such as scanners and digital cameras for images, microphones and MIDI devices for audio, and video cameras. Typical output devices are high-resolution monitors for images and video, and speakers for audio.

Handle a variety of data compression and storage formats. The data encoding has a variety of formats even within a single application. For instance, in medical applications, MRI images of the brain require lossless coding or a very stringent-quality lossy coding technique, while X-ray images of bones can be less stringent. Also, radiological image data, ECG data, other patient data, etc. have widely varying formats.

Support different computing platforms and operating systems. Different users operate computers and devices suited to their needs and tastes, but they need the same kind of user-level view of the database.

Integrate different data models. Some data, such as numeric and textual data, are best handled using a relational database model, while others, such as video documents, are better handled using an object-oriented database model. So these two models should coexist together in MMDBs.

Offer a variety of user-friendly query systems suited to different kinds of media. From a user point of view, easy-to-use queries and fast and accurate retrieval of information are highly desirable. The query for the same item can be in different forms. For example, a portion of interest in a video can be queried by using either

1) a few sample video frames as an example, 2) a clip of the corresponding audio track, or 3) a textual description using keywords.

Handle different kinds of indices. The inexact and subjective nature of multimedia data has rendered keyword-based indices and the exact and range searches used in traditional databases ineffective. For example, the retrieval of records of persons based on social security number is precisely defined, but the retrieval of records of persons having certain facial features from a database of facial images requires content-based queries and similarity-based retrievals. This requires indices that are content dependent, in addition to key-word indices.

Develop measures of data similarity that correspond well with perceptual similarity. Measures of similarity for different media types need to be quantified to correspond well with the perceptual similarity of objects of those data types. These need to be incorporated into the search process.

Provide a transparent view of geographically distributed data. MMDBs are likely to be distributed in nature. The media data resides in many different storage units, possibly spread out geographically. This is partly due to the changing nature of computation and computing resources, from centralized to networked and distributed.

Adhere to real-time constraints for the transmission of media data. Video and audio are inherently temporal in nature. For example, the frames of a video need to be presented at the rate of at least 30 frames/sec for the eye to perceive continuity in the video.

Synchronize different media types while presenting to the user. It is likely that different media types corresponding to a single multimedia object are stored in different formats, on different devices, and have different rates of transfer. Thus they need to be periodically synchronized for presentation.

The recent growth in using multimedia data in applications has been phenomenal. Multimedia databases are essential for efficient management and effective use of huge amounts of data. The diversity of applications using multimedia data, the rapidly changing technology, and the inherent complexities in the semantic representation, interpretation and comparison for similarity pose many challenges. MMDBs are still in their infancy. Today's MMDBs are closely bound to narrow application areas. The experience acquired from developing and using novel multimedia applications will help advance multimedia database technology.

Types of Multimedia Databases

There are numerous different types of multimedia databases, including:

The Authentication Multimedia Database (also known as a Verification Multimedia Database, i.e. retina scanning), which is a 1:1 data comparison.

The Identification Multimedia Database, which is a one-to-many data comparison (i.e. passwords and personal identification numbers).

A newly emerging type of multimedia database is the Biometrics Multimedia Database, which specializes in automatic human verification based on algorithms applied to a person's behavioral or physiological profile. This method of identification is superior to traditional multimedia database methods requiring the typical input of personal identification numbers and passwords, because the person being identified does not need to be physically present where the identification check is taking place.

This removes the need for the person being scanned to remember a PIN or password. Fingerprint identification technology is also based on this type of multimedia database.

Difficulties Involved with Multimedia Databases

The difficulties of making these different types of multimedia databases readily accessible to humans are:

The tremendous amount of bandwidth they consume.

Creating globally accepted data-handling platforms, such as Joomla, and the special considerations that these new multimedia database structures require.

Creating a globally accepted operating system, including applicable storage and resource management programs, to accommodate the vast global hunger for multimedia information.

Multimedia databases need to accommodate various human interfaces to handle 3D-interactive objects in a logically perceived manner (i.e. SecondLife.com).

Accommodating the vast resources required to utilize artificial intelligence to its fullest potential, including computer sight and sound analysis methods.

The historic relational databases (i.e. Binary Large Objects, BLOBs, developed for SQL databases to store multimedia data) do not conveniently support content-based searches for multimedia content. This is because a relational database cannot recognize the internal structure of a Binary Large Object, and therefore internal multimedia data components cannot be retrieved. Basically, a relational database is an "everything or nothing" structure, with files retrieved and stored as a whole, which makes a relational database completely inefficient for making multimedia data easily accessible to humans.

In order to effectively accommodate multimedia data, a database management system such as an Object Oriented Database (OODB) or an Object Relational Database Management System (ORDBMS) is needed. Examples of Object Relational Database Management Systems include Odaptor (HP), UniSQL, ODB-II, and Illustra. The flip side of the coin is that, unlike non-multimedia data stored in relational databases, multimedia data cannot be easily indexed, retrieved or classified, except by way of social bookmarking and ranking/rating by actual humans. This is made possible by metadata retrieval methods, commonly referred to as tags and tagging. This is why you can search for dogs, as an example, and a picture comes up based on your text search term; this is also referred to as schematic mode. Doing a search with a picture of a dog to locate other dog pictures, by contrast, is referred to as paradigmatic mode. However, metadata retrieval, search, and identification methods severely lack the ability to properly define uniform space and texture descriptions, such as the spatial relationships between 3D objects. The Content-Based Retrieval (CBR) multimedia database search method, however, is specifically based on these types of searches. In other words, if you were to search an image or sub-image, you would then be shown other images or sub-images related in some way to your particular search, by way of color ratio or pattern, etc.

Q.28 How is content-based retrieval for text and images handled in a multimedia system?

Ans. Content-Based Multimedia Information Handling

Introduction First some definitions. Multimedia information is digital information which may be visual data, images or video for example; or it may be sound data, music or speech; and it may now include 3D visualisations or mixed reality experiences. Finally it will almost certainly include the medium with which we are most familiar, that is text. In this article we use the word document to refer to a multimedia object. It may be a collection of text, an article or a book, it may be an image or a video, or a frame of a video, it may be a mixture of these, in fact it may be any type of basic digital information object. The issues we are going to discuss apply to text as much as to images or other media and it will be useful to establish the ideas by talking first about text on its own. Let us begin by distinguishing between retrieval and navigation. Retrieval is the business of extracting a document from a collection in order to satisfy some query. The query may take a variety of forms. For example, we may require documents by a particular author, or about a particular subject. This sort of retrieval has traditionally been achieved by using indexed metadata that is stored with the document. Key terms in the metadata may give a controlled vocabulary to aid the retrieval. Content-based retrieval of text is retrieval that uses the text of the document rather than any added metadata. Free text searching is a good example of content-based text retrieval. The words making up the content of the document are indexed and used as the basis for retrieval, sometimes in conjunction with quite sophisticated intelligent software used to satisfy the query. Search engines like Google and AltaVista offer content-based text retrieval on the Web. By contrast, navigation is the process of moving from one document in the information collection to another because there is some useful association between them, and this is typically achieved by following pre-authored links. On the Web this is

achieved by clicking on a highlighted source anchor of a link in one document in order to navigate to the destination document to which it points. Sometimes the distinction between navigation and retrieval is unclear. For example, following links that are stored in a bookmark file under a particular subject heading could be regarded in one sense as indexed retrieval and in another sense as link-based navigation. This is also true when using a search engine to retrieve documents on a particular subject. The documents are presented initially as links to be followed. In both these examples we will regard the process as retrieval rather than navigation, as the aim is to retrieve rather than to follow an association between documents. On the Web, navigation is mainly based on fixed links that are embedded in the documents themselves. However, it is possible for hypermedia navigation to be content-based. By this we mean that the links offered are determined at link following time and selected on the basis of the content of the chosen source anchor. Link authoring for content-based navigation involves making an association between some chosen source anchor and the address of a destination document. The link information may be stored in a separate location from the document, typically a linkbase holding source anchors and destination addresses. With this content-based approach to navigation, multiple links may be made available for a given source anchor, previously authored links may be added to new documents on the fly with minimal effort and different viewers may see different link sets depending on the linkbases which are active at the time [1]. In both content-based retrieval and content-based navigation for text, the process depends on matching content. In the case of retrieval, the textual content of the query is matched with text forming the content of the document, typically indexed in some way to accelerate the retrieval process. In content-based navigation, the query (which is typically a portion of text selected from the content of the document) is matched with the text making up the source anchors of links in the linkbase.
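Both free-text retrieval and linkbase-driven navigation ultimately reduce to matching terms against an index. As a rough illustration only (not taken from the article, and with a hypothetical toy document collection and tokeniser), the following Python sketch builds a tiny inverted index and answers a keyword query:

```python
from collections import defaultdict

# A toy document collection (hypothetical data, for illustration only).
docs = {
    1: "multimedia information retrieval with text and images",
    2: "content based navigation follows links stored in a linkbase",
    3: "free text searching indexes the words of each document",
}

def tokenise(text):
    # Very crude tokeniser: lower-case and split on whitespace.
    return text.lower().split()

# Build the inverted index: word -> set of document ids containing it.
index = defaultdict(set)
for doc_id, text in docs.items():
    for word in tokenise(text):
        index[word].add(doc_id)

def search(query):
    # Content-based retrieval: ids of documents containing all query words.
    words = tokenise(query)
    if not words:
        return set()
    result = index.get(words[0], set()).copy()
    for word in words[1:]:
        result &= index.get(word, set())
    return result

print(search("text retrieval"))   # e.g. {1} with the toy collection above
```

Real systems add stemming, thesaurus-based query expansion and statistical ranking on top of this basic structure, as the text below explains.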

For text, these processes of content-based retrieval and navigation are sufficiently well established and widely used for us to conclude with some conviction that content-based retrieval and navigation are worthwhile and effective approaches for text information handling. Of course metadata-based searches with text are also widely used, and the two approaches can complement each other well. The content matching on which text content-based processes depend is in many cases a straightforward exact match between words, although statistical matches between word sets, term switching or query expansion via thesauri, word stemming and other textual tricks can greatly enhance the process to provide more powerful retrieval and navigation facilities. Now let us turn our attention to content-based retrieval and navigation with non-text media. We will use images as our example, although many of the comments apply equally to other non-text media. Can we say with the same conviction as we did for text that content-based image retrieval and navigation are worthwhile and effective approaches for image information handling? Well, in short, the answer is no, certainly not with the same conviction. But there are circumstances where content-based retrieval and content-based navigation may be worthwhile, particularly in conjunction with metadata-based techniques. And in the longer term, as research into media processing offers up more powerful approaches, the value of content-based techniques should increase. In the following sections we look more closely at content-based image retrieval and navigation techniques, examine why they are currently less powerful than for text and examine specific efforts to make them more effective.

Content-based Image Retrieval
The basic reason why image retrieval is more difficult than text retrieval is that the digital representation for most images is as a collection of pixels. The only information which is explicit in such a representation is the colour value at each pixel point. Although, when we look at images, we as humans are able to interpret them

automatically and see meaningful regions of colour, recognise objects and identify scenes which can usefully form the basis of effective image matching processes, we are performing substantial and sophisticated information processing which relies on a large volume of prior knowledge for its success. To achieve effective content based image retrieval (CBIR) software systems must achieve some of this extraction and interpretation in order to find something meaningful to form the basis of the content matching. By contrast, in text documents, the words themselves are explicit in the digital document and it is these that form the basis of the matching process. Hence for text retrieval in its basic form, little additional processing is required. There have been some excellent recent reviews of content-based image retrieval [2], [3] and the reader is encouraged to look at these for further details. Querying in CBIR can take many forms but the most common is probably the query by example paradigm where the user provides a query image and asks for images from the collection that are similar to it in some way. An alternative might be to ask explicitly for images containing some particular object using a text interface to provide a description of the required image. Such an approach requires that the CBIR system can perform object recognition or scene analysis in order to find the required image and at present this is only possible in specific highly constrained application domains. General approaches to CBIR attempt to find representations of the image which make more information explicit than simply the pixel colour values. Unsurprisingly, many of the approaches have been based on colour. The colour histogram [4] has been a simple and popular representation which captures the relative amounts of each colour in an image. But it is a global measure and does not give information about colour variations at local positions in the image. Nevertheless it provides a useful measure of some aspects of similarity between images and has been widely used in CBIR systems.
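As a concrete sketch of the colour-histogram idea just described (an illustration, not the specific method of any system cited above), the following Python fragment computes a coarse global histogram for an image given as a 2-D list of (r, g, b) tuples and compares two images by histogram intersection; the 4x4x4 quantisation and the intersection measure are illustrative choices:

```python
# Minimal sketch of global colour-histogram matching for CBIR, assuming images
# are 2-D lists of (r, g, b) tuples with 8-bit channels.

def colour_histogram(image, bins_per_channel=4):
    step = 256 // bins_per_channel
    hist = [0] * (bins_per_channel ** 3)
    count = 0
    for row in image:
        for (r, g, b) in row:
            idx = ((r // step) * bins_per_channel ** 2
                   + (g // step) * bins_per_channel
                   + (b // step))
            hist[idx] += 1
            count += 1
    # Normalise so that images of different sizes can be compared.
    return [h / count for h in hist]

def histogram_intersection(h1, h2):
    # 1.0 means identical colour distributions, 0.0 means no overlap.
    return sum(min(a, b) for a, b in zip(h1, h2))

# Usage: rank a collection by similarity to a query image.
# query_hist = colour_histogram(query_image)
# ranked = sorted(collection,
#                 key=lambda img: -histogram_intersection(
#                     query_hist, colour_histogram(img)))
```

Because the histogram is global, it says nothing about where colours occur, which is exactly the limitation the patch-based and coherence-vector representations described next try to address.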

To overcome the global nature of the colour histogram the image has sometimes been divided into patches and the colour histogram calculated for each patch. This allows images to be retrieved from a collection when the query image is only similar to a sub-section of an image in the collection. This is taken further when the images are decomposed into patches hierarchically at decreasing resolutions. A representation which also tries to capture some local colour information is the colour coherence vector representation [5] which counts separately pixels which belong to large (coherent) regions of the same colour and those which do not. We have developed an approach to sub-image matching which uses a pyramid of colour coherence vectors and which can locate details of high resolution art images in large collections of such images [6]. An example of a sub-image query is shown in figure 1 and the resulting match with the located sub-image is shown in figure 2.

Figure 1

Figure 2

A representation which captures information about colour boundaries within the image has been proposed by Matas et al [7] in an approach they call the multi-modal neighbourhood signature representation. This approach has the added benefit that sub-images can be matched directly without the need for a pyramid decomposition. Colour is not the only basis for representations in CBIR. Texture, which in image processing refers to a measure of repeating patterns in an image, has also provided a useful basis for representations. Again the representations tend to be global and only appear useful for some particular image types where repeating patterns are a central characteristic. For the ultimate CBIR system what we need is perfect image understanding software. We need to bridge the so-called semantic gap, even to be able to address queries like "Find me images in this collection containing a building". A simple query by example would be inadequate for satisfying this simple query without a substantial knowledge of the variety of ways in which buildings may appear in images. An even greater challenge comes from queries like "Find me images in this collection which depict acts of kindness". It is worth noting that the semantic gap also exists for text. The gap is not as wide, but until we have perfect natural language understanding software it will continue to exist at some level.

For CBIR, a starting point would be to represent explicitly any objects in the image. Shape is an important cue to object recognition and many attempts to use shape in CBIR systems have been reported, even in the early systems like QBIC from IBM [8]. The big problem with this is knowing what constitutes an object. It is possible to segment images into regions and represent the shapes of the regions, but the software needs to be trained to match or recognise particular object shapes, which will typically be composed of several regions from a segmentation of the image. Some approaches to this have been reported in particular domains, but general purpose CBIR systems using objects as intermediate representations are still uncommon. A rather simple example of shape finding comes from the Artiste project [9], a European project to develop a distributed art retrieval, navigation and analysis system. It includes a facility to detect images of paintings in frames of a particular shape. Most frames are rectangular but some are circular, some are triptychs etc. A border finder locates the boundary of the frame in the image and a neural net classifier has been trained to use the border to deliver the frame type.

Q.29 How is video represented in a multimedia system? Explain the different representation techniques.
Ans. Video Representation

Q.30 How is a colour scheme chosen for video in a multimedia system?
Ans. Color in video

Choosing a color scheme for video
Most of these suggestions are derived from a document originally (1989) written by Bruce Land and now copyrighted by the Cornell Theory Center, and further enhanced by the San Diego Supercomputer Center. First, note that television systems are designed to do particularly well on skin tones, not on hydrodynamical simulations. Very bright colors will vibrate garishly and distractingly. You should use colors at 80% saturation or less for video. Humans are better at distinguishing shades of green than colors at the ends of the visual spectrum, and happily, greenish colors do reasonably well on video, because that is another color the TV systems are designed to reproduce well. Yellow is also a good color for humans, but to use it on video you must lower the saturation. Cyan does well too, but anything further towards blue will not be good for showing subtle differences, though it is nice for other purposes. Differences in brightness are more easily distinguished on video than are differences in hue. In displaying lines or areas of abrupt color differences, there should be at least 5 pixels of width per color to avoid "chroma crawl", an effect wherein the lines seem to creep along at the color boundary. This means that such abrupt color differences should not be used for text display, nor for displaying particles (then it gets called "dot crawl", but the change of terminology does nothing to make it less annoying). Use less saturated colors for the dots to avoid this. Blue makes a better background for video than black. Note also that computer black is very different from video black. You do better with light colored text on a dark colored background than the reverse. It has been shown that people who are asked to choose between two pictures that are identical except in brightness will pick the brighter one. So your image overall should have high intensity (n.b. not high saturation). Since you have probably used RGB and not HSV to create the color scheme for your visualization, here is the conversion.
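The exact conversion given in the original notes did not survive in this copy. As a stand-in, the following sketch uses Python's standard colorsys module to convert an RGB colour to HSV, clamp the saturation to the 80% guideline mentioned above, and convert back; the function name and the 0.8 default are illustrative assumptions.

```python
import colorsys

def video_safe(r, g, b, max_saturation=0.8):
    """Clamp the saturation of an RGB colour (components in 0..1) to a
    video-safe level, following the 80% guideline given above."""
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    s = min(s, max_saturation)
    return colorsys.hsv_to_rgb(h, s, v)

# Example: fully saturated computer red is pulled back towards a safer value.
print(video_safe(1.0, 0.0, 0.0))  # -> approximately (1.0, 0.2, 0.2)
```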

Comparing computer and video colors
These color comparisons are excerpted from documentation at the San Diego Supercomputing Center. They come from a public domain program they wrote called Interactive Color, which unfortunately is only available as a Macintosh binary. They also unfortunately refer only to NTSC colors, but you should find them informative for PAL video as well. The purpose of the comparisons is to show you the exact RGB values required to reproduce the standard colorbar (which is the color scheme to which the TV will be adjusted), but you can also see from the colors and from the information in the captions what kind of things to watch out for. Do note that the range of values for RGB in the color comparisons is 0-100 (i.e. in percent of maximum), not 0-255.

Color Bars

Since NTSC colors are prone to errors, television manufacturers have developed the standard colorbar to adjust television sets. It contains the major primary and secondary colors, plus black and white.

Black

Computer black is the absence of all of the primaries. Computer black with a value of 0,0,0 is referred to as superblack and is used for matting or keying in video effects. Superblack does not render well to video. Instead of appearing black, it has a light gray appearance. Therefore, the standard NTSC colorbar black, which is 8% gray, should be used. Video black will appear blacker on video than superblack. White

Computer white, being a mixture of fully saturated red, green, and blue, is the brightest color. It requires the least amount of desaturation prior to videotaping. Yellow

Since it is a mixture of the brightest computer primaries, red and green, computer yellow is the brightest of all the computer secondary colors. Computer yellow appears slightly greenish at full saturation and can be modified by lowering the green values slightly. Colorbar yellow appears bright although it is mixed with blue. Cyan

Computer cyan is a secondary color with almost the same brightness as computer yellow. Saturation levels should be controlled and de-saturated before transferring to video. Colorbar cyan still appears as bright as computer cyan though it is lower in saturation and mixed with a small amount of red. Green

Computer green is the lightest and brightest of all the computer additive primary colors. It appears yellowish at full saturation rather than what is traditionally thought of as green. Prior to videotaping, it needs to be toned down by lowering the level of saturation. The addition of a little red makes it appear more yellowish and the addition of a little blue makes it appear more bluish. Traditional deep greens are difficult to mix on the computer.

Magenta

Computer magenta results from the mixture of primaries red and blue. It should be de-saturated before transferring to video. Since it is partly made of red, it has a tendency to bleed and cause blurring effects around its edges. A small amount of green will make it the standard colorbar magenta. Red

Computer red, though good looking on a high resolution monitor, cannot be properly transferred to video. It creates unwanted color artifacts such as color bleeding, or halation, where a halo appears around the color. To avoid this, lower the saturation to the video safe colorbar levels. Also note that at full saturation, this red is not what is thought of as red. It is actually a red-orange and needs a slight amount of blue added to obtain a traditional red. In video, the color red should be avoided as much as possible. Blue

Computer blue is one of the most popularly selected colors. It is the least bright of the primaries and requires the least amount of desaturation before transferring to video. It is important to note that fully saturated computer blue is lost on a black background, especially a superblack background. Placed on a lighter gray background, it will appear much brighter.

Q.31 Explain video compression in detail. What are the different standards supported by video compression?
Ans. Video Compression
Video compression refers to reducing the quantity of data used to represent digital video images, and is a combination of spatial image compression and temporal motion compensation. Video compression is an example of the concept of source coding in information theory. This article deals with its applications: compressed video can effectively reduce the bandwidth required to transmit video via terrestrial broadcast, via cable TV, or via satellite TV services.

Video quality
Most video compression is lossy: it operates on the premise that much of the data present before compression is not necessary for achieving good perceptual quality. For example, DVDs use a video coding standard called MPEG-2 that can compress around two hours of video data by 15 to 30 times, while still producing a picture quality that is generally considered high-quality for standard-definition video. Video compression is a tradeoff between disk space, video quality, and the cost of hardware required to decompress the video in a reasonable time. However, if the video is overcompressed in a lossy manner, visible (and sometimes distracting) artifacts can appear.

Video compression typically operates on square-shaped groups of neighboring pixels, often called macroblocks. These pixel groups or blocks of pixels are compared from one frame to the next, and the video compression codec (encode/decode scheme) sends only the differences within those blocks. This works extremely well if the video has no motion. A still frame of text, for example, can be repeated with very little transmitted data. In areas of video with more motion, more pixels change from one frame to the next. When more pixels change, the video compression scheme must send more data to keep up with the larger number of pixels that are changing. If the video content includes an explosion, flames, a flock of thousands of birds, or any other image with a great deal of high-frequency detail, the quality will decrease, or the variable bitrate must be increased to render this added information with the same level of detail. The programming provider has control over the amount of video compression applied to their video programming before it is sent to their distribution system. DVDs, Blu-ray discs, and HD DVDs

have video compression applied during their mastering process, though Blu-ray and HD DVD have enough disc capacity that most compression applied in these formats is light, when compared to such examples as most video streamed on the internet, or taken on a cellphone. Software used for storing video on hard drives or various optical disc formats will often have a lower image quality, although not in all cases. High-bitrate video codecs with little or no compression exist for video postproduction work, but create very large files and are thus almost never used for the distribution of finished videos. Once excessive lossy video compression compromises image quality, it is impossible to restore the image to its original quality. Theory Video is basically a three-dimensional array of color pixels. Two dimensions serve as spatial (horizontal and vertical) directions of the moving pictures, and one dimension represents the time domain. A data frame is a set of all pixels that correspond to a single time moment. Basically, a frame is the same as a still picture. Video data contains spatial and temporal redundancy. Similarities can thus be encoded by merely registering differences within a frame (spatial), and/or between frames (temporal). Spatial encoding is performed by taking advantage of the fact that the human eye is unable to distinguish small differences in color as easily as it can perceive changes in brightness, so that very similar areas of color can be "averaged out" in a similar way to jpeg images (JPEG image compression FAQ, part 1/2). With temporal compression only the changes from one frame to the next are encoded as often a large number of the pixels will be the same on a series of frames. Lossless compression Some forms of data compression are lossless. This means that when the data is decompressed, the result is a bit-for-bit perfect match with the original. While lossless compression of video is

possible, it is rarely used, as lossy compression results in far higher compression ratios at an acceptable level of quality.

Intraframe versus interframe compression
One of the most powerful techniques for compressing video is interframe compression. Interframe compression uses one or more earlier or later frames in a sequence to compress the current frame, while intraframe compression uses only the current frame, which is effectively image compression. The most commonly used method works by comparing each frame in the video with the previous one. If the frame contains areas where nothing has moved, the system simply issues a short command that copies that part of the previous frame, bit-for-bit, into the next one. If sections of the frame move in a simple manner, the compressor emits a (slightly longer) command that tells the decompressor to shift, rotate, lighten, or darken the copy: a longer command, but still much shorter than intraframe compression. Interframe compression works well for programs that will simply be played back by the viewer, but can cause problems if the video sequence needs to be edited. Since interframe compression copies data from one frame to another, if the original frame is simply cut out (or lost in transmission), the following frames cannot be reconstructed properly. Some video formats, such as DV, compress each frame independently using intraframe compression. Making 'cuts' in intraframe-compressed video is almost as easy as editing uncompressed video: one finds the beginning and ending of each frame, and simply copies bit-for-bit each frame that one wants to keep, and discards the frames one doesn't want. Another difference between intraframe and interframe compression is that with intraframe systems, each frame uses a similar amount of data. In most interframe systems, certain frames (such as "I frames" in MPEG-2) aren't allowed to copy data from other frames, and so require much more data than other frames nearby.
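As a rough illustration of the block-comparison idea described above (not any particular codec), the following Python sketch compares the current frame with the previous one block by block and only "sends" blocks that changed; frames are assumed to be 2-D lists of greyscale values, and the 8x8 block size and change threshold are arbitrary choices:

```python
BLOCK = 8
THRESHOLD = 10  # mean absolute difference above which a block is re-sent

def changed_blocks(prev, curr):
    height, width = len(curr), len(curr[0])
    updates = []
    for by in range(0, height, BLOCK):
        for bx in range(0, width, BLOCK):
            diff, n = 0, 0
            for y in range(by, min(by + BLOCK, height)):
                for x in range(bx, min(bx + BLOCK, width)):
                    diff += abs(curr[y][x] - prev[y][x])
                    n += 1
            if diff / n > THRESHOLD:
                # Only this block's pixels would need to be transmitted.
                updates.append((bx, by))
    return updates

# A static scene yields few updates; a scene full of motion yields many,
# which is exactly why high-motion content compresses less well.
```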

It is possible to build a computer-based video editor that spots problems caused when I-frames are edited out while other frames need them. This has allowed newer formats like HDV to be used for editing. However, this process demands a lot more computing power than editing intraframe-compressed video with the same picture quality.

Current forms
Today, nearly all commonly used video compression methods (e.g., those in standards approved by the ITU-T or ISO) apply a discrete cosine transform (DCT) for spatial redundancy reduction. Other methods, such as fractal compression, matching pursuit and the use of a discrete wavelet transform (DWT), have been the subject of some research, but are typically not used in practical products (except for the use of wavelet coding as still-image coders without motion compensation). Interest in fractal compression seems to be waning, due to recent theoretical analysis showing a comparative lack of effectiveness of such methods.

Timeline
The following table is a partial history of international video compression standards.

History of Video Compression Standards
Year  Standard              Publisher          Popular Implementations
1984  H.120                 ITU-T
1990  H.261                 ITU-T              Videoconferencing, Videotelephony
1993  MPEG-1 Part 2         ISO, IEC           Video-CD
1995  H.262/MPEG-2 Part 2   ISO, IEC, ITU-T    DVD Video, Blu-ray, Digital Video Broadcasting, SVCD
1996  H.263                 ITU-T              Videoconferencing, Videotelephony, Video on Mobile Phones (3GP)
1999  MPEG-4 Part 2         ISO, IEC           Video on Internet (DivX, Xvid, ...)
2003  H.264/MPEG-4 AVC      ISO, IEC, ITU-T    Blu-ray, Digital Video Broadcasting, iPod Video, HD DVD

Q.32 What is compression? Why is video compression used? Explain the different image and video compression standards in detail with suitable examples.
Ans. What is compression?
Compression is a reversible conversion (encoding) of data into a representation that contains fewer bits. This allows more efficient storage and transmission of the data. The inverse process is called decompression (decoding). Software and hardware that can encode or decode are called encoders and decoders respectively. Both combined form a codec, which should not be confused with the terms data container or compression algorithm.

Figure 1: Relation between codec, data containers and compression algorithms.

Lossless compression allows a 100% recovery of the original data. It is usually used for text or executable files, where a loss of information is a major damage. These compression algorithms often use statistical information to reduce redundancies. Huffman coding [1] and Run Length Encoding [2] are two popular examples allowing high compression ratios, depending on the data.

Using lossy compression does not allow an exact recovery of the original data. Nevertheless it can be used for data which is not very sensitive to losses and which contains a lot of redundancy, such as images, video or sound. Lossy compression allows higher compression ratios than lossless compression.

Why is video compression used?
A simple calculation shows that an uncompressed video produces an enormous amount of data: a resolution of 720x576 pixels (PAL), with a refresh rate of 25 fps and 8-bit colour depth, would require the following bandwidth:

720 x 576 x 25 x 8 + 2 x (360 x 576 x 25 x 8) = 166 Mb/s (luminance + chrominance)

For High Definition Television (HDTV):

1920 x 1080 x 60 x 8 + 2 x (960 x 1080 x 60 x 8) = 1.99 Gb/s

Even with powerful computer systems (storage, processor power, network bandwidth), such data amounts cause extremely high computational demands for managing the data. Fortunately, digital video contains a great deal of redundancy. Thus it is suitable for compression, which can reduce these problems significantly. Especially lossy compression techniques deliver high compression ratios for video data. However, one must keep in mind that there is always a trade-off between data size (and therefore computational time) and quality. The higher the compression ratio, the lower the size and the lower the quality. The encoding and decoding process itself also needs computational resources, which have to be taken into consideration. It makes no sense, for example, for a real-time application with low bandwidth requirements to compress the video with a computationally expensive algorithm which takes too long to encode and decode the data.
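A quick check of the bandwidth figures quoted above (a sketch only; the half-horizontal-resolution chrominance planes mirror the formula in the text):

```python
def raw_bitrate(width, height, fps, bits=8):
    # Luminance at full resolution plus two chrominance planes at half the
    # horizontal resolution, as in the calculation above.
    luma = width * height * fps * bits
    chroma = 2 * ((width // 2) * height * fps * bits)
    return luma + chroma

print(raw_bitrate(720, 576, 25) / 1e6, "Mb/s")     # ~165.9 Mb/s for PAL
print(raw_bitrate(1920, 1080, 60) / 1e9, "Gb/s")   # ~1.99 Gb/s for HDTV
```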

Image and Video Compression Standards
The following compression standards are the best known nowadays. Each of them is suited to specific applications; the top entry is the oldest and the last row is the most recent standard. The MPEG standards are the most widely used ones, and they will be explained in more detail in the following sections.

The MPEG standards
MPEG stands for Moving Picture Experts Group [4]. At the same time it describes a whole family of international standards for the compression of audio-visual digital data. The best known are MPEG-1, MPEG-2 and MPEG-4, which are also formally known as ISO/IEC-11172, ISO/IEC-13818 and ISO/IEC-14496. More details about the MPEG standards can be found in [4], [5], [6]. The most important aspects are summarised as follows:

The MPEG-1 standard was published in 1992 and its aim was to provide VHS quality with a bandwidth of 1.5 Mb/s, which allowed a video to be played in real time from a 1x CD-ROM. The frame rate in MPEG-1 is locked at 25 fps (PAL) and 30 fps (NTSC) respectively. Further, MPEG-1 was designed to allow fast forward and backward search and a synchronisation of audio and video. A stable behaviour in cases of data loss, as well as low computation times for encoding and decoding, was achieved, which is important for symmetric applications like video telephony.

In 1994 MPEG-2 was released, which allowed a higher quality at a slightly higher bandwidth. MPEG-2 is compatible with MPEG-1. Later it was also used for High Definition Television (HDTV) and DVD, which made the MPEG-3 standard disappear completely. The frame rate is locked at 25 fps (PAL) and 30 fps (NTSC) respectively, just as in MPEG-1. MPEG-2 is more scalable than MPEG-1 and is able to play the same video at different resolutions and frame rates.

MPEG-4 was released in 1998 and provided lower bit rates (10 Kb/s to 1 Mb/s) with good quality. It was a major development from MPEG-2 and was designed for use in interactive environments, such as multimedia applications and video communication. It enhances the MPEG family with tools to lower the bit rate individually for certain applications. It is therefore more adaptive to the specific area of video usage. For multimedia producers, MPEG-4 offers better reusability of the content as well as copyright protection. The content of a frame can be grouped into objects, which can be accessed individually via the MPEG-4 Syntactic Description Language (MSDL). Most of the tools require immense

computational power (for encoding and decoding), which makes them impractical for most normal, non-professional user applications or real-time applications. The real-time tools in MPEG-4 are already included in MPEG-1 and MPEG-2. More details about the MPEG-4 standard and its tools can be found in the references cited above.

The MPEG Compression
The MPEG compression algorithm encodes the data in five steps [6], [8]: first a reduction of the resolution is done, which is followed by motion compensation in order to reduce temporal redundancy. The next steps are the Discrete Cosine Transformation (DCT) and a quantization, as used in JPEG compression; this reduces the spatial redundancy (with reference to human visual perception). The final step is an entropy coding using the Run Length Encoding and Huffman coding algorithms.

Step 1: Reduction of the Resolution
The human eye has a lower sensitivity to colour information than to dark-bright contrasts. A conversion from the RGB colour space into YUV colour components helps to exploit this effect for compression. The chrominance components U and V can be reduced (subsampled) to half of the pixels in the horizontal direction (4:2:2), or to half of the pixels in both the horizontal and vertical directions (4:2:0).

Figure 2: Depending on the subsampling, 2 or 4 pixel values of the chrominance channel can be grouped together.

The subsampling reduces the data volume by 50% for the 4:2:0 and by 33% for the 4:2:2 subsampling:
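The formula that followed this sentence did not survive in this copy; the figures can be checked directly from the sampling ratios (a reconstruction, assuming 8-bit samples throughout):
4:2:0 - for every 4 luminance samples there are 1 U and 1 V sample, so (4 + 1 + 1) / (4 + 4 + 4) = 6/12 = 50% of the original volume, i.e. a 50% reduction.
4:2:2 - for every 4 luminance samples there are 2 U and 2 V samples, so (4 + 2 + 2) / 12 = 8/12, roughly 67% of the original volume, i.e. a reduction of about 33%.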

MPEG uses similar effects for the audio compression, which are not discussed at this point.

Step 2: Motion Estimation
An MPEG video can be understood as a sequence of frames. Because two successive frames of a video sequence often have small differences (except at scene changes), the MPEG standard offers a way of reducing this temporal redundancy. It uses three types of frames: I-frames (intra), P-frames (predicted) and B-frames (bidirectional). The I-frames are key frames, which have no reference to other frames and whose compression is not that high. The P-frames can be predicted from an earlier I-frame or P-frame. P-frames cannot be reconstructed without their referencing frame, but they need less space than the I-frames, because only the differences are stored. The B-frames are a two-directional version of the P-frame, referring to both directions (one forward frame and one backward frame). B-frames cannot be referenced by other P- or B-frames, because they are interpolated from forward and backward frames. P-frames and B-frames are called inter coded frames, whereas I-frames are known as intra coded frames.

Figure 3: An MPEG frame sequence with two possible references: a P-frame referring to an I-frame and a B-frame referring to two P-frames.

The usage of the particular frame types defines the quality and the compression ratio of the compressed video. I-frames increase the quality (and size), whereas the usage of B-frames compresses better but also produces poorer quality. The distance between two I-frames can be seen as a measure of the quality of an MPEG video. In practice, the following sequence has been shown to give good results for quality and compression level: IBBPBBPBBPBBIBBP.

The references between the different types of frames are realised by a process called motion estimation or motion compensation. The correlation between two frames in terms of motion is represented by a motion vector. The resulting frame correlation, and therefore the pixel arithmetic difference, strongly depends on how well the motion estimation algorithm is implemented. Good estimation results in higher compression ratios and better quality of the coded video sequence. However, motion estimation is a computationally intensive operation, which is often not well suited for real-time applications. Figure 4 shows the steps involved in motion estimation, which are explained as follows:

Figure 4: Schematic process of motion estimation.

Frame Segmentation - The actual frame is divided into non-overlapping blocks (macro blocks), usually 8x8 or 16x16 pixels. The smaller the block sizes are chosen, the more vectors need to be calculated; the block size therefore is a critical factor in terms of time performance, but also in terms of quality: if the blocks are too large, the motion matching is most likely less correlated. If the blocks are too small, it is probable that the algorithm will try to match noise. MPEG usually uses block sizes of 16x16 pixels.

Search Threshold - In order to minimise the number of expensive motion estimation calculations, they are only calculated if the difference between two blocks at the same position is higher than a threshold; otherwise the whole block is transmitted.

Block Matching - In general, block matching tries to stitch together an actual predicted frame by using snippets (blocks) from previous frames. The process of block matching is the most time-consuming one during encoding. In order to find a matching block, each block of the current frame is compared with a past frame within a search area. Only the luminance information is used to compare the blocks, but obviously the colour information will be included in the encoding. The search area is a critical factor for the quality of the matching. It is more likely that the algorithm finds a matching block if it searches a larger area. Obviously the number of search operations increases quadratically when extending the search area. Therefore too large search areas slow down the encoding process dramatically. To reduce this problem, often rectangular search areas are used, which take into account that horizontal movements are more likely than vertical ones.
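The following Python sketch illustrates the block-matching search just described: for one macroblock of the current frame it finds the displacement, within a small search window of the previous frame, that minimises the sum of absolute luminance differences. Frames are 2-D lists of luminance values; the 16x16 block and the +/-7 pixel search range are typical illustrative values, not values mandated by MPEG.

```python
def sad(prev, curr, bx, by, dx, dy, block=16):
    # Sum of absolute differences between the current block and a
    # displaced block of the previous frame.
    total = 0
    for y in range(block):
        for x in range(block):
            total += abs(curr[by + y][bx + x] - prev[by + y + dy][bx + x + dx])
    return total

def best_motion_vector(prev, curr, bx, by, block=16, search=7):
    best = (0, 0)
    best_cost = sad(prev, curr, bx, by, 0, 0, block)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip displacements that fall outside the previous frame.
            if not (0 <= by + dy and by + dy + block <= len(prev)
                    and 0 <= bx + dx and bx + dx + block <= len(prev[0])):
                continue
            cost = sad(prev, curr, bx, by, dx, dy, block)
            if cost < best_cost:
                best_cost, best = cost, (dx, dy)
    return best, best_cost
```

The quadratic growth of the search mentioned above is visible directly in the two nested displacement loops, which is why practical encoders use restricted or hierarchical search patterns.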

Prediction Error Coding - Video motions are often more complex, and a simple shifting in 2D is not a perfectly suitable description of the motion in the actual scene, causing so-called prediction errors. The MPEG stream contains a matrix for compensating this error. After prediction, the predicted and the original frame are compared, and their differences are coded. Obviously less data is needed to store only the differences (yellow and black regions in Figure 5).

Figure 5

Vector Coding - After determining the motion vectors and evaluating the correction, these can be compressed. Large parts of MPEG videos consist of B- and P-frames, as seen before, and most of them mainly store motion vectors. Therefore an efficient compression of motion vector data, which usually has a high correlation, is desired.

Block Coding - see Discrete Cosine Transform (DCT) below.

Step 3: Discrete Cosine Transform (DCT)
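The descriptive text for this step appears to have been lost in this copy of the notes. For reference, the standard 8 x 8 two-dimensional DCT used in JPEG and MPEG transforms a block of pixel values f(x, y) into frequency coefficients F(u, v):

F(u, v) = (1/4) C(u) C(v) SUM over x = 0..7 SUM over y = 0..7 of f(x, y) cos[(2x + 1) u pi / 16] cos[(2y + 1) v pi / 16],

with C(0) = 1/sqrt(2) and C(k) = 1 for k > 0. Each 8 x 8 block therefore yields the 64 coefficients whose basis functions are visualised in Figure 6 below.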

Figure 6: Visualisation of the 64 basis functions (cosine frequencies) of a DCT

The first entry (top left in Figure 6) is called the direct-current term, which is constant and describes the average grey level of the block. The 63 remaining terms are called alternating-current terms. Up to this point no compression of the block data has occurred; the data has only been well-conditioned for compression, which is done by the next two steps.

Step 4: Quantization
During quantization, which is the primary source of data loss, the DCT terms are divided by a quantization matrix, which takes human visual perception into account. The human eye is more reactive to low frequencies than to high ones. Higher frequencies end up with a zero entry after quantization and the domain is reduced significantly.
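The quantisation formula itself is missing from this copy; in its standard form each DCT coefficient is divided by the corresponding entry of the quantisation matrix and rounded:

F_quant(u, v) = round( F(u, v) / Q(u, v) ),   for u, v = 0 ... N-1,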

where Q is the quantisation matrix of dimension N. The way Q is chosen defines the final compression level and therefore the quality. After quantization, the DC and AC terms are treated separately. As the correlation between adjacent blocks is high, only the differences between the DC terms are stored, instead of storing all values independently. The AC terms are then stored along a zig-zag path with increasing frequency values. This representation is optimal for the next coding step, because the same values are stored next to each other; as mentioned, most of the higher frequencies are zero after division by Q.

Figure 8: Block artefacts after DCT

Figure 7: Zig-zag path for storing the frequencies

If the compression is too high, which means there are more zeros after quantization, artefacts become visible (Figure 8). This happens because the blocks are compressed individually with no correlation to each other. When dealing with video, this effect is even more visible, as in the worst case the blocks change (over time) individually.

Step 5: Entropy Coding
The entropy coding takes two steps: Run Length Encoding (RLE) [2] and Huffman coding [1]. These are well-known lossless compression methods, which can compress data, depending on its redundancy, by an additional factor of 3 to 4.

All five steps together
As seen, MPEG video compression consists of multiple conversion and compression algorithms. At every step other critical compression issues occur and always form a trade-off between quality, data volume and computational complexity. However, the area of use of the video will finally decide which compression standard will be used. Most of the other compression standards use similar methods to achieve an optimal compression with the best possible quality.

Q.33 How is the MHEG standard used for video streaming on the net?
Ans. MHEG Interaction Channel
MHEG-5 Interaction Channel (MHEG-IC)
Recent work by the DTG in the UK has led to the development of the MHEG-5 Interaction Channel (MHEG-IC), which enables an extension to broadcast services to be delivered via an IP connection. In order to address the finite capacity of the broadcast network to carry interactive content, set-top boxes, integrated digital TVs (iDTVs) and PVRs conforming to the MHEG-5 IC specification can be connected to a broadband connection through the home network and public internet using a standard ISP connection. Interactive applications and associated data can be accessed either via the broadcast transmission or via the IP connection. The principles behind the MHEG-IC are to provide a seamless viewer experience of broadcast-delivered content augmented with content delivered over IP as an extension of the channel or network: OTT services, information services (weather, news, sport, stock market), online shopping, social networking and interactive games. Broadcasters have full editorial control of the user experience. The MHEG-IC gives access to streamed on-demand video content in addition to traditional text and graphics, as well as the ability to support secure transactions. Whilst there is strong consumer demand for on-demand services such as catch-up TV, the widespread deployment of the MHEG-IC will be dependent upon the availability of fast, good quality of service and uncapped broadband connections. Most consumer devices are not professionally installed and the ease of connection to the home network is paramount in the success of any broadband content. For this reason, the MHEG-IC uses the same standard protocols used to deliver web content to PCs: TCP-IP, HTTP and HTTPS. This means that no special configuration of the home network is needed: if it is possible to browse the web from your home PC, then the MHEG-IC will also work when connected to the same network. The MHEG-IC can access content (applications, text, graphics and streaming media) from either the broadcast network or the IP connection. The MHEG-IC uses a sophisticated Hybrid File System, where the user is not aware whether the content is delivered via broadcast or over IP. This enables broadcasters to

create common applications that can work on both IP-connected and unconnected receivers in a seamless and user friendly way. The MHEG-IC allows the application to determine whether or not an IP connection is possible (i.e. the receiver is equipped with the appropriate hardware and software) and whether or not it is actually available (i.e. has the user actually connected the receiver to the home network?). MHEG-IC and streaming media MHEG-5 can access and control video and audio streams; these can be delivered either via the usual broadcast methods or with the MHEG-IC extension, via the IP connection. Streamed video at up to at least 2Mbits/sec is delivered to the receiver using industry-standard IP protocols and MPEG-4 H.264 encoding identical to that used in the broadcast environment. This approach simplifies the development of receivers and ensures a minimum cost solution to the end user. In addition, relying on standard protocols and encoding methods ensures that the technology will have a suitably long product lifetime demanded for consumer electronics devices and will not be made obsolete by new technology as so often happens in the PC environment. The MHEG-5 application has full control over the presentation of any streaming media and can provide control of the video either via on-screen controls using the basic remote control or through the playback keys on the remote control if they are present. MHEG-IC Security The MHEG-IC employs a number of levels of security, all relying on a Broadcaster Trust model. Certificates for secure HTTP connections are delivered over the broadcast transmission and can easily be updated. Applications delivered via the IP connection must be digitally signed and checked with a certificate delivered over-the-air to the receiver, protecting users from inappropriate content in the event of the internet server being hijacked. In addition, an approved server list provided in the broadcast carousel identifies which servers may be accessed by

the receiver. The MHEG-IC does not support standard web browsing using HTML and JavaScript, but provides a TV-centric presentation of content with a standard TV remote control user interface.

New Services
With the availability of the MHEG-IC in set-top boxes, iDTVs and PVRs, broadcasters and content owners are now able to offer a whole new range of interactive services based on push and/or pull service models. Receivers with storage capacity (i.e. PVRs and iDTVs with built-in HDDs) potentially enable content to be pushed and stored onto the receiver via the broadcast/IP connection. Additional content can be requested by the consumer via the pull mechanisms offered by the IP connection. Broadcasters are now able to deploy applications such as home shopping, voting, enhanced programme guides, push VoD, catch-up TV (streaming-based or storage-based), and targeted advertising to create new services and revenue streams.

Q.34 What is videoconferencing? What are the uses of videoconferencing? What are the different components of videoconferencing? Also highlight videoconference options and the disadvantages of videoconferencing.
Ans. Videoconferencing is a method of communicating between two or more locations where sound, vision and data signals are conveyed electronically to enable simultaneous interactive communication.
Uses for Videoconferencing

Meetings: cost savings on travel, accommodation and staff time. Several sites can be linked together. Having a set time and duration for a meeting encourages punctuality and focused discussion.
Data sharing: images from a PC, such as spreadsheets, PowerPoint illustrations etc., can be shared to enhance a presentation.
Interviews: cost savings can allow more candidates to be interviewed. With data sharing, CVs can be viewed and discussed online.
Teaching: access to remote expertise. For example, Scotland and Wales both use their Educational Video Networks extensively for teaching to remote rural areas where travelling to a lecture can be difficult.
Remote diagnosis: in rural areas specialist medical help may not be on hand. By linking to a regional centre, cottage hospitals and GPs can receive help in diagnosing patients' disorders.
Legal work: reduced intimidation of vulnerable court witnesses. Particularly sensitive cases involving children or rape can be made more acceptable by separating the victims physically from the court.

Components of Videoconferencing
Videoconferencing has three essential components:
1. The equipment at each site that captures the voices and pictures of the participants and converts them to a form that enables transmission over suitable networks.
2. The intervening network that carries the signals between sites.
3. The conference environment or room.

1. Videoconferencing Equipment A basic conference requires three components: a television camera to capture images and convert them into an electrical signal, a microphone to do likewise with the sound, and a CODEC (Coder/Decoder). The Coder accepts the vision and sound signals (video and audio) and processes them into a suitable format for transmission through the network to the remote site. To receive information the Decoder does the reverse: it accepts the digital signals from the remote site over the network and decodes or converts these into video and audio. Finally this video and audio are fed to a television monitor and loudspeaker to display the pictures and reproduce the sound from the remote site.

2. The Network
Two network technologies are mainly employed: the Internet, using the Internet Protocol (IP), and the dial-up Integrated Services Digital Network (ISDN) over modified telephone lines. With IP transmission the results can be variable, as the videoconference data has to compete with other computing data. ISDN guarantees connections at the selected quality, giving more reliable conferences, but as call charges are levied it is also more expensive than IP.

3. The Conference Environment
The starting point for efficient conferences is an effective conferencing room. Normal rooms or offices will be unsuitable without modification. The human ear can adapt to ambient noise

from traffic, heating and so forth, but microphones may emphasise it to the point where communication is impossible. The human eye can also adapt to wide variations in scene brightness, for example sunlight streaming through a window. Cameras are not tolerant of high contrast scenes and may even white-out completely. Room acoustics, decoration and several other factors need to be tightly controlled for effective videoconferencing. See: http://www.ja.net/documents/services/video/vcrooms.pdf

Videoconference Options
Point-to-Point: conferences are set up between two destinations so each site can see and hear the other simultaneously.
Multipoint: conferences involve several destinations. This requires a Multipoint Control Unit (MCU), a switch that distributes audio and video to all participating venues. Options for multipoint conferences are:
Voice Switched: not all sites see the other sites simultaneously. Instead the image of the site speaking takes precedence and is seen by all the other sites.
Chairman Control: one site assumes the chair and the other sites only receive the chair's sound and vision. The chair site receives sound/vision from the site currently speaking.
Continual Presence: the picture is segmented to give a thumbnail image of each site in the conference, the sound being voice switched.

Possible Disadvantages
Videoconferencing is a form of television and has similar guidelines. High contrast or heavily patterned clothes should be avoided. Movement should be minimal. Conferences may suffer from a delay on the sound (up to 0.5 second) and this can be unnerving. Voice-switched conferences demand discipline, as another site interrupting will switch the picture away from the speaker to the new contributor. The technology may degrade the received images and sound. Body language can be lost if image movement is jerky. There can be a delay on the sound that requires time to get accustomed to. The atmosphere of a face-to-face meeting is lost. For meetings, some say that videoconferences are more effective if the participants at each site already know each other.

Q.35 Explain the Multimedia Multicast/Broadcast Services in detail.
Ans. 3GPP MBMS
The basic idea: Multimedia Broadcast and Multicast Services provide seamless integration of broadcast/multicast transmission capabilities into the 3G service and network infrastructure.

Serving large user groups simultaneously with content
New MBMS (radio) bearers
Uses the IP Multicast framework for services
Usage of the MBMS bearer is transparent to the user

MBMS in 3GPP
3GPP Work Item for Release 6 (and Release 7)
Release 6 was supposed to be frozen by September 2004
MBMS Stage 2 work (Architecture and Procedures) is almost finished
MBMS Stage 3 work (Protocols and Messages) has already started in some groups; it is expected that the Stage 3 work for Release 6 continues after the actual freeze date

3GPP Overview
Involved groups: SA1-4; RAN1/2; GERAN2; CN1/3/4
Driving companies: Three, T-Mobile, TIM, Vodafone, Ericsson, Siemens, Nokia, Nortel, Samsung, Qualcomm, NEC, Alcatel, Lucent, Panasonic, LG,

MBMS User Service
MBMS Download (Messaging): push a multimedia message into the phone
MBMS Streaming: continuous media stream transmission and immediate play-out

Principles of Multicast Messaging/Download
Users have joined a certain information distribution service (push service)
Two-phase delivery:
1st phase: the information message is pushed into the phone using MBMS delivery schemes
2nd phase: post-delivery procedures such as ptp repair or content reception reporting

Issues of Multicast Messaging/Download
New ptm protocol required for object encapsulation
SA4 bases its work on FLUTE/ALC (IETF RMT protocols)
Error-free delivery of the object (content)
Content protection via application-layer FEC

Principle of MBMS Streaming
Different forms of content provision

Live feed from e.g. a show
Scheduled transmission of pre-recorded content (e.g. a news broadcast every day at 8am): the user knows when to connect to the server; different announcements/advertisements are possible

Basic Operation
The user hooks on and stays for a while; content is received and immediately played out on the display
Advanced feature: a mobile VCR to capture live transmissions

MBMS Streaming
Codecs are not defined yet
Video codec candidates: H.264, H.263+
Audio codec candidates: AMR-WB+, Enhanced AAC+
No feedback channel, no power control
Target for the MBMS streaming bearer: BLER 1%; FEC under discussion

3GPP MBMS Summary
Seamless integration of broadcast/multicast transmission capabilities into the 3G service and network infrastructure
New function (Broadcast/Multicast Service Centre, BM-SC)
Extensions in the core and radio network for the new MBMS context
New multicast radio bearers
Considerable resource savings in the CN and RAN for multi-user content delivery services

Q.24 How do indexing and retrieval of a video database take place in a multimedia system? What are the basics of content-based image retrieval?
Ans.
Introduction
With the growth of multimedia computing and the spread of the Internet, more and more users can display and manipulate images and would like to use them for an increasing number of applications. A problem is that most image databases are not indexed in useful ways, and many are not indexed at all. It is a formidable task to create an index of images in general, and we address only a part of the problem, i.e. the creation of an index that allows retrieval of images similar to a given image presented as a query. We can try to find images that are of the same kind - for example, given an image of a person's face, the aim is to retrieve other facial images from the database.

The concepts of database management systems have been extended to deal with images. For example, the QBIC (Query By Image Content) system attempts to retrieve images and video using a variety of features, such as colour, texture or shape, as keys.

Michael Shneier and Mottaleb's approach
This method takes advantage of JPEG coding both to decrease the amount of data that must be processed and to provide the basis for the index keys used for retrieval.

Construction of keys
The algorithm for selecting random windows is given below:
1. Choose the number of bits, K, for the index keys. The number of windows used will be 2K.
2. Select the window coordinates to tile the image.
3. Determine the window size as a function of the image size. For compatibility with JPEG 8 x 8 blocks, we clip the window to the smallest multiple of 8 less than this in each dimension.
4. Randomly pair up the windows, with the constraint that each window has only one partner.

The algorithm for computing key values for image indexing is given below:
1. Select a set of windows and pair them up.
2. For each pair of windows, allocate a bit in the index key.
3. For each window, compute a vector of features. We take the already computed DCT coefficients in each 8 x 8 block of the window as the features and compute the average of each DCT coefficient over all the blocks in the window, giving a vector of 64 feature values.
4. For each feature value and each window pair, compute an index key, giving a vector of index keys, one for each feature, as follows: compare the value of the first window with that of the second. If the difference in value is greater than a threshold, assign a 1 to the corresponding bit; otherwise assign 0.
5. Store the vector of image keys as the index to the image.
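A minimal sketch of this key construction, and of the bitwise comparison used by the retrieval algorithm described next, is given below. Window features are assumed to be precomputed (for example, averaged DCT coefficients per window), and the pairing and threshold are passed in; the function names are illustrative, not taken from the paper.

```python
def build_keys(window_features, pairs, threshold):
    """window_features: list of 64-element feature vectors, one per window.
    pairs: list of (i, j) window index pairs, one pair per key bit.
    Returns one integer key per feature, with bit b set when that feature
    differs by more than the threshold between the two windows of pair b."""
    n_features = len(window_features[0])
    keys = [0] * n_features
    for bit, (i, j) in enumerate(pairs):
        for f in range(n_features):
            if abs(window_features[i][f] - window_features[j][f]) > threshold:
                keys[f] |= 1 << bit
    return keys

def similarity_distance(keys_a, keys_b):
    # Sum, over all features, of the number of differing bits (a Hamming
    # distance); smaller totals mean more similar images.
    return sum(bin(ka ^ kb).count("1") for ka, kb in zip(keys_a, keys_b))
```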

Image-based retrieval
The algorithm for computing image similarity is as follows:
1. Construct a vector of keys for the query image, using the same arrangement of windows and the same features that were used to construct the indices for the images in the database.
2. Compare the key with the keys of all the images in the database, on a bit-by-bit basis.
3. Compute the degree of match as the sum of all bit positions that are different.
4. Repeat the match computation for each of the features in the key vector, and sum the results for all the features.
5. The total number of differences is the measure of similarity: the smaller the total, the more similar the images.
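Continuing the sketch above, a hedged illustration of the bit-by-bit matching step; the database layout (a dict of precomputed key arrays) is hypothetical.

```python
# Sketch of image similarity as the total number of differing key bits,
# summed over all features (lower totals mean more similar images).
import numpy as np

def key_distance(query_keys, db_keys):
    """Count bit positions that differ, over every feature's key."""
    return int(np.count_nonzero(query_keys != db_keys))

def retrieve(query_keys, database):
    """database: {image_id: stored key array from index_keys()}."""
    scores = {image_id: key_distance(query_keys, keys)
              for image_id, keys in database.items()}
    return sorted(scores.items(), key=lambda item: item[1])   # best matches first
```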

Video retrieval
The simplest way to represent video for browsing and querying is through key frames in a shot, typically the first, middle or last frame, or a combination of these. Thereafter, QBIC-like queries can be performed on the key frames.
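A minimal sketch of selecting the first, middle and last frames of a shot as key frames, assuming OpenCV and that the shot is available as a standalone clip; the extracted frames can then be indexed like still images.

```python
# Sketch of key-frame extraction for shot-level browsing and QBIC-like querying.
import cv2

def key_frames(path):
    """Return the first, middle and last frame of the clip."""
    cap = cv2.VideoCapture(path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    frames = []
    for idx in (0, total // 2, max(total - 1, 0)):
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
        ok, frame = cap.read()
        if ok:
            frames.append(frame)
    cap.release()
    return frames
```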

Sawhney and Ayer presented techniques for the automatic decomposition of a video sequence into multiple motion models and their layers of support, which together constitute a compact description of the significant scene structure. This includes:
a) Separation of the dominant background scene from moving objects.
b) Representation of the scene and moving objects as multiple layers of motion and spatial support.
Furthermore, the motion-based decomposition of videos can be used to create compact views of numerous frames in a shot by video mosaicing.
There are two major approaches to the problem of separating image sequences into multiple scene structures and objects based on motion. One set of methods solves the problem by letting multiple models simultaneously compete for the description of the individual motion measurements; in the second set, multiple models are fleshed out sequentially by solving for a dominant model at each stage.
Simultaneous multiple motion estimation
Essential idea: iteratively cluster motion models computed from pre-computed dense optical flow. Its main drawbacks are:
a) In computing optical flow, smoothness assumptions are made that can distort the structure of the image motion.
b) Clustering in the parameter space is generally sensitive to the number of clusters specified.

Dominant motion estimation
Sequential application of dominant motion estimation methods has been proposed for extracting multiple motions and layers. Ayer et al. combined intensity-based segmentation with the motion information. However, they noticed that even with the use of robust estimation, the sequential dominant motion approach may be confronted with the absence of a dominant motion.
Robust estimation of a motion model using direct methods
Given two images, their motion transformation is modeled by a displacement field u(p; θ), where p is the 2-D vector of image coordinates and u(p; θ) is the displacement vector at p, described using a parameter vector θ. Two kinds of model are used:
2-D global parametric model: a low-dimensional global parameter vector describes the motion.
3-D global parametric model: a low-dimensional global parameter vector plus a projective depth component.
In order to compute motions of varying magnitudes, the images are represented at multiple scales using Gaussian or Laplacian pyramids. In the M-estimation formulation, the unknown parameters are estimated by minimizing an objective function of the residual error between the two images.
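The exact model and objective are not reproduced in this text; as a hedged illustration only, a common 2-D parametric choice is the six-parameter affine model, and a standard M-estimation objective over the brightness residual takes the following form (the symbols below are standard, not taken from the source):

```latex
% Affine 2-D parametric motion model at pixel p = (x, y)
u(p;\theta) = \begin{pmatrix} \theta_1 + \theta_2 x + \theta_3 y \\ \theta_4 + \theta_5 x + \theta_6 y \end{pmatrix}

% Robust (M-estimation) objective over the inter-frame brightness residual
\hat{\theta} = \arg\min_{\theta} \sum_{p} \rho\bigl( I_1(p) - I_2(p + u(p;\theta)),\ \sigma \bigr)
```

Here ρ is a robust error norm with scale σ, chosen so that pixels that do not follow the dominant motion contribute little to the objective.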

Video mosaicing with dominant 2-D motion estimation
In many real video sequences where the camera is panning and tracking an object, a panoramic view of the background can be created by mosaicing together numerous frames with warping transforms that result from the automatic dominant motion computation. The 2-D motion estimation algorithm is applied between consecutive pairs of frames. A reference frame is then chosen, and all the frames are warped into the coordinate system of the reference frame. This process creates a mosaiced frame whose size is, in general, bigger than the original images; parts of the scene not seen in the reference view occupy the extra space. Temporal filtering of the warped images in the mosaic frame's coordinate system creates the mosaiced image.
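A minimal sketch of the mosaicing step, assuming OpenCV and 8-bit grayscale frames. A RANSAC homography between consecutive frames stands in for the dominant 2-D motion estimate, and a per-pixel median over the warped frames plays the role of the temporal filtering described above; the canvas size is a hypothetical parameter.

```python
# Sketch: warp every frame into the coordinate system of the first (reference)
# frame and temporally filter the warped stack to obtain the mosaic.
import cv2
import numpy as np

def pairwise_homography(prev, curr):
    """Dominant 2-D motion between two frames, approximated by a robust homography."""
    orb = cv2.ORB_create(1000)
    k1, d1 = orb.detectAndCompute(prev, None)
    k2, d2 = orb.detectAndCompute(curr, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
    src = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
    return H                                   # maps curr coordinates into prev coordinates

def mosaic(frames, canvas_size=(2000, 1000)):
    """Chain pairwise motions to the reference frame, warp, and median-filter."""
    H_to_ref = np.eye(3)
    warped = []
    for i, frame in enumerate(frames):
        if i > 0:
            H_to_ref = H_to_ref @ pairwise_homography(frames[i - 1], frame)
        warped.append(cv2.warpPerspective(frame, H_to_ref, canvas_size))
    stack = np.stack(warped).astype(np.float32)
    stack[stack == 0] = np.nan                 # crude mask for unmapped pixels
    return np.nan_to_num(np.nanmedian(stack, axis=0)).astype(np.uint8)
```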

Q.36 What are the recent developments in multimedia?
Ans.
